Hello,

I've been playing around with Julia for some data classifiers commonly used in mapping, such as the Jenks "natural breaks" algorithm <http://en.wikipedia.org/wiki/Jenks_natural_breaks_optimization>. (For background and a JavaScript implementation, I highly recommend Tom MacWright's literate programming implementation <http://www.macwright.org/2013/02/18/literate-jenks.html>.) Recently, a friend created a Cython version <https://github.com/perrygeo/jenks> of Jenks which is very performant. My Julia version is largely based on his, and given posts like this one on isotonic regression in Julia <http://tullo.ch/articles/python-vs-julia/>, I had hoped the Julia version would be in a similar ballpark to Cython in terms of performance. Here are my very basic timing results:
Matt Perry's Cython version:

    In [15]: %timeit jenks(data, 5)
    100 loops, best of 3: 13.9 ms per loop

My Julia version, running on 0.2.1 (running against master produced slower results):

    julia> @time jenks(data, 5)
    elapsed time: 1.046356641 seconds (646397168 bytes allocated)

In comparison, an implementation in Python which only uses lists <https://gist.github.com/llimllib/4974446> runs in about 2.8 seconds on my machine.

So I imagine that I must be doing something wrong: the performance difference should not favor the Cython version by a factor of 75, and the Julia version should handily beat an implementation that uses ill-fitting data structures. My Julia code is a rather crude port and is by no means idiomatic. I did some basic profiling, and most of the runtime seems to come from the basic math performed in each loop. I've only done some minor performance optimization. Any and all advice on what would make this code more performant for this particular task is appreciated.

Code and data at: https://github.com/scw/jenks.jl

Thanks for your help,
Shaun
