Hello,

I've been playing around with Julia for some data classifiers commonly used
in mapping, such as the Jenks "natural breaks"
algorithm<http://en.wikipedia.org/wiki/Jenks_natural_breaks_optimization>.
(For background and a JavaScript implementation, I highly recommend Tom
MacWright's literate programming
implementation<http://www.macwright.org/2013/02/18/literate-jenks.html>.)
Recently, a friend created a Cython
version<https://github.com/perrygeo/jenks> of
Jenks which is very fast. I based my Julia version largely on his, and
given posts like this one on isotonic regression in
Julia <http://tullo.ch/articles/python-vs-julia/>, I had hoped the Julia
version would be in the same ballpark as Cython in terms of performance.
Here are my very basic timing results:

Matt Perry's Cython version:
  In [15]: %timeit jenks(data, 5)
  100 loops, best of 3: 13.9 ms per loop

My Julia version, running in 0.2.1 (running against master produced slower
results):
  julia> @time jenks(data, 5)
  elapsed time: 1.046356641 seconds (646397168 bytes allocated)

In comparison, an implementation in Python which only uses
lists<https://gist.github.com/llimllib/4974446> runs
in about 2.8 seconds on my machine. So I imagine that I must be doing
something wrong: the performance difference should not be
in favor of the Cython version by a factor of 75, and the Julia version
should handily beat an implementation which uses ill-fitting data
structures. My Julia code is a rather crude port and is by no means
idiomatic. I did some basic profiling, and most of the runtime seems to
come from the basic math performed in each loop. So far I've only done
some minor performance optimization. Any and all advice is appreciated on
what would make this code more performant for this particular task.
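
For anyone unfamiliar with the algorithm, here is a minimal pure-Python
sketch of the dynamic program behind Jenks. It shows the kind of "basic
math performed in each loop" I mean: a running sum and sum of squares for
each candidate class. It is an illustration only, not the code in any of
the repositories linked above. The name jenks_breaks and the return
convention (lower bound of each class, followed by the maximum) are
assumptions made for this example.

```python
def jenks_breaks(values, n_classes):
    """Illustrative sketch: return class lower bounds plus the maximum value."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    inf = float("inf")
    # cost[i][k]: smallest total within-class sum of squared deviations
    # when the first i sorted values are split into k classes.
    cost = [[inf] * (n_classes + 1) for _ in range(n + 1)]
    # start[i][k]: index where the last of those k classes begins.
    start = [[0] * (n_classes + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        total = 0.0
        total_sq = 0.0
        # Grow the last class backwards so that it covers data[j:i].
        for j in range(i - 1, -1, -1):
            val = data[j]
            total += val
            total_sq += val * val
            count = i - j
            # Sum of squared deviations from the mean of data[j:i].
            ssd = total_sq - total * total / count
            for k in range(1, n_classes + 1):
                candidate = cost[j][k - 1] + ssd
                if candidate < cost[i][k]:
                    cost[i][k] = candidate
                    start[i][k] = j

    # Walk back through the recorded class starts to recover the breaks.
    breaks = [data[-1]]
    i = n
    for k in range(n_classes, 0, -1):
        j = start[i][k]
        breaks.append(data[j])
        i = j
    breaks.reverse()
    return breaks


if __name__ == "__main__":
    print(jenks_breaks([1, 2, 3, 10, 11, 12, 50, 51, 52], 3))
```

The two outer loops make the work grow with the square of the input
length, so any per-iteration overhead (allocation, boxing, type
instability) is multiplied many times over.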

Code and data at:
  https://github.com/scw/jenks.jl

Thanks for your help,
Shaun
