Your algorithm looks fine. The problems are entirely in your testing
script. The first issue is that JSON.parse returns a Vector{Any}, which
deoptimizes everything. Try:
f = open("test.json")
data = convert(Vector{Float64}, JSON.parse(f))
The second issue is that you're including compilation time in your test.
Try:
jenks(data, 5)
@time jenks(data, 5)
With these changes, I get:
➜ jenks.jl git:(master) ✗ julia jenks-run.jl
elapsed time: 0.013948242 seconds (199084 bytes allocated)
On Tuesday, March 25, 2014 8:45:43 PM UTC-4, Shaun Walbridge wrote:
>
> Hello,
>
> I've been playing around with Julia for some data classifiers commonly
> used in mapping, such as the Jenks "natural breaks"
> algorithm<http://en.wikipedia.org/wiki/Jenks_natural_breaks_optimization>.
> (For background and a Javascript implementation, I highly recommend Tom
> MacWright's literate programming
> implementation<http://www.macwright.org/2013/02/18/literate-jenks.html>.)
> Recently, a friend created a Cython Python
> version<https://github.com/perrygeo/jenks> of
> Jenks which is very performant. I've created my Julia version largely based
> on his version, and based on posts like this one on isotonic regression
> in Julia <http://tullo.ch/articles/python-vs-julia/>, I had hoped the
> Julia one would be in a similar ballpark in terms of performance to Cython.
> Here's my very basic timing results:
>
> Matt Perry's Cython version:
> In [15]: %timeit jenks(data, 5)
> 100 loops, best of 3: 13.9 ms per loop
>
> My Julia version, running in 0.2.1 (running against master produced slower
> results):
> julia> @time jenks(data, 5)
> elapsed time: 1.046356641 seconds (646397168 bytes allocated)
>
> In comparison, an implementation in Python which only uses
> lists<https://gist.github.com/llimllib/4974446> runs
> in about 2.8 seconds on my machine. So I imagine that I must be doing
> something wrong, because I imagine the performance different should not be
> in favor of the Cython version by a factor of 75, and should handily
> dispatch the implementation which uses ill-fitting data structures. My
> Julia code is a rather crude port and is by no means idiomatic, I did do
> some basic profiling, and most of the runtime seems to come from the basic
> math performed in each loop, I've only done some minor performance
> optimization. Any and all advice appreciated on what would make this code
> more performant for this particular task.
>
> Code and data at:
> https://github.com/scw/jenks.jl
>
> Thanks for your help,
> Shaun
>