So, I found a solution to this problem. On OS X I had been using the Julia provided by Homebrew. When I switched to the Julia I built from source myself, everything worked and OS X was just as fast as Linux.
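
For anyone who wants to check whether their own build is affected, here is a minimal microbenchmark sketch that isolates the exp call from the rest of test_func. (The libm_exp / sum_libm / sum_base names are just illustrative, and exact timings will of course vary by machine and build.)

# hypothetical helper: exp from the system libm, as in the code quoted below
libm_exp(x::Float64) = ccall((:exp, :libm), Float64, (Float64,), x)

function sum_libm(xs::Vector{Float64})
    s = 0.0
    for x in xs
        s += libm_exp(-0.5*x*x)   # system libm exp
    end
    return s
end

function sum_base(xs::Vector{Float64})
    s = 0.0
    for x in xs
        s += exp(-0.5*x*x)        # Julia's built-in exp
    end
    return s
end

xs = randn(10^7)
sum_libm(xs); sum_base(xs)        # warm up so compilation isn't timed
@time sum_libm(xs)
@time sum_base(xs)

If exp alone times very differently across two builds of the same Julia version, the problem is in how the build resolves exp rather than in test_func itself.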
I'm not sure why the Homebrew version caused this problem, but at least I can have my fast Julia on OS X again!

On Thursday, April 30, 2015 at 12:10:37 PM UTC-4, Spencer Lyon wrote:
>
> Thanks for the tip.
>
> I rebuilt my docker image to have a 1-day-old master and am getting the
> same results (see updated gist). So, unfortunately the puzzle isn't
> resolved yet...
>
> On Wednesday, April 29, 2015 at 4:50:03 PM UTC-4, Spencer Lyon wrote:
>>
>> I ran into strange performance issues in an algorithm I have been
>> working on.
>>
>> I have a test case as well as some timing and profiler results in this
>> gist: https://gist.github.com/spencerlyon2/d21d6368a2ccbf6f1e7b
>>
>> To summarize: consider the following code. (Note that I define myexp
>> because of https://github.com/JuliaLang/julia/issues/11048 -- on OS X,
>> calling Apple's libm gives a substantial speedup, so I'm doing
>> everything I can to give OS X a chance to win here.)
>>
>> The code:
>>
>> @osx? (
>> begin
>>     myexp(x::Float64) = ccall((:exp, :libm), Float64, (Float64,), x)
>>     # myexp(x::Float64) = exp(x)
>> end
>> : begin
>>     myexp(x::Float64) = exp(x)
>> end
>> )
>>
>> function test_func(data::Matrix, points::Matrix)
>>     # extract input dimensions
>>     n, d = size(data)
>>     n_points = size(points, 1)
>>
>>     # transpose data and points to access one column at a time
>>     data = data'
>>     points = points'
>>
>>     # define constants
>>     hbar = n^(-1.0/(d+4.0))
>>     hbar2 = hbar^2
>>     constant = 1.0/(n*hbar^d * (2π)^(d/2))
>>
>>     # allocate space
>>     density = Array(Float64, n_points)
>>     Di_min = Array(Float64, n_points)
>>
>>     # apply formula (2)
>>     for i=1:n_points           # loop over all points
>>         dens_i = 0.0
>>         min_di2 = Inf
>>         for j=1:n_points       # loop over all other points
>>             d_i2_j = 0.0
>>             for k=1:d          # loop over the d dimensions
>>                 @inbounds d_i2_j += (points[k, i] - data[k, j])^2
>>             end
>>             dens_i += myexp(-0.5*d_i2_j/hbar2)
>>             if i != j && d_i2_j < min_di2
>>                 min_di2 = d_i2_j
>>             end
>>         end
>>         density[i] = constant * dens_i
>>         Di_min[i] = sqrt(min_di2)
>>     end
>>
>>     return density, Di_min
>> end
>>
>> To test the performance of this code on Linux and OS X, I started up a
>> docker image with a recent (40-day-old master) Julia from my OS X
>> machine and compared the timing against running it on OS X directly
>> (with a 1-day-old Julia). I found that for
>> `data, points = randn(9500, 2)` the Linux version takes about 2.6
>> seconds to run `test_func`, whereas on OS X it takes about 9.3.
>>
>> I can't explain this large (almost 4x) performance hit I get from
>> running the code on the native OS versus the virtual machine.
>>
>> More details (profiler results, timing stats, a self-contained runnable
>> example) are in the gist:
>> https://gist.github.com/spencerlyon2/d21d6368a2ccbf6f1e7b
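
For reference, the timing check I ran on both builds was essentially the following. (This is a rough sketch -- the gist has the exact, self-contained setup; showing points as an independent second draw is my shorthand here.)

data = randn(9500, 2)
points = randn(9500, 2)     # independent draw assumed here; see the gist for the exact setup
test_func(data, points)     # run once so compilation isn't timed
@time test_func(data, points)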
