So, I found a solution to this problem.

On OS X I had been using the Julia provided by Homebrew. When I switched to
the Julia I built from source myself, everything worked and OS X was just as
fast.

I'm not sure why the Homebrew version caused the problem, but at least I can
have my fast Julia on OS X again!
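
For anyone debugging the same thing, a quick way to confirm which build is
actually running (nothing specific to my setup; JULIA_HOME is just the
directory of the running julia binary) is:

    versioninfo()        # prints the commit plus the BLAS/LAPACK/libm Julia was built against
    println(JULIA_HOME)  # a Homebrew Cellar path vs. the usr/bin of a source checkout

If the second line points into Homebrew's Cellar rather than the source
tree, it's still the Homebrew build.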

On Thursday, April 30, 2015 at 12:10:37 PM UTC-4, Spencer Lyon wrote:
>
> Thanks for the tip.
>
> I rebuilt my Docker image to have a 1-day-old master and am getting the 
> same results (see the updated gist). So, unfortunately, the puzzle isn't 
> resolved yet...
>
>
>
> On Wednesday, April 29, 2015 at 4:50:03 PM UTC-4, Spencer Lyon wrote:
>>
>> I ran into strange performance issues in an algorithm I have been working 
>> on. 
>>
>> I have a test case as well as some timing and profiler results at this 
>> gist: https://gist.github.com/spencerlyon2/d21d6368a2ccbf6f1e7b
>>
>>
>> I summarize the issues here. Consider the following code (note that I am 
>> defining myexp because of this issue: 
>> https://github.com/JuliaLang/julia/issues/11048 -- it turns out that on 
>> OS X, calling Apple's libm gives a substantial speedup, i.e. I'm doing 
>> everything I can to give OS X a chance to win here).
>>
>> the code: 
>>
>> @osx? (
>>          begin
>>              myexp(x::Float64) = ccall((:exp, :libm), Float64, (Float64,), x)
>>              # myexp(x::Float64) = exp(x)
>>          end
>>        : begin
>>              myexp(x::Float64) = exp(x)
>>          end
>>        )
>>  
>> function test_func(data::Matrix, points::Matrix)
>>     # extract input dimensions
>>     n, d = size(data)
>>     n_points = size(points, 1)
>>  
>>     # transpose data and points to access columns at a time
>>     data = data'
>>     points = points'
>>  
>>     # Define constants
>>     hbar = n^(-1.0/(d+4.0))
>>     hbar2 = hbar^2
>>     constant = 1.0/(n*hbar^(d) * (2π)^(d/2))
>>  
>>     # allocate space
>>     density = Array(Float64, n_points)
>>     Di_min = Array(Float64, n_points)
>>  
>>     # apply formula (2)
>>     for i=1:n_points  # loop over all points
>>         dens_i = 0.0
>>         min_di2 = Inf
>>         for j=1:n_points  # loop over all other points
>>             d_i2_j = 0.0
>>             for k=1:d  # loop over d
>>                 @inbounds d_i2_j += ((points[k, i] - data[k, j])^2)
>>             end
>>             dens_i += myexp(-0.5*d_i2_j/hbar2)
>>             if i != j && d_i2_j < min_di2
>>                 min_di2 = d_i2_j
>>             end
>>         end
>>         density[i] = constant * dens_i
>>         Di_min[i] = sqrt(min_di2)
>>     end
>>  
>>     return density, Di_min
>> end
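>>
>> As a quick isolated check, it is worth confirming that the ccall path 
>> agrees with Base's exp and timing just the exp call on both machines, to 
>> see whether the gap is in exp itself or elsewhere. A minimal sketch (the 
>> helper time_exp below is a throwaway for illustration, not part of the 
>> gist):
>>
>>     @assert abs(myexp(1.2345) - exp(1.2345)) < 1e-12
>>
>>     function time_exp(n)
>>         s = 0.0
>>         for i = 1:n
>>             s += myexp(-0.5 * i / n)  # argument shaped like the kernel term above
>>         end
>>         s
>>     end
>>
>>     time_exp(1)           # run once to compile
>>     @time time_exp(10^7)  # compare this number on OS X vs. the Docker image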
>>
>>
>>
>> To test the performance of this code on Linux and OS X, I started up a 
>> Docker image with a recent Julia (a 40-day-old master) from my OS X 
>> machine and compared the timing against running it on OS X directly (with 
>> a 1-day-old Julia). With `data` and `points` both 9500x2 matrices from 
>> `randn(9500, 2)`, the Linux version takes about 2.6 seconds to run 
>> `test_func`, whereas on OS X it takes about 9.3 seconds.
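>>
>> The timing itself is nothing fancier than building the inputs, running 
>> the function once to compile, and then timing it. Roughly (an 
>> illustrative sketch; the exact self-contained script is in the gist, and 
>> whether `points` is the same draw as `data` or an independent one is an 
>> assumption here):
>>
>>     data = randn(9500, 2)
>>     points = randn(9500, 2)
>>     test_func(data, points)        # warm-up run to compile
>>     @time test_func(data, points)  # the numbers quoted above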
>>
>> I can't explain this large (almost 4x) performance hit that I get from 
>> running the code on the native OS vs the virtual machine. 
>>
>> More details (profiler results, timing stats, self-contained runnable 
>> example) in the gist: 
>> https://gist.github.com/spencerlyon2/d21d6368a2ccbf6f1e7b
>>
>>
>>
>>
