Thanks for the tip. I rebuilt my Docker image to use a 1-day-old master and am getting the same results (see the updated gist). So, unfortunately, the puzzle isn't resolved yet...
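For anyone trying to reproduce, a minimal timing harness would look something like the sketch below (this is my sketch, not the exact script from the gist; it assumes the myexp/test_func definitions from the quoted post, and the 9500x2 problem size is the one described there):

versioninfo()                  # confirm commit, OS, and build details on each machine

# assumes myexp and test_func from the quoted post below are already defined
data   = randn(9500, 2)        # problem size taken from the post below
points = randn(9500, 2)

test_func(data, points)        # warm-up call so JIT compilation is excluded
@time test_func(data, points)  # this is the number compared across machines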
On Wednesday, April 29, 2015 at 4:50:03 PM UTC-4, Spencer Lyon wrote:
>
> I ran into strange performance issues in an algorithm I have been working on.
>
> I have a test case as well as some timing and profiler results at this gist:
> https://gist.github.com/spencerlyon2/d21d6368a2ccbf6f1e7b
>
> I summarize the issues here. Consider the following code (note that I am
> defining myexp because of this issue:
> https://github.com/JuliaLang/julia/issues/11048. It turns out that on OS X,
> calling Apple's libm gives a substantial speedup -- i.e. I'm doing
> everything I can to give OS X a chance to win here).
>
> The code:
>
> @osx? (
>     begin
>         myexp(x::Float64) = ccall((:exp, :libm), Float64, (Float64,), x)
>         # myexp(x::Float64) = exp(x)
>     end
>     : begin
>         myexp(x::Float64) = exp(x)
>     end
> )
>
> function test_func(data::Matrix, points::Matrix)
>     # extract input dimensions
>     n, d = size(data)
>     n_points = size(points, 1)
>
>     # transpose data and points to access columns at a time
>     data = data'
>     points = points'
>
>     # define constants
>     hbar = n^(-1.0/(d+4.0))
>     hbar2 = hbar^2
>     constant = 1.0/(n*hbar^d * (2π)^(d/2))
>
>     # allocate space
>     density = Array(Float64, n_points)
>     Di_min = Array(Float64, n_points)
>
>     # apply formula (2)
>     for i = 1:n_points  # loop over all points
>         dens_i = 0.0
>         min_di2 = Inf
>         for j = 1:n_points  # loop over all other points
>             d_i2_j = 0.0
>             for k = 1:d  # loop over the d dimensions
>                 @inbounds d_i2_j += (points[k, i] - data[k, j])^2
>             end
>             dens_i += myexp(-0.5*d_i2_j/hbar2)
>             if i != j && d_i2_j < min_di2
>                 min_di2 = d_i2_j
>             end
>         end
>         density[i] = constant * dens_i
>         Di_min[i] = sqrt(min_di2)
>     end
>
>     return density, Di_min
> end
>
> To test the performance of this code on Linux and OS X, I started up a
> Docker image with a recent (40-day-old master) Julia from my OS X machine
> and compared the timing against running it on OS X directly (with a
> 1-day-old Julia). I found that for `data, points = randn(9500, 2)`, the
> Linux version takes about 2.6 seconds to run `test_func`, whereas on OS X
> it takes about 9.3.
>
> I can't explain this large (almost 4x) performance hit that I get from
> running the code on the native OS vs. the virtual machine.
>
> More details (profiler results, timing stats, and a self-contained runnable
> example) are in the gist:
> https://gist.github.com/spencerlyon2/d21d6368a2ccbf6f1e7b
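One check that might localize the gap is to micro-benchmark the exp call by itself on each machine, since myexp is the only OS-specific code path here. A minimal sketch (bench_exp and the 10^7 element count are illustrative choices of mine, not from the gist):

function bench_exp(xs::Vector{Float64})
    # hypothetical micro-benchmark: isolates exp() throughput from test_func
    s = 0.0
    for x in xs
        s += myexp(x)   # swap in Base's exp(x) here to compare implementations
    end
    return s            # return the sum so the loop isn't optimized away
end

xs = randn(10^7)
bench_exp(xs)           # warm up
@time bench_exp(xs)     # compare this number on OS X vs. the Linux container

If the ~4x gap shows up here too, the difference lives in the exp implementation (per the libm issue linked above); if not, it's somewhere else in test_func.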
