Ok, but thanks a lot for directing me around! Now at least I can disable Hyperthreading and stuff runs fast...
cheers, Johannes On Wednesday, April 6, 2016 at 5:48:07 PM UTC+2, Milan Bouchet-Valat wrote: > > Le mercredi 06 avril 2016 à 17:02 +0200, Johannes Wagner a écrit : > > > > > > > > On 6 Apr 2016, at 4:46 PM, Milan Bouchet-Valat <[email protected] > <javascript:>> wrote: > > > > > > Le mercredi 06 avril 2016 à 07:25 -0700, Johannes Wagner a écrit : > > > > > > > > and last update, I disabled Hyperthreading on the i7, and now it > > > > performes as expected. > > > > > > > > i7 no HT: > > > > > > > > 1000000 loops, best of 3: 879.57 ns per loop > > > > 100000 loops, best of 3: 9.88 µs per loop > > > > 100 loops, best of 3: 4.46 ms per loop > > > > 10000 loops, best of 3: 69.89 µs per loop > > > > 10000 loops, best of 3: 26.67 µs per loop > > > > 10 loops, best of 3: 95.08 ms per loop > > > > > > > > i7 with HT: > > > > > > > > 1000000 loops, best of 3: 871.68 ns per loop > > > > 10000 loops, best of 3: 10.84 µs per loop > > > > 100 loops, best of 3: 5.19 ms per loop > > > > 10000 loops, best of 3: 71.35 µs per loop > > > > 10000 loops, best of 3: 26.65 µs per loop > > > > 1 loops, best of 3: 159.99 ms per loop > > > > > > > > So all calls inside the loop are the same speed, but the whole loop, > > > > with identical assembly code is ~60% slower if HT is enabled. Where > > > > can this problem then arise from? LLVM? or thread pinning in the OS? > > > > Probably not a julia problem then... > > > Indeed, in the last assembly output you sent, there are no differences > > > between i5 and i7 (as expected). So this isn't Julia's nor LLVM's > > > fault. No idea whether there might be an issue with the CPU itself, > but > > > it's quite surprising. > > Run it on a 2nd i7 machine. Same behavior, so definitely not a faulty > > cpu. Do you have any other idea what to do? Just leave it as is and > > now use julia with disabled hyper threading is not really > > satisfactory... > Sorry, I'm clueless. You could ask on forums dedicated to CPUs or on > Intel forums. You could also try with a different OS, just in case. > > > > Regards > > > > > > > Regards > > > > > > > > > > > > > > > > > > > > Le mardi 05 avril 2016 à 10:18 -0700, Johannes Wagner a écrit : > > > > > > > > > > > > hey Milan, > > > > > > so consider following code: > > > > > > > > > > > > Pkg.clone("git://github.com/kbarbary/TimeIt.jl.git") > > > > > > using TimeIt > > > > > > > > > > > > v = rand(3) > > > > > > r = rand(6000,3) > > > > > > x = linspace(0.0, 10.0, 500) * (v./sqrt(sumabs2(v)))' > > > > > > > > > > > > dotprods = r * x[2,:] > > > > > > imexp = cis(dotprods) > > > > > > sumprod = sum(imexp) * sum(conj(imexp)) > > > > > > > > > > > > f(r, x) = r * x[2,:] > > > > > > g(r, x) = r * x' > > > > > > h(imexp) = sum(imexp) * sum(conj(imexp)) > > > > > > > > > > > > function s(r, x) > > > > > > result = zeros(size(x,1)) > > > > > > for i = 1:size(x,1) > > > > > > imexp = cis(r * x[i,:]) > > > > > > result[i]= sum(imexp) * sum(conj(imexp)) > > > > > > end > > > > > > return result > > > > > > end > > > > > > > > > > > > @timeit zeros(size(x,1)) > > > > > > @timeit f(r,x) > > > > > > @timeit g(r,x) > > > > > > @timeit cis(dotprods) > > > > > > @timeit h(imexp) > > > > > > @timeit s(r,x) > > > > > > > > > > > > @code_native f(r,x) > > > > > > @code_native g(r,x) > > > > > > @code_native cis(dotprods) > > > > > > @code_native h(imexp) > > > > > > @code_native s(r,x) > > > > > > > > > > > > and I attached the output of the last @code_native s(r,x) as > > > > > text > > > > > > > > > > > > files for the binary tarball, as well as the latest nalimilan > > > > > update. > > > > > > > > > > > > For the whole function s, the exported code looks actually the > > > > > same > > > > > > > > > > > > everywhere. > > > > > > But s(r,x) is the one that is considerable slower on the i7 than > > > > > the > > > > > > > > > > > > i5, whereas all the other timed calls are more or less same > speed > > > > > on > > > > > > > > > > > > i5 and i7. Here are the timings in the same order as above (all > > > > > run > > > > > > > > > > > > repeatedly to not have compile time in it for last one): > > > > > > > > > > > > i7: > > > > > > 1000000 loops, best of 3: 871.68 ns per loop > > > > > > 10000 loops, best of 3: 10.84 µs per loop > > > > > > 100 loops, best of 3: 5.19 ms per loop > > > > > > 10000 loops, best of 3: 71.35 µs per loop > > > > > > 10000 loops, best of 3: 26.65 µs per loop > > > > > > 1 loops, best of 3: 159.99 ms per loop > > > > > > > > > > > > i5: > > > > > > 100000 loops, best of 3: 1.01 µs per loop > > > > > > 10000 loops, best of 3: 10.93 µs per loop > > > > > > 100 loops, best of 3: 5.09 ms per loop > > > > > > 10000 loops, best of 3: 75.93 µs per loop > > > > > > 10000 loops, best of 3: 29.23 µs per loop > > > > > > 1 loops, best of 3: 103.70 ms per loop > > > > > > > > > > > > So based on inside s(r,x) calls, the i7 should be faster, but > > > > > the > > > > > > > > > > > > whole s(r,x) is slower. Still clueless... And don't know how to > > > > > > further pin this down... > > > > > Thanks. I think you got mixed up with the different files, as the > > > > > versioninfo() output indicates. Anyway, there's enough info to > > > > > check > > > > > which file corresponds to which Julia version, so that's OK. > > > > > Indeed, > > > > > when comparing the tests with binary tarballs, there's a call > > > > > to jl_alloc_array_1d with the i7 (julia050_tarball-haswell- > > > > > i7.txt), > > > > > which is not present with the i5 (incorrectly > > > > > named julia050_haswell- > > > > > i7.txt). This is really unexpected. > > > > > > > > > > Could you file an issue on GitHub with a summary of what we've > > > > > found > > > > > (essentially your message), as well as links to 3 Gists giving the > > > > > code > > > > > and the contents of the two .txt files I mentioned above? That > > > > > would be > > > > > very helpful. Do not mention the Fedora packages at all, as the > > > > > binary > > > > > tarballs are closer to what Julia developers use. > > > > > > > > > > > > > > > Regards > > > > > > > > > > > > > > > > > > > > > > cheers, Johannes > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Le lundi 04 avril 2016 à 10:36 -0700, Johannes Wagner a > > > > > écrit : > > > > > > > > > > > > > > > > > > > > > > > > > > > > > hey guys, > > > > > > > > so attached you find text files with @code_native output for > > > > > the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > instructions > > > > > > > > - r * x[1,:] > > > > > > > > - cis(imexp) > > > > > > > > - sum(imexp) * sum(conj(imexp)) > > > > > > > > > > > > > > > > for julia 0.5. > > > > > > > > > > > > > > > > Hardware I run on is a Haswell i5 machine, a Haswell i7 > > > > > machine, > > > > > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > a IvyBridge i5 machine. Turned out on an Haswell i5 machine > > > > > the > > > > > > > > > > > > > > > > > > > > code > > > > > > > > > > > > > > > > also runs fast. Only the Haswell i7 machine is the slow > one. > > > > > > > This > > > > > > > > > > > > > > > > really drove me nuts. First I thought it was the OS, then > > > > > the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > architecture, and now its just from i5 to i7.... Anyways, I > > > > > > > don't > > > > > > > > > > > > > > > > know anything about x86 assembly, but the julia 0.45 code is > > > > > the > > > > > > > > > > > > > > > > > > > > same > > > > > > > > > > > > > > > > on all machines. However, for the dot product, the 0.5 code > > > > > has > > > > > > > > > > > > > > > > > > > > > > > > > > > > > already 2 different instructions on the i5 vs. the i7 (line > > > > > > > 44&47). > > > > > > > > > > > > > > > > For the cis call also (line 149...). And the IvyBridge i5 > > > > > code > > > > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > similar to the Haswell i5. I included also versioninfo() at > > > > > the > > > > > > > > > > > > > > > > > > > > top > > > > > > > > > > > > > > > > of the file. So you could just look at a vimdiff of the > > > > > julia0.5 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > files... Can anyone make sense out of this? > > > > > > > I'm definitely not an expert in assembly, but that additional > > > > > leaq > > > > > > > > > > > > > > > > > > > > instruction on line 44, and the additional movq instructions > > > > > on > > > > > > > > > > > > > > > > > > > > line > > > > > > > 111, 151 and 152 really look weird > > > > > > > > > > > > > > Could you do the same test with the binary tarballs? If the > > > > > > > difference > > > > > > > persists, you should open an issue on GitHub to track this. > > > > > > > > > > > > > > BTW, please wrap the fist call in a function to ensure it is > > > > > > > specialized for the arguments types, i.e.: > > > > > > > > > > > > > > f(r, x) = r * x[1,:] > > > > > > > @code_native f(r, x) > > > > > > > > > > > > > > Also, please check whether you still see the difference with > > > > > this > > > > > > > > > > > > > > > > > > > > code: > > > > > > > g(r, x) = r * x > > > > > > > @code_native g(r, x[1,:]) > > > > > > > > > > > > > > What are the types of r and x? Could you provide a simple > > > > > > > reproducible example with dummy values? > > > > > > > > > > > > > > > > > > > > > > > The binary tarballs I will still test. If I remove the > cis() > > > > > > > call, > > > > > > > > > > > > > > > > the difference is hard to tell, the loop is ~10times faster > > > > > and > > > > > > > > > > > > > > > > > > > > more > > > > > > > > > > > > > > > > or less all around 5ms. For the whole loop with cis() call, > > > > > from > > > > > > > > > > > > > > > > > > > > i5 > > > > > > > > > > > > > > > > to i7 the difference is ~ 50ms on i5 to 90ms on i7. > > > > > > > > > > > > > > > > Shall I also post the julia 0.4 code? > > > > > > > If it's identical for all machines, I don't think it's > > > > > needed. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > cheers, Johannes > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Le mercredi 30 mars 2016 à 15:16 -0700, Johannes Wagner a > > > > > > > écrit : > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Le mercredi 30 mars 2016 à 04:43 -0700, Johannes > Wagner > > > > > a > > > > > > > > > > > > > > > > > > > > écrit : > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sorry for not having expressed myself clearly, I > > > > > meant > > > > > > > > > > > > > > > > > > > > the latest > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > version of fedora to work fine (24 development). I > > > > > always > > > > > > > > > > > > > > > > > > > > used the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > latest julia nightly available on the copr > nalimilan > > > > > > > repo. Right now > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > that is: 0.5.0-dev+3292, Commit 9d527c5*, all > use > > > > > > > > > > > > LLVM: libLLVM-3.7.1 (ORCJIT, haswell) > > > > > > > > > > > > > > > > > > > > > > > > peakflops on all machines (hardware identical) is > > > > > > > ~1.2..1.5e11. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Fedora 22&23 with julia 0.5 is ~50% slower then 0.4, > > > > > only > > > > > > > > > > > > > > > > > > > > on fedora > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 24 julia 0.5 is faster compared to julia 0.4. > > > > > > > > > > > Could you try to find a simple code to reproduce the > > > > > > > problem? In > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > particular, it would be useful to check whether this > > > > > comes > > > > > > > > > > > > > > > > > > > > from > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > OpenBLAS differences or whether it also happens with > > > > > pure > > > > > > > > > > > > > > > > > > > > Julia code > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (typical operations which depend on BLAS are matrix > > > > > > > multiplication, as > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > well as most of linear algebra). Normally, 0.4 and > 0.5 > > > > > > > should use the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > same BLAS, but who knows... > > > > > > > > > > well thats what I did, and the 3 simple calls inside the > > > > > loop > > > > > > > > > > > > > > > > > > > > are > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > more or less same speed. only the whole loop seems > > > > > slower. > > > > > > > > > > > > > > > > > > > > See my > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > code sample fromanswer march 8th (code gets in same > > > > > > > proportions > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > faster when exp(im .* dotprods) is replaced by > > > > > cis(dotprods) > > > > > > > > > > > > > > > > > > > > ). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So I don't know what I can do then... > > > > > > > > > Sorry, somehow I had missed that message. This indeed > > > > > looks > > > > > > > > > > > > > > > > > > > > like a code > > > > > > > > > > > > > > > > > > > > > > > > > > generation issue in Julia/LLVM. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Can you also confirm that all versioninfo() fields are > > > > > the > > > > > > > > > > > > > > > > > > > > same for all > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > three machines, both for 0.4 and 0.5? We must envision > > > > > the > > > > > > > > > > > > > > > > > > > > possibility > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > that the differences actually come from 0.4. > > > > > > > > > > ohoh, right! just noticed that my fedora 24 machine was > > > > > an > > > > > > > > > > > > > > > > > > > > ivy bridge > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > which works fast: > > > > > > > > > > > > > > > > > > > > Julia Version 0.5.0-dev+3292 > > > > > > > > > > Commit 9d527c5* (2016-03-28 06:55 UTC) > > > > > > > > > > Platform Info: > > > > > > > > > > System: Linux (x86_64-redhat-linux) > > > > > > > > > > CPU: Intel(R) Core(TM) i5-3550 CPU @ 3.30GHz > > > > > > > > > > WORD_SIZE: 64 > > > > > > > > > > BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY > > > > > Sandybridge) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > LAPACK: libopenblasp.so.0 > > > > > > > > > > LIBM: libopenlibm > > > > > > > > > > LLVM: libLLVM-3.7.1 (ORCJIT, ivybridge) > > > > > > > > > > > > > > > > > > > > and the other ones with fed22/23 are haswell, which > work > > > > > > > slow: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Julia Version 0.5.0-dev+3292 > > > > > > > > > > Commit 9d527c5* (2016-03-28 06:55 UTC) > > > > > > > > > > Platform Info: > > > > > > > > > > System: Linux (x86_64-redhat-linux) > > > > > > > > > > CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz > > > > > > > > > > WORD_SIZE: 64 > > > > > > > > > > BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY > Haswell) > > > > > > > > > > LAPACK: libopenblasp.so.0 > > > > > > > > > > LIBM: libopenlibm > > > > > > > > > > LLVM: libLLVM-3.7.1 (ORCJIT, haswell) > > > > > > > > > > > > > > > > > > > > I just booted an fedora 23 on the ivy bridge machine and > > > > > it's > > > > > > > > > > > > > > > > > > > > also fast. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Now if I use julia 0.45 on both architectures: > > > > > > > > > > > > > > > > > > > > Julia Version 0.4.5 > > > > > > > > > > Commit 2ac304d* (2016-03-18 00:58 UTC) > > > > > > > > > > Platform Info: > > > > > > > > > > System: Linux (x86_64-redhat-linux) > > > > > > > > > > CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz > > > > > > > > > > WORD_SIZE: 64 > > > > > > > > > > BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY > Haswell) > > > > > > > > > > LAPACK: libopenblasp.so.0 > > > > > > > > > > LIBM: libopenlibm > > > > > > > > > > LLVM: libLLVM-3.3 > > > > > > > > > > > > > > > > > > > > and: > > > > > > > > > > > > > > > > > > > > Julia Version 0.4.5 > > > > > > > > > > Commit 2ac304d* (2016-03-18 00:58 UTC) > > > > > > > > > > Platform Info: > > > > > > > > > > System: Linux (x86_64-redhat-linux) > > > > > > > > > > CPU: Intel(R) Core(TM) i5-3550 CPU @ 3.30GHz > > > > > > > > > > WORD_SIZE: 64 > > > > > > > > > > BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY > > > > > Sandybridge) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > LAPACK: libopenblasp.so.0 > > > > > > > > > > LIBM: libopenlibm > > > > > > > > > > LLVM: libLLVM-3.3 > > > > > > > > > > > > > > > > > > > > there is no speed difference apart from the ~10% or so > > > > > from > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > faster haswell machine. So could perhaps be haswell > > > > > hardware > > > > > > > > > > > > > > > > > > > > target > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > specific with the change from llvm 3.3 to 3.7.1? Is > > > > > there > > > > > > > > > > > > > > > > > > > > anything > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > else I could provide? > > > > > > > > > This is certainly an interesting finding. Could you paste > > > > > > > somewhere the > > > > > > > > > > > > > > > > > > > > > > > > > > output of @code_native for your function on Sandybridge > > > > > vs. > > > > > > > > > > > > > > > > > > > > Haswell, > > > > > > > > > > > > > > > > > > > > > > > > > > for both 0.4 and 0.5? > > > > > > > > > > > > > > > > > > It would also be useful to check whether the same > > > > > difference > > > > > > > > > > > > > > > > > > > > appears if > > > > > > > > > > > > > > > > > > > > > > > > > > you use the generic binary tarballs from > http://julialang.o > > > > > rg/d > > > > > > > > > > > > > > > > > > > > ownloads > > > > > > > > > > > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > Finally, do you get the same result if you remove the call > > > > > to > > > > > > > > > > > > > > > > > > > > exp() > > > > > > > > > > > > > > > > > > > > > > > > > > from the loop? (This is the only external function, so it > > > > > > > shouldn't be > > > > > > > > > > > > > > > > > > > > > > > > > > affected by changes in Julia.) > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best, Johannes > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Le mercredi 16 mars 2016 à 09:25 -0700, Johannes > > > > > Wagner a > > > > > > > > > > > > > > > > > > > > écrit : > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > just a little update. Tested some other fedoras: > > > > > Fedora > > > > > > > > > > > > > > > > > > > > 22 with llvm > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3.8 is also slow with julia 0.5, whereas a fedora > > > > > 24 > > > > > > > > > > > > > > > > > > > > branch with llvm > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3.7 is faster on julia 0.5 compared to julia 0.4, > > > > > as it > > > > > > > > > > > > > > > > > > > > should be > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (speedup from inner loop parts translated into > > > > > speedup > > > > > > > > > > > > > > > > > > > > to whole > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > function). > > > > > > > > > > > > > > > > > > > > > > > > > > don't know if anyone cares about that... At least > > > > > the > > > > > > > > > > > > > > > > > > > > latest version > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > seems to work fine, hope it stays like this into > > > > > the > > > > > > > > > > > > > > > > > > > > final fedora 24 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > What's the "latest version"? git built from source > or > > > > > RPM > > > > > > > > > > > > > > > > > > > > nightlies? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > With which LLVM version for each? > > > > > > > > > > > > > > > > > > > > > > > > If from the RPMs, I've switched them to LLVM 3.8 for > > > > > a > > > > > > > > > > > > > > > > > > > > few days, and > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > went back to 3.7 because of a build failure. So > that > > > > > > > might explain the > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > difference. You can install the last version which > > > > > built > > > > > > > > > > > > > > > > > > > > with LLVM 3.8 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > manually from here: > > > > > > > > > > > > > https://copr-be.cloud.fedoraproject.org/results/nalim > > > > > ilan > > > > > > > > > > > > > > > > > > > > /julia-nightlies/fedora-23-x86_64/00167549-julia/ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It would be interesting to compare it with the > > > > > latest > > > > > > > > > > > > > > > > > > > > nightly with 3.7. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > hey guys, > > > > > > > > > > > > > > I just experienced something weird. I have some > > > > > code > > > > > > > > > > > > > > > > > > > > that runs fine > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > on 0.43, then I updated to 0.5dev to test the > > > > > new > > > > > > > > > > > > > > > > > > > > Arrays, run same > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > code and noticed it got about ~50% slower. Then > > > > > I > > > > > > > > > > > > > > > > > > > > downgraded back > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > to 0.43, ran the old code, but speed remained > > > > > slow. I > > > > > > > > > > > > > > > > > > > > noticed while > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > reinstalling 0.43, openblas-threads didn't get > > > > > > > isntalled along with > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > it. So I manually installed it, but no > > > > > change. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Does anyone has an idea what could be going on? > > > > > LLVM > > > > > > > > > > > > > > > > > > > > on fedora23 is > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3.7 > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, Johannes > > > > > > > > > > > > > > >
