I got pretty far on that a few months ago, 
see https://github.com/JuliaLang/julia/pull/6230 
and https://github.com/JuliaLang/julia/issues/6349

A couple of tiny changes aren't in master at the moment, but I was able to 
get libjulia compiled and julia.exe starting the system image bootstrap. It 
hit a stack overflow at osutils.jl, which comes right after inference.jl, so 
the problem is likely in compiling type inference. Apparently I was missing 
some flags that the MinGW build uses to increase the default stack size. I 
haven't gotten back to giving it another try recently.


On Tuesday, June 17, 2014 12:25:50 PM UTC-7, David Anthoff wrote:
>
> Another interesting result from the paper is how much faster Visual C++ 
> 2010-generated code is than gcc's on Windows. For their example, the gcc 
> runtime is 2.29 times the runtime of the MS-compiled version. The difference 
> might be even larger with Visual C++ 2013, because that is when MS added an 
> auto-vectorizer that is on by default.
>
>  
>
> I vaguely remember a discussion about compiling Julia itself with the MS 
> compiler on Windows; is that working, and does it make a performance 
> difference?
>
>  
>
> *From:* [email protected] [mailto:[email protected]] *On 
> Behalf Of *Peter Simon
> *Sent:* Tuesday, June 17, 2014 12:08 PM
> *To:* [email protected]
> *Subject:* Re: [julia-users] Benchmarking study: C++ < Fortran < Numba < 
> Julia < Java < Matlab < the rest
>
>  
>
> Sorry, Florian and David, for not seeing that you were way ahead of me.
>
>  
>
> On the subject of the log function: I tried implementing mylog() as 
> defined by Andreas on Julia running on CentOS, and the result was a 
> significant slowdown! (Yes, I defined the mylog function outside of main, 
> at the module level.)  Not sure if this is due to variation in the quality 
> of the libm function on various systems or what.  If so, then it makes 
> sense that Julia wants a uniformly accurate and fast implementation via 
> openlibm.  But for the fastest transcendental function performance, I assume 
> that one must use the micro-coded versions built into the processor's 
> FPU. Is that what the fast libm implementations do?  In that case, how 
> could one hope to compete when using a C-coded version?
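>
> (Andreas's exact mylog definition isn't quoted in this thread; as a sketch, 
> assuming it simply forwarded to the system libm via ccall instead of Julia's 
> bundled openlibm, it would look something like this:)

```julia
# Sketch (assumption): call the system libm's log directly via ccall,
# bypassing the openlibm implementation that Base.log uses.
mylog(x::Float64) = ccall((:log, "libm"), Float64, (Float64,), x)

# Sanity check against Base.log:
println(abs(mylog(2.0) - log(2.0)) < 1e-12)
```

> Which of the two is faster then depends entirely on the quality of the 
> system libm, which would explain the slowdown seen on CentOS.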
>
>  
>
> --Peter
>
>
>
> On Tuesday, June 17, 2014 10:57:47 AM UTC-7, David Anthoff wrote:
>
> I submitted three pull requests to the original repo that get rid of three 
> different array allocations in loops and that make things a fair bit faster 
> altogether:
>
>  
>
> https://github.com/jesusfv/Comparison-Programming-Languages-Economics/pulls
>
>  
>
> I think it would also make sense to run these benchmarks on Julia 0.3.0 
> instead of 0.2.1, given that there have been a fair number of performance 
> improvements.
>
>  
>
> *From:* [email protected] [mailto:[email protected]] *On 
> Behalf Of *Florian Oswald
> *Sent:* Tuesday, June 17, 2014 10:50 AM
> *To:* [email protected]
> *Subject:* Re: [julia-users] Benchmarking study: C++ < Fortran < Numba < 
> Julia < Java < Matlab < the rest
>
>  
>
> Thanks Peter. I made that devectorizing change after Dahua suggested it. 
> It made a massive difference!
>
> On Tuesday, 17 June 2014, Peter Simon <[email protected]> wrote:
>
> You're right.  Replacing the NumericExtensions function calls with a small 
> loop
>
>  
>
>         maxDifference  = 0.0
>         for k = 1:length(mValueFunction)
>             maxDifference = max(maxDifference, abs(mValueFunction[k] - mValueFunctionNew[k]))
>         end
>
>
> makes no significant difference in execution time or memory allocation and 
> eliminates the dependency.
>
>  
>
> --Peter
>
>
>
> On Tuesday, June 17, 2014 10:05:03 AM UTC-7, Andreas Noack Jensen wrote:
>
> ...but the Numba version doesn't use tricks like that. 
>
>  
>
> The uniform metric can also be calculated with a small loop. I think that 
> requiring dependencies is against the purpose of the exercise.
>
>  
>
> 2014-06-17 18:56 GMT+02:00 Peter Simon <[email protected]>:
>
> As pointed out by Dahua, there is a lot of unnecessary memory allocation. 
>  This can be reduced significantly by replacing the lines
>
>  
>
>         maxDifference  = maximum(abs(mValueFunctionNew-mValueFunction))
>         mValueFunction    = mValueFunctionNew
>         mValueFunctionNew = zeros(nGridCapital,nGridProductivity)
>
>  
>
>  
>
> with
>
>  
>
>         maxDifference  = maximum(abs!(subtract!(mValueFunction, mValueFunctionNew)))
>         (mValueFunction, mValueFunctionNew) = (mValueFunctionNew, mValueFunction)
>         fill!(mValueFunctionNew, 0.0)
>
>  
>
> abs! and subtract! require adding the line
>
>  
>
> using NumericExtensions
>
>  
>
> prior to the function line.  I think the OP used Julia 0.2; I don't 
> believe that NumericExtensions will work with that old version.  When I 
> combine these changes with adding 
>
>  
>
> @inbounds begin
> ...
> end
>
>  
>
> block around the "while" loop, I get about a 25% reduction in execution 
> time, and a reduction in memory allocation from roughly 700 MByte to 180 
> MByte.
>
>  
>
> --Peter
>
>
>
> On Tuesday, June 17, 2014 9:32:34 AM UTC-7, John Myles White wrote:
>
> Sounds like we need to rerun these benchmarks after the new GC branch gets 
> updated.
>
>  
>
>  -- John
>
>  
>
> On Jun 17, 2014, at 9:31 AM, Stefan Karpinski <[email protected]> 
> wrote:
>
>  
>
> That definitely smells like a GC issue. Python doesn't have this 
> particular problem since it uses reference counting.
>
>  
>
> On Tue, Jun 17, 2014 at 12:21 PM, Cristóvão Duarte Sousa <[email protected]> 
> wrote:
>
> I've just done measurements of the algorithm's inner loop times on my 
> machine by changing the code as shown in this commit 
> <https://github.com/cdsousa/Comparison-Programming-Languages-Economics/commit/4f6198ad24adc146c268a1c2eeac14d5ae0f300c>
> .
>
>  
>
> I've found out something... see for yourself:
>
>  
>
> using Winston
> numba_times = readdlm("numba_times.dat")[10:end];
> plot(numba_times)
>
>
> <https://lh6.googleusercontent.com/-m1c6SAbijVM/U6BpmBmFbqI/AAAAAAAADdc/wtxnKuGFDy0/s1600/numba_times.png>
>
> julia_times = readdlm("julia_times.dat")[10:end];
> plot(julia_times)
>
>  
>
>
> <https://lh4.googleusercontent.com/-7iprMnjyZQY/U6Bp8gHVNJI/AAAAAAAADdk/yUgu8RyZ-Kw/s1600/julia_times.png>
>
> println((median(numba_times), mean(numba_times), var(numba_times)))
>
> (0.0028225183486938477,0.0028575707378805993,2.4830103817464292e-8)
>
>  
>
> println((median(julia_times), mean(julia_times), var(julia_times)))
>
> (0.0028240440000000004,0.0034863882123824454,1.7058255003790299e-6)
>
>  
>
> So, while inner loop times have more or less the same median on both Julia 
> and Numba tests, the mean and variance are higher in Julia.
>
>  
>
> Can that be due to the garbage collector kicking in?
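>
> (One way to test that hypothesis, as a sketch: in Julia 0.2/0.3 the 
> collector can be suspended with gc_disable()/gc_enable(), so re-timing the 
> loop with collection off should make the variance spikes disappear if GC is 
> the cause. The loop body below is a placeholder for one iteration of the 
> value-function update:)

```julia
# Sketch: measure per-iteration times with the garbage collector disabled
# (Julia 0.2/0.3-era API) to isolate GC pauses from the algorithm itself.
gc()           # collect once so we start from a clean heap
gc_disable()   # suspend collection during the measurement
times = Float64[]
for i = 1:100
    t0 = time()
    # ... one iteration of the value-function update goes here ...
    push!(times, time() - t0)
end
gc_enable()
println((median(times), mean(times), var(times)))
```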
>
>
>
> On Monday, June 16, 2014 4:52:07 PM UTC+1, Florian Oswald wrote:
>
> Dear all,
>
>  
>
> I thought you might find this paper interesting: 
> http://economics.sas.upenn.edu/~jesusfv/comparison_languages.pdf
>
>  
>
> It takes a standard model from macroeconomics and computes its solution 
> with an identical algorithm in several languages. Julia is roughly 2.6 
> times slower than the best C++ executable. I was a bit puzzled by the 
> result, since in the benchmarks on http://julialang.org/, the slowest test 
> is 1.66 times C. I realize that those benchmarks can't cover all possible 
> situations. That said, I couldn't really find anything unusual in the Julia 
> code; I did some profiling and removed type instabilities, but that's as 
> fast as I got it. That's not to say that I'm disappointed; I still think 
> this is great. Did I miss something obvious here, or is there something 
> specific to this algorithm? 
>
>  
>
> The codes are on github at 
>
>  
>
> https://github.com/jesusfv/Comparison-Programming-Languages-Economics
>
>  
>
>  
>
>  
>
>  
>
>
>
>  
>
> -- 
> Med venlig hilsen
>
> Andreas Noack Jensen
>
>
