Thank you, guys. I couldn't have imagined how many things can go wrong in a
computing session under Windows. I rebooted my PC, and now the benchmarks
run 3 times faster (!), and I see no real differences between the cases,
except in the global context.

I agree that annotating "pure" functions could be very useful for
high-performance code.
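
In the meantime, the recomputation of 10^8 that Stefan mentions below can be
worked around by hoisting the limit into a local before the loop, so it is
computed only once. A minimal, untested sketch (hypothetical function name):

function f1_hoisted()
  limit = 10^8        # evaluated once, before the loop starts
  j = k = 1
  while k <= limit    # each iteration compares against the local
    j += k & 1
    k += 1
  end
  return j
end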

I miss STATIC variables even more, though. My functions use a bunch of
small constant arrays, which I'd like to declare as static, loaded together
with the function code. Is there a way to do this? (For now I put my
functions into modules, and outside the functions I write const array
definitions; inside the functions these arrays are declared global. It
works, but I ended up with many small modules and a lot of "using"
statements.)
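
This is roughly the pattern, with made-up names (a minimal sketch; as far as
I can tell the module-level const is readable inside the function even
without the global declaration):

module Kernels

# small constant arrays, built once when the module is loaded
const WEIGHTS = [0.25, 0.5, 0.25]

function smooth3(a, b, c)
  # the module-level const is visible here; nothing is rebuilt per call
  return WEIGHTS[1]*a + WEIGHTS[2]*b + WEIGHTS[3]*c
end

end # module

Every group of related functions ends up in its own little module like this,
and every caller needs a "using Kernels" (or a qualified Kernels.smooth3
call) - hence all the small modules and "using" statements.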


On Fri, Mar 28, 2014 at 12:09 PM, Stefan Karpinski <ste...@karpinski.org> wrote:

> Either way, one thing is quite unfortunate about this code. The
> compilation process isn't able to figure out that 10^8 is a constant, so it
> recomputes it on every loop iteration. We really need a way to annotate
> functions as being pure in the very specific sense that the compiler is
> free to evaluate them at compile time if all of their arguments are known at
> compile time (or partially evaluate them when some of the arguments are
> known).
>
>
> On Fri, Mar 28, 2014 at 11:24 AM, John Myles White <
> johnmyleswh...@gmail.com> wrote:
>
>> Yeah, that's true. I didn't read the IR carefully enough.
>>
>> Laszlo, are you on the latest Julia? I worry that it's hard to make
>> comparisons if you're running an older version of Julia.
>>
>>  -- John
>>
>> On Mar 28, 2014, at 8:18 AM, Stefan Karpinski <ste...@karpinski.org>
>> wrote:
>>
>> Perhaps I should have said "isomorphic" - the only differences there are
>> the names. It's more obvious that the native code is the same - only the
>> source line annotations differ.
>>
>>
>> On Fri, Mar 28, 2014 at 11:16 AM, John Myles White <
>> johnmyleswh...@gmail.com> wrote:
>>
>>> On my system, the two functions produce different LLVM IR:
>>>
>>> julia> code_llvm(f1, ())
>>>
>>> define i64 @julia_f115727() {
>>> top:
>>>   %0 = call i64 @julia_power_by_squaring1373(i64 10, i64 8), !dbg !726
>>>   %1 = icmp slt i64 %0, 1, !dbg !726
>>>   br i1 %1, label %L2, label %if, !dbg !726
>>>
>>> if:                                               ; preds = %top, %if
>>>   %j.04 = phi i64 [ %3, %if ], [ 1, %top ]
>>>   %k.03 = phi i64 [ %4, %if ], [ 1, %top ]
>>>   %2 = and i64 %k.03, 1, !dbg !727
>>>   %3 = add i64 %j.04, %2, !dbg !727
>>>   %4 = add i64 %k.03, 1, !dbg !728
>>>   %5 = call i64 @julia_power_by_squaring1373(i64 10, i64 8), !dbg !726
>>>   %6 = icmp sgt i64 %4, %5, !dbg !726
>>>   br i1 %6, label %L2, label %if, !dbg !726
>>>
>>> L2:                                               ; preds = %if, %top
>>>   %j.0.lcssa = phi i64 [ 1, %top ], [ %3, %if ]
>>>   ret i64 %j.0.lcssa, !dbg !729
>>> }
>>>
>>> julia> code_llvm(f2, ())
>>>
>>> define i64 @julia_f215728() {
>>> top:
>>>   %0 = call i64 @julia_power_by_squaring1373(i64 10, i64 8), !dbg !729
>>>   %1 = icmp slt i64 %0, 1, !dbg !729
>>>   br i1 %1, label %L6, label %L3, !dbg !729
>>>
>>> L3:                                               ; preds = %top, %L3
>>>   %j.08 = phi i64 [ %3, %L3 ], [ 1, %top ]
>>>   %k.07 = phi i64 [ %4, %L3 ], [ 1, %top ]
>>>   %2 = and i64 %k.07, 1, !dbg !730
>>>   %3 = add i64 %j.08, %2, !dbg !730
>>>   %4 = add i64 %k.07, 1, !dbg !731
>>>   %5 = call i64 @julia_power_by_squaring1373(i64 10, i64 8), !dbg !729
>>>   %6 = icmp slt i64 %5, %4, !dbg !729
>>>   br i1 %6, label %L6, label %L3, !dbg !729
>>>
>>> L6:                                               ; preds = %L3, %top
>>>   %j.0.lcssa = phi i64 [ 1, %top ], [ %3, %L3 ]
>>>   ret i64 %j.0.lcssa, !dbg !732
>>> }
>>>
>>> But the performance is identical or slightly in favor of f1.
>>>
>>>  -- John
>>>
>>> On Mar 28, 2014, at 8:02 AM, Stefan Karpinski <ste...@karpinski.org>
>>> wrote:
>>>
>>> > Both ways of writing a while loop should be the same. If you're seeing
>>> a difference, something else is going on. I'm not able to reproduce this:
>>> >
>>> > function f1()
>>> >   j = k = 1
>>> >   while k <= 10^8
>>> >     j += k & 1
>>> >     k += 1
>>> >   end
>>> >   return j
>>> > end
>>> >
>>> > function f2()
>>> >   j = k = 1
>>> >   while true
>>> >     k <= 10^8 || break
>>> >     j += k & 1
>>> >     k += 1
>>> >   end
>>> >   return j
>>> > end
>>> >
>>> > function f3()
>>> >   j = k = 1
>>> >   while true
>>> >     k > 10^8 && break
>>> >     j += k & 1
>>> >     k += 1
>>> >   end
>>> >   return j
>>> > end
>>> >
>>> > julia> @time f1()
>>> > elapsed time: 0.644661304 seconds (64 bytes allocated)
>>> > 50000001
>>> >
>>> > julia> @time f2()
>>> > elapsed time: 0.640951585 seconds (64 bytes allocated)
>>> > 50000001
>>> >
>>> > julia> @time f3()
>>> > elapsed time: 0.639177183 seconds (64 bytes allocated)
>>> > 50000001
>>> >
>>> > All three functions produce identical native code. Can you send
>>> exactly what your function definitions are, how you're timing them, and
>>> perhaps the output of code_native(f1,())?
>>> >
>>> >
>>> > On Fri, Mar 28, 2014 at 10:48 AM, Laszlo Hars <laszloh...@gmail.com>
>>> wrote:
>>> > Thanks, John, for your replies. On my system your code gives reliable
>>> results, too, if we increase the loop limits to 10^9:
>>> >
>>> > julia> mean(t1s ./ t2s)
>>> > 11.924373323658703
>>> >
>>> > This 12% makes a significant difference in my function with nested loops
>>> (it could add up to a factor-of-2 slowdown). So the question remains:
>>> >
>>> > - what is the fastest coding of a while loop?
>>> >
>>>
>>>
>>
>>
>
