I agree it works, but Julia crashes/can't compile with 25 coefficients:

@clenshaw(x,1.0,1/2,1/3,1/4,1/5,1/6,1/7,1/8,1/9,1/10,1/11,1/12,1/13,1/14,1/15,1/16,1/17,1/18,1/19,1/20,1/21,1/22,1/23,1/24,1/25)

The compilation time is also exorbitant for 24 coefficients:

julia> g(x) = 
@clenshaw(x,1.0,1/2,1/3,1/4,1/5,1/6,1/7,1/8,1/9,1/10,1/11,1/12,1/13,1/14,1/15,1/16,1/17,1/18,1/19,1/20,1/21,1/22,1/23,1/24)

g (generic function with 1 method)


julia> @time g(1.0)

 39.776082 seconds (131.13 M allocations: 3.984 GB, 7.63% gc time)

3.775958177753509


julia> @time g(1.0)

  0.000001 seconds (5 allocations: 176 bytes)

3.775958177753509



On Thursday, January 14, 2016 at 3:06:15 PM UTC, Stefan Karpinski wrote:
>
> I get this LLVM code:
>
> ulia> @code_llvm g(1.0)
>
> define double @julia_g_23965(double) {
> top:
>   %1 = fmul double %0, 2.000000e+00
>   %2 = fmul double %1, 5.000000e-01
>   %3 = fmul fast double %0, 1.000000e-01
>   %4 = fadd fast double %3, 0x3FAAF286BCA1AF28
>   %5 = fmul fast double %1, %4
>   %6 = fadd fast double %5, 0x3F76C16C16C16C10
>   %7 = fsub double 0x3FAE1E1E1E1E1E1E, %4
>   %8 = fmul fast double %1, %6
>   %9 = fadd fast double %7, %8
>   %10 = fsub double 6.250000e-02, %6
>   %11 = fmul fast double %1, %9
>   %12 = fadd fast double %10, %11
>   %13 = fsub double 0x3FB1111111111111, %9
>   %14 = fmul fast double %1, %12
>   %15 = fadd fast double %13, %14
>   %16 = fsub double 0x3FB2492492492492, %12
>   %17 = fmul fast double %1, %15
>   %18 = fadd fast double %16, %17
>   %19 = fsub double 0x3FB3B13B13B13B14, %15
>   %20 = fmul fast double %1, %18
>   %21 = fadd fast double %19, %20
>   %22 = fsub double 0x3FB5555555555555, %18
>   %23 = fmul fast double %1, %21
>   %24 = fadd fast double %22, %23
>   %25 = fsub double 0x3FB745D1745D1746, %21
>   %26 = fmul fast double %1, %24
>   %27 = fadd fast double %25, %26
>   %28 = fsub double 1.000000e-01, %24
>   %29 = fmul fast double %1, %27
>   %30 = fadd fast double %28, %29
>   %31 = fsub double 0x3FBC71C71C71C71C, %27
>   %32 = fmul fast double %1, %30
>   %33 = fadd fast double %31, %32
>   %34 = fsub double 1.250000e-01, %30
>   %35 = fmul fast double %1, %33
>   %36 = fadd fast double %34, %35
>   %37 = fsub double 0x3FC2492492492492, %33
>   %38 = fmul fast double %1, %36
>   %39 = fadd fast double %37, %38
>   %40 = fsub double 0x3FC5555555555555, %36
>   %41 = fmul fast double %1, %39
>   %42 = fadd fast double %40, %41
>   %43 = fsub double 2.000000e-01, %39
>   %44 = fmul fast double %1, %42
>   %45 = fadd fast double %43, %44
>   %46 = fsub double 2.500000e-01, %42
>   %47 = fmul fast double %1, %45
>   %48 = fadd fast double %46, %47
>   %49 = fsub double 0x3FD5555555555555, %45
>   %50 = fmul fast double %1, %48
>   %51 = fadd fast double %49, %50
>   %52 = fsub double 5.000000e-01, %48
>   %53 = fmul fast double %1, %51
>   %54 = fadd fast double %52, %53
>   %55 = fsub double 1.000000e+00, %51
>   %56 = fmul fast double %2, %54
>   %57 = fadd fast double %55, %56
>   ret double %57
> }
>
>
> Is that not what you were hoping for?
>
> On Thu, Jan 14, 2016 at 8:09 AM, <[email protected] <javascript:>
> > wrote:
>
>> This macro:
>>
>> macro clenshaw(x, c...)
>>     bk1,bk2 = :(zero(t)),:(zero(t))
>>     N = length(c)
>>     for k = N:-1:2
>>         bk2, bk1 = bk1, :(muladd(t,$bk1,$(esc(c[k]))-$bk2))
>>     end
>>     ex = :(muladd(t/2,$bk1,$(esc(c[1]))-$bk2))
>>     Expr(:block, :(t = $(esc(2))*$(esc(x))), ex)
>> end
>>
>> implements Clenshaw's algorithm to sum Chebyshev series. It successfully 
>> "unrolls" the loop, but is impractical for more than 24 coefficients. The 
>> resulting LLVM code is theoretically only 50% longer than unrolling 
>> Horner's rule:
>>
>> f(x) = 
>> @evalpoly(x,1.0,1/2,1/3,1/4,1/5,1/6,1/7,1/8,1/9,1/10,1/11,1/12,1/13,1/14,1/15,1/16,1/17,1/18,1/19,1/20)
>>
>> @code_llvm f(1.0)
>>
>> g(x) = 
>> @clenshaw(x,1.0,1/2,1/3,1/4,1/5,1/6,1/7,1/8,1/9,1/10,1/11,1/12,1/13,1/14,1/15,1/16,1/17,1/18,1/19,1/20)
>>
>> @code_llvm g(1.0)
>>
>> How could I write the macro differently? How else could I end up with the 
>> same efficient LLVM code? using a staged function?
>>
>
>

Reply via email to