I agree it works, but Julia crashes/can't compile with 25 coefficients:
@clenshaw(x,1.0,1/2,1/3,1/4,1/5,1/6,1/7,1/8,1/9,1/10,1/11,1/12,1/13,1/14,1/15,1/16,1/17,1/18,1/19,1/20,1/21,1/22,1/23,1/24,1/25)
The compilation time is also exorbitant for 24 coefficients:
julia> g(x) =
@clenshaw(x,1.0,1/2,1/3,1/4,1/5,1/6,1/7,1/8,1/9,1/10,1/11,1/12,1/13,1/14,1/15,1/16,1/17,1/18,1/19,1/20,1/21,1/22,1/23,1/24)
g (generic function with 1 method)
julia> @time g(1.0)
39.776082 seconds (131.13 M allocations: 3.984 GB, 7.63% gc time)
3.775958177753509
julia> @time g(1.0)
0.000001 seconds (5 allocations: 176 bytes)
3.775958177753509
On Thursday, January 14, 2016 at 3:06:15 PM UTC, Stefan Karpinski wrote:
>
> I get this LLVM code:
>
> ulia> @code_llvm g(1.0)
>
> define double @julia_g_23965(double) {
> top:
> %1 = fmul double %0, 2.000000e+00
> %2 = fmul double %1, 5.000000e-01
> %3 = fmul fast double %0, 1.000000e-01
> %4 = fadd fast double %3, 0x3FAAF286BCA1AF28
> %5 = fmul fast double %1, %4
> %6 = fadd fast double %5, 0x3F76C16C16C16C10
> %7 = fsub double 0x3FAE1E1E1E1E1E1E, %4
> %8 = fmul fast double %1, %6
> %9 = fadd fast double %7, %8
> %10 = fsub double 6.250000e-02, %6
> %11 = fmul fast double %1, %9
> %12 = fadd fast double %10, %11
> %13 = fsub double 0x3FB1111111111111, %9
> %14 = fmul fast double %1, %12
> %15 = fadd fast double %13, %14
> %16 = fsub double 0x3FB2492492492492, %12
> %17 = fmul fast double %1, %15
> %18 = fadd fast double %16, %17
> %19 = fsub double 0x3FB3B13B13B13B14, %15
> %20 = fmul fast double %1, %18
> %21 = fadd fast double %19, %20
> %22 = fsub double 0x3FB5555555555555, %18
> %23 = fmul fast double %1, %21
> %24 = fadd fast double %22, %23
> %25 = fsub double 0x3FB745D1745D1746, %21
> %26 = fmul fast double %1, %24
> %27 = fadd fast double %25, %26
> %28 = fsub double 1.000000e-01, %24
> %29 = fmul fast double %1, %27
> %30 = fadd fast double %28, %29
> %31 = fsub double 0x3FBC71C71C71C71C, %27
> %32 = fmul fast double %1, %30
> %33 = fadd fast double %31, %32
> %34 = fsub double 1.250000e-01, %30
> %35 = fmul fast double %1, %33
> %36 = fadd fast double %34, %35
> %37 = fsub double 0x3FC2492492492492, %33
> %38 = fmul fast double %1, %36
> %39 = fadd fast double %37, %38
> %40 = fsub double 0x3FC5555555555555, %36
> %41 = fmul fast double %1, %39
> %42 = fadd fast double %40, %41
> %43 = fsub double 2.000000e-01, %39
> %44 = fmul fast double %1, %42
> %45 = fadd fast double %43, %44
> %46 = fsub double 2.500000e-01, %42
> %47 = fmul fast double %1, %45
> %48 = fadd fast double %46, %47
> %49 = fsub double 0x3FD5555555555555, %45
> %50 = fmul fast double %1, %48
> %51 = fadd fast double %49, %50
> %52 = fsub double 5.000000e-01, %48
> %53 = fmul fast double %1, %51
> %54 = fadd fast double %52, %53
> %55 = fsub double 1.000000e+00, %51
> %56 = fmul fast double %2, %54
> %57 = fadd fast double %55, %56
> ret double %57
> }
>
>
> Is that not what you were hoping for?
>
> On Thu, Jan 14, 2016 at 8:09 AM, <[email protected] <javascript:>
> > wrote:
>
>> This macro:
>>
>> macro clenshaw(x, c...)
>> bk1,bk2 = :(zero(t)),:(zero(t))
>> N = length(c)
>> for k = N:-1:2
>> bk2, bk1 = bk1, :(muladd(t,$bk1,$(esc(c[k]))-$bk2))
>> end
>> ex = :(muladd(t/2,$bk1,$(esc(c[1]))-$bk2))
>> Expr(:block, :(t = $(esc(2))*$(esc(x))), ex)
>> end
>>
>> implements Clenshaw's algorithm to sum Chebyshev series. It successfully
>> "unrolls" the loop, but is impractical for more than 24 coefficients. The
>> resulting LLVM code is theoretically only 50% longer than unrolling
>> Horner's rule:
>>
>> f(x) =
>> @evalpoly(x,1.0,1/2,1/3,1/4,1/5,1/6,1/7,1/8,1/9,1/10,1/11,1/12,1/13,1/14,1/15,1/16,1/17,1/18,1/19,1/20)
>>
>> @code_llvm f(1.0)
>>
>> g(x) =
>> @clenshaw(x,1.0,1/2,1/3,1/4,1/5,1/6,1/7,1/8,1/9,1/10,1/11,1/12,1/13,1/14,1/15,1/16,1/17,1/18,1/19,1/20)
>>
>> @code_llvm g(1.0)
>>
>> How could I write the macro differently? How else could I end up with the
>> same efficient LLVM code? using a staged function?
>>
>
>