This macro:

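macro clenshaw(x, c...)
    # bk1 and bk2 hold the expressions for b_{k+1} and b_{k+2}
    bk1,bk2 = :(zero(t)),:(zero(t))
    N = length(c)
    # backward recurrence b_k = c[k] + 2x*b_{k+1} - b_{k+2}, built as nested expressions
    for k = N:-1:2
        bk2, bk1 = bk1, :(muladd(t,$bk1,$(esc(c[k]))-$bk2))
    end
    # final step: c[1] + x*b_2 - b_3, with x = t/2
    ex = :(muladd(t/2,$bk1,$(esc(c[1]))-$bk2))
    Expr(:block, :(t = 2*$(esc(x))), ex)
end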

implements Clenshaw's algorithm for summing a Chebyshev series. It successfully
"unrolls" the loop, but becomes impractical for more than 24 coefficients. In
theory, the resulting LLVM code should be only 50% longer than that of an
unrolled Horner's rule:

f(x) = @evalpoly(x,1.0,1/2,1/3,1/4,1/5,1/6,1/7,1/8,1/9,1/10,1/11,1/12,1/13,1/14,1/15,1/16,1/17,1/18,1/19,1/20)

@code_llvm f(1.0)

g(x) = @clenshaw(x,1.0,1/2,1/3,1/4,1/5,1/6,1/7,1/8,1/9,1/10,1/11,1/12,1/13,1/14,1/15,1/16,1/17,1/18,1/19,1/20)

@code_llvm g(1.0)
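
For reference, the recurrence that the macro unrolls can also be written as an ordinary loop. This is only a sketch for checking results: the name clenshaw_loop is hypothetical and not part of the code above, and it assumes the coefficients are passed as a non-empty tuple or vector, with c[1] multiplying T_0(x).

```julia
# Hypothetical reference implementation (not from the macro above):
# evaluates c[1]*T_0(x) + c[2]*T_1(x) + ... + c[N]*T_{N-1}(x).
function clenshaw_loop(x, c)
    t = 2x
    bk1 = bk2 = zero(t)
    for k = length(c):-1:2
        # b_k = c[k] + 2x*b_{k+1} - b_{k+2}
        bk2, bk1 = bk1, muladd(t, bk1, c[k] - bk2)
    end
    # final step uses x rather than 2x
    return muladd(x, bk1, c[1] - bk2)
end
```

Called with the same coefficients as g, collected in a tuple, this should agree with g(x) up to rounding, which gives a simple check on any rewritten version of the macro.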

How could I write the macro differently? How else could I end up with the
same efficient LLVM code? Perhaps by using a staged function?
