This macro:
macro clenshaw(x, c...)
    bk1,bk2 = :(zero(t)),:(zero(t))
    N = length(c)
    for k = N:-1:2
        bk2, bk1 = bk1, :(muladd(t,$bk1,$(esc(c[k]))-$bk2))
    end
    ex = :(muladd(t/2,$bk1,$(esc(c[1]))-$bk2))
    Expr(:block, :(t = $(esc(2))*$(esc(x))), ex)
end
implements Clenshaw's algorithm for summing a Chebyshev series. It
successfully "unrolls" the loop, but is impractical for more than 24
coefficients. In theory, the resulting LLVM code should be only about 50%
longer than that of the unrolled Horner's rule:
f(x) = @evalpoly(x,1.0,1/2,1/3,1/4,1/5,1/6,1/7,1/8,1/9,1/10,1/11,1/12,1/13,1/14,1/15,1/16,1/17,1/18,1/19,1/20)
@code_llvm f(1.0)
g(x) = @clenshaw(x,1.0,1/2,1/3,1/4,1/5,1/6,1/7,1/8,1/9,1/10,1/11,1/12,1/13,1/14,1/15,1/16,1/17,1/18,1/19,1/20)
@code_llvm g(1.0)
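For reference, the recurrence being unrolled can be sketched as an ordinary
loop. This is only a minimal sketch for checking results: `clenshaw_loop` is
a hypothetical name introduced here, and it assumes the coefficients are
passed as a vector with c[1] multiplying T_0.

```julia
# Reference (non-unrolled) Clenshaw summation of
#   c[1]*T_0(x) + c[2]*T_1(x) + ... + c[N]*T_{N-1}(x).
# `clenshaw_loop` is a hypothetical helper name, not part of the macro above.
function clenshaw_loop(x, c::AbstractVector)
    t = 2x
    bk1 = zero(t)   # b_{k+1} in the recurrence
    bk2 = zero(t)   # b_{k+2} in the recurrence
    for k in length(c):-1:2
        # b_k = c_k + 2x*b_{k+1} - b_{k+2}; the right-hand side is
        # evaluated with the old values before either name is rebound.
        bk2, bk1 = bk1, muladd(t, bk1, c[k] - bk2)
    end
    # Final step: c_0 + x*b_1 - b_2
    return muladd(t / 2, bk1, c[1] - bk2)
end
```

It should agree with g(x) above when called with the same coefficients, e.g.
clenshaw_loop(x, [1/k for k = 1:20]).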
How could I write the macro differently? How else could I end up with the
same efficient LLVM code? Perhaps with a staged function?
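One direction that might help, offered as a sketch rather than a definitive
fix: in the macro above, each intermediate expression is spliced in twice
(once as $bk1 and again, one step later, as $bk2), so the expression tree
grows roughly like the Fibonacci numbers. That may be why it becomes
impractical for many coefficients. Binding each intermediate value to a
fresh variable keeps the generated code linear in the number of
coefficients. The name `clenshaw_tmp` is hypothetical and introduced here
only for illustration.

```julia
# Sketch: same recurrence, but every b_k is assigned to its own
# gensym'd variable, so no sub-expression is duplicated.
# `clenshaw_tmp` is a hypothetical name, not an existing macro.
macro clenshaw_tmp(x, c...)
    t = gensym("t")
    body = Expr(:block, :($t = 2 * $(esc(x))))
    bk1, bk2 = :(zero($t)), :(zero($t))
    for k in length(c):-1:2
        b = gensym("b")
        # b_k = c_k + 2x*b_{k+1} - b_{k+2}, stored in a fresh variable
        push!(body.args, :($b = muladd($t, $bk1, $(esc(c[k])) - $bk2)))
        bk2, bk1 = bk1, b
    end
    # Final step: c_0 + x*b_1 - b_2 is the value of the block
    push!(body.args, :(muladd($t / 2, $bk1, $(esc(c[1])) - $bk2)))
    return body
end

h(x) = @clenshaw_tmp(x, 1.0, 2.0, 3.0)
```

Here h(x) evaluates 1.0*T_0(x) + 2.0*T_1(x) + 3.0*T_2(x); comparing
@code_llvm h(1.0) against the original macro would show whether the
generated code is actually shorter.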