Could you please explain why the iterator version is so much faster? Is it simply from avoiding temporary array allocation?
Thanks,
--Peter

On Friday, August 22, 2014 7:53:59 AM UTC-7, Rafael Fourquet wrote:
>
>> We'd like to eventually be able to do stream fusion to make the vectorized
>> version as efficient as the manually fused version, but for now there's a
>> performance gap.
>
> It is also not too difficult to implement a fused version via iterators,
> e.g.:
>
>     immutable iabs{X}
>         x::X
>     end
>
>     Base.start(i::iabs) = start(i.x)
>     Base.next(i::iabs, s) = ((v, s) = next(i.x, s); (abs(v), s))
>     Base.done(i::iabs, s) = done(i.x, s)
>
> Then sum(iabs(A)) is way faster than sum(abs(A)) (but still slightly
> slower than sumabs(A)).
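
For anyone reading this on Julia 1.x, where the `start`/`next`/`done` protocol quoted above was replaced by `Base.iterate`, the same idea might look roughly like the sketch below. The name `IAbs` and the sample array are made up for illustration and are not code from this thread. The sketch only shows the contrast the question asks about: `sum(abs.(A))` first builds a full temporary array, whereas the wrapper applies `abs` one element at a time as `sum` consumes it, which is presumably where much of the difference comes from.

```julia
# Hypothetical lazy wrapper (illustrative name): yields abs(v) for each v in
# the wrapped iterable without materializing an intermediate array.
struct IAbs{X}
    x::X
end

# Forward iteration to the wrapped object and apply abs to each element.
# `state...` covers both the initial call (no state) and later calls.
function Base.iterate(it::IAbs, state...)
    r = iterate(it.x, state...)
    r === nothing && return nothing
    v, s = r
    return abs(v), s
end

Base.length(it::IAbs) = length(it.x)

A = [-1.5, 2.0, -3.0]   # sample data, assumed for illustration

unfused  = sum(abs.(A))   # allocates a temporary array holding abs.(A)
lazy     = sum(IAbs(A))   # no temporary array; abs applied during iteration
built_in = sum(abs, A)    # fused form in Base, successor to sumabs(A)

@assert unfused == lazy == built_in
```

On a large array, something like `@time sum(abs.(A))` versus `@time sum(IAbs(A))` should make the allocation difference visible, though the exact timings will depend on the Julia version and the machine.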