Sorry, I don't think this has anything to do with vector ops. Concatenation always triggers a memory allocation for the new array, right? That could explain the slowdown. In fact, if I just time the foreach-initialized buffer over 50 runs, I get results like:
buffer done in usecs: 608.
buffer done in usecs: 604.
buffer done in usecs: 607.
buffer done in usecs: 604.
buffer done in usecs: 608.
buffer done in usecs: 614.
buffer done in usecs: 613.
buffer done in usecs: 601.
buffer done in usecs: 596.
buffer done in usecs: 606.
buffer done in usecs: 607.
buffer done in usecs: 24353. <- boom

So this probably has something to do with memory and paging. :)
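
For anyone who wants to look for the same kind of spike, here is a rough sketch of the measurement in Python, which is not the language of the original test. The names `time_runs`, `find_outliers` and `init_buffer`, the buffer size, and the 10x-median threshold are all my own assumptions for illustration. It times one buffer initialization per run and flags any run far above the median.

```python
import statistics
import time


def time_runs(fn, runs=50):
    """Return the wall-clock time of each call to fn, in microseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn()
        timings.append((time.perf_counter_ns() - start) / 1000)
    return timings


def find_outliers(timings, factor=10.0):
    """Return (run_index, usecs) pairs slower than factor * median."""
    median = statistics.median(timings)
    return [(i, t) for i, t in enumerate(timings) if t > factor * median]


def init_buffer(size=100_000):
    # Hypothetical stand-in for the foreach-initialized buffer;
    # the size is an arbitrary assumption.
    buf = [0.0] * size
    for i in range(size):
        buf[i] = i * 0.5
    return buf


if __name__ == "__main__":
    timings = time_runs(init_buffer)
    print(f"median usecs: {statistics.median(timings):.0f}")
    for i, t in find_outliers(timings):
        print(f"run {i}: {t:.0f} usecs  <- outlier")
```

Comparing against the median rather than the mean keeps a single huge run from hiding itself. This only shows that a spike happened, not whether allocation, paging, or something else caused it.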