I was surprised to still see a bit of this in practice on an 8x Core2 system with the example from our paper:
f := (1 + x + y + 2*z^2 + 3*t^3 + 5*u^5)^12:
g := (1 + u + t + 2*z^2 + 3*y^3 + 5*x^5)^12:

What happens here is that we construct the result one term at a time, and doing that requires a lot of bandwidth. We generally assume the cores share a cache, i.e. they can transfer data to each other at least as fast as all of them (in aggregate) can transfer data to memory. This is true of the Core i3/i5/i7 and AMD's CPUs. On the 8 x Core2 you have 2 cores sharing an L2 cache, two of those dies fused together in a multi-chip module, and two MCMs connected on the motherboard. So there are several different levels of interconnect, and the last one is especially slow. In this case an algorithm that works on a shared data structure in memory will scale better, and this is also true of large multi-CPU systems like the ones TRIP was tested on.

That issue still exists in Maple 15, but it is generally much better because memory is recycled. The larger example on the TRIP page (power = 16) does not slow down on our 8 x Core2 machine, and performance on Nehalem machines (2x4 cores) is very good. Performance on the 8-core Nehalem EX makes me want to buy one.

Anyways, you can see we have really bet on manycore versus SMP, mainly because I can't see how to do more complicated algorithms (like division or powering) in SMP without using an order of magnitude more memory. It's a one-size-fits-all-algorithms-on-most-CPUs approach.

As for Giac, thanks for the update. I look forward to timing it for our next paper :)
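For anyone curious what "constructing the result one term at a time" looks like, here is a minimal sketch of heap-based sparse polynomial multiplication. This is an illustration in Python of the general merge technique, not Maple's actual (parallel, packed-exponent) implementation; the representation as sorted (exponent, coefficient) lists is my own choice for the sketch.

```python
import heapq

def heap_mul(f, g):
    """Multiply sparse univariate polynomials f and g, each given as a
    list of (exponent, coefficient) pairs sorted by decreasing exponent.
    The product is emitted one term at a time by merging the len(f)
    partial-product streams f[i]*g with a heap, so every result term
    passes through the merging core -- this is where the bandwidth goes."""
    if not f or not g:
        return []
    # Heap entries are (-exponent, i, j), meaning the term f[i]*g[j].
    # Negating the exponent makes Python's min-heap pop largest first.
    heap = [(-(f[i][0] + g[0][0]), i, 0) for i in range(len(f))]
    heapq.heapify(heap)
    result = []
    while heap:
        neg_e, i, j = heapq.heappop(heap)
        e = -neg_e
        c = f[i][1] * g[j][1]
        if result and result[-1][0] == e:
            # Combine like terms as they come off the heap in order.
            result[-1] = (e, result[-1][1] + c)
        else:
            result.append((e, c))
        if j + 1 < len(g):
            # Advance stream i to its next term of g.
            heapq.heappush(heap, (-(f[i][0] + g[j + 1][0]), i, j + 1))
    return [(e, c) for (e, c) in result if c != 0]

# (x^2 + 2x + 1) * (x + 1) = x^3 + 3x^2 + 3x + 1
f = [(2, 1), (1, 2), (0, 1)]
g = [(1, 1), (0, 1)]
print(heap_mul(f, g))  # [(3, 1), (2, 3), (1, 3), (0, 1)]
```

Because every term of the product is funneled through one merge structure, parallelizing it means shipping partial results between cores, which is exactly why the interconnect topology above matters.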