Many thanks, Igor -- that certainly gives us a lot to chew over, and you have saved us a lot of investigation time; I now have a much better sense of the parameter space. I hadn't appreciated that the "form factor" of the problem would have such a big effect.
You mention "8 cores": is that 4 dual-core CPUs? Do you use "mpirun -np 8 ..." when you run your code?

Again, thanks Igor (I owe you a beer).

+jtg+
