Hello,

Actually, I also spent some time recently trying to understand what the gain from parallel execution is. To my regret, I also didn't find a dramatic decrease in execution time. So, here is some info on what I've noticed from Grid2D runs.
1) No gain, only loss, with a 2-core CPU in the 2D case.
1*) The loss grows as the cell count grows, meaning the messaging overhead is too large to get any gain on a 2-core machine (the efficiency of each core is always below 50%), so I started testing on an 8-core CPU.
2) The mesh is always split along one axis only, so if I freeze the number of cells along one axis (to avoid an increase in messaging) and start increasing the number of cells along the second axis, the efficiency of each core starts to rise.
3) The initial number of cells should also be high enough that the messaging burden is smaller than the actual gain from the additional cores.
4) I managed to get 65% efficiency per core, i.e. a 5.2x performance increase with 8 cores.

I think everything will depend strongly on one's problem, so the numbers might be different for other problems, but as hints for 8 cores:
a) I started to get a noticeable decrease in execution time (core efficiency ~50%) once the mesh was something like 120x280 (the split gives 120x35 cells per core here, so each inter-core boundary is only 120 cells wide).
b) If I had increased that to 240x280, the gain would go down a bit, so I stayed at 120 along the axis that sets the width of the split boundaries (i.e. keeping communication between cores low).
c) The maximum efficiency of 65% per core I got with a 120x1120 mesh; with subsequent mesh increases the efficiency didn't change.

NOW, the bad news. I found that parallel and serial runs give different results! Some numerical artifacts develop on the last core(s), where the cutoff of the solution (of a convection-diffusion problem with many terms) is situated. So be sure to check your results against a serial run with the same mesh size as the parallel one! Although the artifacts are at a rather low level compared to the solution, they are unpleasant and make me a bit unconfident about parallel runs, given that in serial mode there are no artifacts at all. I haven't yet tried swapping the axes so that the split is done along the other variable.
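To make the speedup and mesh-size numbers concrete, here is a small back-of-the-envelope sketch. It is plain Python, not FiPy, and the one-cell-wide halo and strip decomposition are my assumptions about how the partitioning roughly behaves, not something I verified in the FiPy source:

```python
# Back-of-the-envelope check of the numbers in this message: with 8
# cores at 65% per-core efficiency the expected speedup is 8 * 0.65.
# strip_comm_ratio() estimates communicated halo cells vs. computed
# cells for a 1-D strip decomposition of an nx-by-ny mesh (a rough
# model; the real partitioner and halo width may differ).

def speedup(n_cores, efficiency):
    """Ideal speedup scaled by per-core efficiency."""
    return n_cores * efficiency

def strip_comm_ratio(nx, ny, n_cores):
    """Halo cells exchanged per interior core divided by cells it
    computes, assuming the mesh is cut into n_cores strips along y."""
    cells_per_core = nx * (ny // n_cores)
    halo_cells = 2 * nx  # one row of nx cells exchanged per neighbour
    return halo_cells / cells_per_core

print(speedup(8, 0.65))               # -> 5.2
print(strip_comm_ratio(120, 280, 8))  # 240/4200, roughly 0.057
print(strip_comm_ratio(120, 1120, 8)) # 240/16800, roughly 0.014
```

The ratio dropping by 4x between the 120x280 and 120x1120 meshes is consistent with the per-core efficiency climbing from ~50% to its 65% plateau.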
If the split were done along the other axis, my guess is the artifacts would disappear, since each core would then hold the entire solution of the problem (in my case) at any given time, and the other cores would not have to *wait* and *develop* numerical artifacts until the real solution reaches them. But this is just my speculation and I haven't had time to try it yet. I also haven't yet tried playing with different solvers and <*>ConvectionTerm types; probably that would help as well. Regards, Igor.
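P.S. A minimal way to act on the "check against a serial run" advice above: dump the full solution field from each run to a text file and diff the files. In a parallel FiPy run, something like `var.globalValue` should gather the full field across processors (check the docs for your version); the file names and whitespace-separated dump format below are just illustrative assumptions:

```python
# Sketch of a serial-vs-parallel consistency check. Assumes each run
# dumped its solution as whitespace-separated floats; in the parallel
# run, write the gathered field (e.g. var.globalValue) from rank 0
# only, so the file is written once.

def max_abs_diff(file_a, file_b):
    """Largest pointwise difference between two dumped solutions."""
    with open(file_a) as fa, open(file_b) as fb:
        a = [float(x) for x in fa.read().split()]
        b = [float(x) for x in fb.read().split()]
    if len(a) != len(b):
        raise ValueError("runs must use the same mesh size")
    return max(abs(x - y) for x, y in zip(a, b))
```

If the result is well above the solver tolerance, the parallel run has likely developed artifacts like the ones described above.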
