I've put my paper on parallelization on the wiki. In short: it works. Assembly time drops from 28 seconds on one CPU to 4 seconds on 16 CPUs. However, from what we learned, it seems we should look into the graph/mesh partitioning step a bit more.
For example, partitioning a UnitCube(100, 100, 100) mesh into 16 partitions uses about 8 GB of RAM and takes 7.5 minutes.

In the rush before the deadline I forgot the acknowledgements section, but Anders really deserves one. Thanks!

- Magnus
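
For a rough sense of scale, the figures quoted above can be turned into speedup, parallel efficiency, and memory per cell. This is only a back-of-the-envelope sketch, not something taken from the paper. It assumes that UnitCube(n, n, n) splits each cube into 6 tetrahedra and has (n+1)^3 vertices, and it reads "about 8 GB" as 8e9 bytes. All variable names are made up for illustration.

```python
# Timings quoted in the message (assembly only).
t_serial = 28.0     # seconds on 1 CPU
t_parallel = 4.0    # seconds on 16 CPUs
n_cpus = 16

speedup = t_serial / t_parallel     # how many times faster than serial
efficiency = speedup / n_cpus       # fraction of ideal linear speedup

# Mesh size for UnitCube(100, 100, 100).
# Assumption: 6 tetrahedra per cube and (n + 1)^3 vertices.
n = 100
num_cells = 6 * n**3
num_vertices = (n + 1) ** 3

# Memory used by the partitioner, spread over the cells.
# Assumption: "about 8 GB" is taken as 8e9 bytes.
mem_bytes = 8e9
bytes_per_cell = mem_bytes / num_cells

print(f"speedup:        {speedup:.1f}x")
print(f"efficiency:     {efficiency:.1%}")
print(f"cells:          {num_cells}")
print(f"vertices:       {num_vertices}")
print(f"bytes per cell: {bytes_per_cell:.0f}")
```

Under these assumptions, assembly runs 7 times faster on 16 CPUs (roughly 44% parallel efficiency), while the partitioner uses on the order of a kilobyte of memory per cell and takes far longer than the assembly itself.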
