On Friday August 6 2010 08:16:26, the commit notification read:

> ------------------------------------------------------------
> revno: 4896
> committer: Garth N. Wells <gn...@cam.ac.uk>
> branch nick: dolfin-all
> timestamp: Fri 2010-08-06 16:13:29 +0100
> message:
>   Add simple Stokes solver for parallel testing.
>
>   Other Stokes demos don't run in parallel because MeshFunction io is
>   not supported in parallel.

On Fri, 2010-08-06 at 08:42 -0700, Johan Hake wrote:
> Does anyone have an overview of what is needed for this to be fixed?
> I couldn't find a blueprint on it.

On Fri, Aug 06, 2010 at 04:55:44PM +0100, Garth N. Wells wrote:
> Here it is:
>
> https://blueprints.launchpad.net/dolfin/+spec/parallel-io

Johan Hake wrote:
> I am interested in getting this fixed :)

Garth N. Wells wrote:
> Me too! We need to look at all the io since much of it is broken in
> parallel.
>
> We need to settle on how to handle XML data. I favour (and I know
> Niclas Jansson does too) the VTK approach in which we have a 'master
> file' that points to other XML files which contain portions of the
> vector/mesh, etc. Process zero can read the 'master file' and then
> instruct the other processes on which file(s) they should read in.
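To make the 'master file' idea concrete, here is a rough sketch of how
each process could work out which piece it should read. The plain-text
master format and the helper are made up for illustration (and, unlike
the suggestion above, the sketch lets every process parse the small
master file itself rather than have process zero broadcast the
assignments); none of this is existing DOLFIN code:

  // Sketch: pick this process' piece from a 'master file' that simply
  // lists one piece file name per line (a stand-in for a proper XML
  // master format along the lines of VTK's .pvd/.pvtu files).
  #include <fstream>
  #include <string>
  #include <vector>
  #include <dolfin.h>

  std::string piece_for_this_process(const std::string& master_filename)
  {
    std::vector<std::string> pieces;
    std::ifstream master(master_filename.c_str());
    std::string line;
    while (std::getline(master, line))
      if (!line.empty())
        pieces.push_back(line);

    // Sanity check: one piece per process
    if (pieces.size() != dolfin::MPI::num_processes())
      dolfin::error("Found %d mesh pieces but running on %d processes.",
                    (int) pieces.size(), (int) dolfin::MPI::num_processes());

    // Each process reads the piece matching its rank
    return pieces[dolfin::MPI::process_number()];
  }

Each process would then hand the returned name to the usual mesh reader.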
Anders Logg wrote:
> This only works if the data is already partitioned. Most of our demos
> assume that we have the mesh in one single file which is then
> partitioned on the fly.

Garth N. Wells wrote:
> The approach does work for data which is not partitioned. Just like
> with VTK, one can read the 'master file' or the individual files.

Anders Logg wrote:
> The initial plan was to support two different ways of reading data in
> parallel:
>
> 1. One file and automatic partitioning
>
> DOLFIN gets one file "mesh.xml", each process reads one part of it
> (just skipping other parts of the file), then the mesh is partitioned
> and redistributed.
>
> 2. Several files and no partitioning
>
> DOLFIN gets multiple files and each process reads one part. In this
> case, the mesh and all associated data is already partitioned. This
> should be very easy to fix since everything that is needed is already
> in place; we just need to fix the logic. In particular, the data
> section of each local mesh contains all auxiliary parallel data.
>
> This can be handled in two different ways.
> Either a user specifies the name of the file as "mesh*.xml", in which
> case DOLFIN appends, say,
>
>   "_%d" % MPI::process_number()
>
> on each local process.
>
> The other way is to have a master file which lists all the other
> files. In this case, I don't see a need for process 0 to take any
> kind of responsibility for communicating file names. It would work
> fine for each process to read the master file and then check which
> file it should use. Each process could also check that the total
> number of processes matches the number of partitions in the file. We
> could let process 0 handle the parsing of the master file and then
> communicate the file names, but maybe that is an extra complication.

Garth N. Wells wrote:
> This fails when the number of files differs from the number of
> processes. It's very important to support m files on n processes.
> We've discussed this at length before.

Anders Logg wrote:
> I don't remember. Can you remind me of what the reasons are?

Garth N. Wells wrote:
> I perform a simulation using m processes, and write the result to m
> files. Later I want to use the result in another computation using n
> processors.
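For what it's worth, the "mesh*.xml" naming scheme described above
amounts to something like the following on each process (a sketch
only; the helper is hypothetical, not an existing DOLFIN function):

  // Sketch: expand "mesh*.xml" to a per-process name, i.e. the
  // '"_%d" % MPI::process_number()' idea, so process 3 opens "mesh_3.xml".
  #include <sstream>
  #include <string>
  #include <dolfin.h>

  std::string local_filename(std::string pattern)
  {
    const std::string::size_type star = pattern.find('*');
    if (star == std::string::npos)
      return pattern;  // no wildcard: all processes use the same file

    std::ostringstream suffix;
    suffix << "_" << dolfin::MPI::process_number();
    pattern.replace(star, 1, suffix.str());
    return pattern;
  }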
Garth N. Wells wrote:
> I've looked a little into parallel io, and looked at what Trilinos
> and PETSc do. Both support HDF5, and HDF5 has been developed to work
> in parallel. HDF5 does not advocate one file per process (too awkward
> and complicated, they say), but advocates a one-file approach. It has
> tools that allow different processes to write to different parts of
> the same file in parallel.
>
> From reading this, what I propose (for now) is:
>
> 1. We only ever write one XML file for a given object. This file can
> be read by different processes, with each reading in only a chunk.
>
> 2. We should add an XML format for partitioning data (Trilinos calls
> this a 'map'). If a map file is present, it is used to define the
> partitions. It may make sense to have a map file for each process
> (but no need for a 'master file').

Anders Logg wrote:
> I suggest something slightly different. I'm ok with the one-file
> approach, but it would be good to store that data in a partitioned
> way. Our current model for parallel computing is that each process
> has a Mesh, and each process has the partitioning data it needs
> stored in the data section of the Mesh. So each process has just a
> regular mesh with some auxiliary data attached to it. That makes it
> easy to read and write using already existing code. (No need for a
> special parallel format.)
>
> But we could easily throw all that data into one big file, something
> like this:
>
>   <distributed_mesh num_parts="16">
>     <mesh ...>
>     ...
>     </mesh>
>     <mesh ...>
>     ...
>     </mesh>
>     ...
>   </distributed_mesh>

Garth N. Wells wrote:
> I would like to separate mesh and partitioning data. A partitioning
> of a given mesh is not unique to a mesh, so it should be separated. A
> partition could still go in the same XML file though.

Anders Logg wrote:
> It is separate from the mesh, but it is stored locally as part of the
> Mesh in MeshData. I think this has proved to be a very good way
> (efficient and simple) to store the data.

Garth N. Wells wrote:
> This breaks the concept of using a given Mesh on a different number
> of processes. For a given mesh, I may want to store and use different
> partitions. The same applies to Vectors.

Anders Logg wrote:
> Or do you suggest that we store it differently on file than we do as
> part of the DOLFIN data structures?

Garth N. Wells wrote:
> I suggest that we decouple the Mesh and the partition data for file
> output. It comes back to permitting a variable number of processes.

Anders Logg wrote:
> I thought that the storage output from m processes would be the mesh
> partitioned into m pieces with auxiliary partitioning information.
> This would then be read by n processes and repartitioned.
>
> If you don't want the partitioning into a specific number of
> partitions (like m partitions above), then what extra partitioning
> data would be required? It would just be a regular mesh file.

Garth N. Wells wrote:
> It would map entities to processes. It could also be used for
> Vectors.
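For illustration, such a map could be nothing more than a cell-wise
MeshFunction<uint> holding the owning process of each cell, written to
its own XML file with the existing io. This is only a sketch; the
round-robin assignment is a dummy stand-in for whatever ParMETIS or
the user would actually decide:

  // Sketch: store a partition as a cell-wise MeshFunction<uint>,
  // decoupled from the mesh file itself.
  #include <string>
  #include <dolfin.h>

  using namespace dolfin;

  void write_partition(const Mesh& mesh, std::string filename)
  {
    // One value per cell: the rank of the process that should own it
    MeshFunction<uint> partition(mesh, mesh.topology().dim());

    // Dummy assignment (round-robin); a real partitioner would go here
    for (CellIterator cell(mesh); !cell.end(); ++cell)
      partition[*cell] = cell->index() % MPI::num_processes();

    File file(filename);
    file << partition;
  }

A partition stored in this form could then accompany a plain mesh.xml,
as in the meshB example that comes up below.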
Anders Logg wrote:
> Then how is that not coupled to a specific number of processes?
>
> And why would it be better than what we have today, which stores all
> the parallel data we need (not just that mapping but everything else
> that can and must be generated from it) as part of each mesh?
>
> Partitioning data is always coupled to a specific number of processes
> as far as I can imagine.

Garth N. Wells wrote:
> Naturally, and so it should be. But the mesh output doesn't have to
> be, which is why I suggest having a Mesh and partitioning data, e.g.
>
>   // Let DOLFIN partition and distribute the mesh
>   Mesh meshA("mesh.xml");

Anders Logg wrote:
> Is this just one regular mesh file?

Garth N. Wells wrote:
> Yes, and ParMETIS would partition the mesh.
>
>   // Partition the mesh according to my_partition.xml. Throw an
>   // error if there is a mis-match between my_partition and the
>   // number of processes
>   Mesh meshB("mesh.xml", "my_partition.xml");

Anders Logg wrote:
> I don't understand this use-case. The first line above would already
> partition the mesh (using ParMETIS), and then you want to repartition
> it. Is it the case that my_partition.xml contains a better
> partitioning than what ParMETIS can compute?

Garth N. Wells wrote:
> In the case meshB("mesh.xml", "my_partition.xml"), ParMETIS would
> never be called. The mesh would be distributed according to
> my_partition.xml.

Anders Logg wrote:
> I missed that you created two different meshes. ok, that sounds like
> a good thing to have.
>
> But how would the mesh.xml be stored on file? Just one big file?

Garth N. Wells wrote:
> Yes, just as we have now.

Anders Logg wrote:
> ok.
>
> And how is the above example related to the storing of *partitioned*
> meshes?

Garth N. Wells wrote:
> From reading around, I don't think that we should store partitioned
> meshes in the sense that each partition is a separate file. We should
> store an entire mesh in one file, and store the partitioning data if
> desired.

Anders Logg wrote:
> I don't understand why not.
> It's a good feature to have to be able to restart a simulation (on
> the same number of processors) and completely avoid the partitioning
> step. Moreover, everything we need is already implemented since we
> can read/write meshes including mesh data.

Garth N. Wells wrote:
> It is a good feature. To do it, save the mesh and the partition, and
> then restart using
>
>   Mesh mesh("mesh.xml", "partition.xml");
>
> This will not involve any partitioning.

Anders Logg wrote:
> It's true that it wouldn't involve any partitioning (as in calling
> ParMETIS), but it would involve redistributing the mesh between all
> processes and computing all the auxiliary data that is needed, like
> the global vertex numbering and mappings to neighboring processes.

Garth N. Wells wrote:
> No, because each process would only read the bits which it needs. The
> partition data would contain the other data.

Anders Logg wrote:
> So it would not just be a simple list of markers for which cells
> belong where? It would be all that other stuff that we store in
> MeshData today?
>
> By storing one file per process (which is almost already supported),
> that would also be avoided. Each process would read a regular mesh
> file and be ready to go. No communication and no computation
> involved.
>
> In summary, if I understand correctly that you want things stored in
> one file, we would have the following:
>
> 1. Reading from one file and automatic partitioning/redistribution
>
> This is already supported today, and it already supports the use case
> of first running on m processors and then n processors, since in both
> cases the mesh will be read from one single file and partitioned.

Garth N. Wells wrote:
> Yes.

Anders Logg wrote:
> 2. Reading from multiple files where the mesh has already been
> partitioned and m = n. This is almost supported today. We just need
> to decide on the logic for reading from multiple files.

Garth N. Wells wrote:
> I advocate not supporting multiple files. Keeping to a one-file model
> makes things simple.

Anders Logg wrote:
> Multiple files would not be much of a complication since it is almost
> already in place. Each process just reads its corresponding file
> without any communication.

Garth N. Wells wrote:
> Having dug around in the XML files over the past few days, I wouldn't
> agree with this. It may seem outwardly simple, but it means files
> having different states and it gets messy.
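Whichever scheme wins, the reader needs the m = n check that keeps
coming up in this thread. A minimal sketch, assuming the partition is
stored as a cell-wise MeshFunction<uint> of owning processes (an
illustrative helper, not an existing DOLFIN function):

  // Sketch: verify that a stored partition matches the current number
  // of processes before trying to reuse it.
  #include <algorithm>
  #include <dolfin.h>

  using namespace dolfin;

  void check_partition(const MeshFunction<uint>& partition)
  {
    // Number of parts = largest owner rank appearing in the map, plus one
    uint num_parts = 0;
    for (uint i = 0; i < partition.size(); ++i)
      num_parts = std::max(num_parts, partition[i] + 1);

    if (num_parts != MPI::num_processes())
      error("Partition defines %d parts but %d processes are in use.",
            (int) num_parts, (int) MPI::num_processes());
  }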
Anders Logg wrote:
> What do you mean by different states? Each process just reads a plain
> mesh XML file which is no different from a regular mesh file except
> that the mesh data would contain a few more things.

Garth N. Wells wrote:
> In one case. But we also support reading/writing from/to one file or
> multiple files. We need to decide who should open a file (is the
> object local or distributed?), whether DOLFIN or the user should add
> a process suffix, and how the object should be created (which depends
> on how it is read in). It's simpler (for the user and the developer)
> if there is just one way to do something.

Anders Logg wrote:
> How about the following:
>
> 1. Input/output is always just one file.
>
> 2. Always assume parallel when MPI::num_processes() > 1.
>
> 3. When reading a regular mesh file, it is always partitioned.

Garth N. Wells wrote:
> This is back-to-front. The majority of meshes read in won't already
> be partitioned.

Anders Logg wrote:
> 4. The partitioning is either handled by ParMETIS or, alternatively,
> according to a file or MeshFunction<uint> specified by the user. In
> particular, the file containing the partition can just be a regular
> MeshFunction<uint> stored in XML.
>
> 5. When specifying the partition as above, the only difference from
> now is that the call to ParMETIS is bypassed. The rest remains the
> same.
>
> 6. When writing a mesh in parallel, we write something like this:
>
>   <distributed_mesh>
>     <mesh>
>     ...
>     </mesh>
>     <mesh>
>     ...
>     </mesh>
>     ...
>   </distributed_mesh>
>
> 7. When reading in parallel, DOLFIN will check whether it gets a
> regular mesh file (in which case it partitions it as in 3) or a
> distributed mesh. In the latter case, it will check that the number
> of processors matches (m = n) and then just let each process read the
> data it needs by skipping to the appropriate location in the file and
> reading the portion it needs.
>
> That way, it would be very clear what happens. Clear to the user (by
> just inspecting the XML file: is it a mesh or a distributed mesh?)
> and to DOLFIN.

Garth N. Wells wrote:
> It doesn't seem that I'm making my point very clearly ;).

Anders Logg wrote:
> Perhaps not... ;-) Keep trying. At some point, I might see the light.

Garth N. Wells wrote:
> A mesh is a mesh. When I open the mesh file I want to be able to just
> read it, no matter how many processes I'm using or how it was
> previously partitioned. This is why I object to partitioning data
> being stored in the mesh. We shouldn't have <distributed_mesh>.

On Mon, 2010-08-09 at 17:39 +0200, Anders Logg wrote:
> So you advocate just the same old mesh format we have today? (And
> possibly some additional HDF5 format later if/when we need it.)
Yes.

Garth N. Wells wrote:
> If we have a MeshFunction which defines a partition, then with this
> the mesh XML reader knows which parts of the mesh file it is
> responsible for.

Anders Logg wrote:
> Yes, and I agree that's a good feature. My point is just this: if you
> want to rerun a simulation on m = n processors, you have three
> options:
>
> 1. Just read from mesh.xml and let ParMETIS recompute the
> partitioning and redistribute the mesh.

Yes.

> 2. Read from mesh.xml and supply partition.xml (storing a
> MeshFunction). DOLFIN can then use this to accomplish two things:
>
> (a) Each process reads exactly the data it needs (no redistribution
> necessary)
>
> (b) No need to call ParMETIS

Yes.

> We would still need to recompute all auxiliary parallel data (like
> the mapping to neighboring processes).

This shouldn't be hard since mesh.xml will contain the global numbers.

> 3. Read from either one file containing <distributed_mesh> or from
> multiple files with plain meshes. This would bypass all the above
> steps and be the most efficient option.
>
> The points I want to make are that case (2) does not handle anything
> that case (3) does not handle; case (3) is the most efficient; and
> case (3) is (almost) already implemented, so there are no
> complications involved in handling it.
>
> My suggestion for handling case (3) is (as before) the following:
>
>   Mesh mesh("mesh*.xml");
>
> The * would indicate that we should read multiple files.
>
> I'll be offline for a few hours, so we can continue this lengthy
> discussion later tonight... :-)

I come back to the point that a one-file approach is simpler. It's not
just me saying this; it's what I've gleaned both from reading around on
parallel io and from implementing things myself ;).

Garth

> --
> Anders

_______________________________________________
Mailing list: https://launchpad.net/~dolfin
Post to     : dolfin@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dolfin
More help   : https://help.launchpad.net/ListHelp