On Mon, Aug 09, 2010 at 03:21:59PM +0100, Garth N. Wells wrote:
> On Mon, 2010-08-09 at 16:14 +0200, Anders Logg wrote:
> > On Mon, Aug 09, 2010 at 02:34:45PM +0100, Garth N. Wells wrote:
> > > On Mon, 2010-08-09 at 15:20 +0200, Anders Logg wrote:
> > > > On Mon, Aug 09, 2010 at 02:12:49PM +0100, Garth N. Wells wrote:
> > > > > On Mon, 2010-08-09 at 15:05 +0200, Anders Logg wrote:
> > > > > > On Mon, Aug 09, 2010 at 01:54:04PM +0100, Garth N. Wells wrote:
> > > > > > > On Mon, 2010-08-09 at 14:46 +0200, Anders Logg wrote:
> > > > > > > > On Mon, Aug 09, 2010 at 01:09:47PM +0100, Garth N. Wells wrote:
> > > > > > > > > On Mon, 2010-08-09 at 13:53 +0200, Anders Logg wrote:
> > > > > > > > > > On Mon, Aug 09, 2010 at 12:47:10PM +0100, Garth N. Wells wrote:
> > > > > > > > > > > On Mon, 2010-08-09 at 13:37 +0200, Anders Logg wrote:
> > > > > > > > > > > > On Sat, Aug 07, 2010 at 01:24:44PM +0100, Garth N. Wells wrote:
> > > > > > > > > > > > > On Fri, 2010-08-06 at 19:55 +0100, Garth N. Wells wrote:
> > > > > > > > > > > > > > On Fri, 2010-08-06 at 20:53 +0200, Anders Logg wrote:
> > > > > > > > > > > > > > > On Fri, Aug 06, 2010 at 07:51:18PM +0100, Garth N. Wells wrote:
> > > > > > > > > > > > > > > > On Fri, 2010-08-06 at 20:36 +0200, Anders Logg wrote:
> > > > > > > > > > > > > > > > > On Fri, Aug 06, 2010 at 04:55:44PM +0100, Garth N. Wells wrote:
> > > > > > > > > > > > > > > > > > On Fri, 2010-08-06 at 08:42 -0700, Johan Hake wrote:
> > > > > > > > > > > > > > > > > > > On Friday August 6 2010 08:16:26 you wrote:
> > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------
> > > > > > > > > > > > > > > > > > > > revno: 4896
> > > > > > > > > > > > > > > > > > > > committer: Garth N. Wells <gn...@cam.ac.uk>
> > > > > > > > > > > > > > > > > > > > branch nick: dolfin-all
> > > > > > > > > > > > > > > > > > > > timestamp: Fri 2010-08-06 16:13:29 +0100
> > > > > > > > > > > > > > > > > > > > message:
> > > > > > > > > > > > > > > > > > > > Add simple Stokes solver for parallel testing.
> > > > > > > > > > > > > > > > > > > > Other Stokes demos don't run in parallel because MeshFunction io is not supported in parallel.
> > > > > > > > > > > > > > > > > > > Does anyone have an overview of what is needed for this to be fixed? I couldn't find a blueprint on it.
> > > > > > > > > > > > > > > > > > Here it is:
> > > > > > > > > > > > > > > > > > https://blueprints.launchpad.net/dolfin/+spec/parallel-io
> > > > > > > > > > > > > > > > > I am interested in getting this fixed :)
> > > > > > > > > > > > > > > > Me too!
> > > > > > > > > > > > > > > > > > We need to look at all the io since much of it is broken in parallel.
> > > > > > > > > > > > > > > > > > We need to settle on how to handle XML data. I favour (and I know Niclas Jansson does too) the VTK approach in which we have a 'master file' that points to other XML files which contain portions of the vector/mesh, etc. Process zero can read the 'master file' and then instruct the other processes on which file(s) they should read in.
> > > > > > > > > > > > > > > > > This only works if the data is already partitioned. Most of our demos assume that we have the mesh in one single file which is then partitioned on the fly.
> > > > > > > > > > > > > > > > The approach does work for data which is not partitioned. Just like with VTK, one can read the 'master file' or the individual files.
> > > > > > > > > > > > > > > > > The initial plan was to support two different ways of reading data in parallel:
> > > > > > > > > > > > > > > > > 1. One file and automatic partitioning
> > > > > > > > > > > > > > > > > DOLFIN gets one file "mesh.xml", each process reads one part of it (just skipping other parts of the file), then the mesh is partitioned and redistributed.
> > > > > > > > > > > > > > > > > 2. Several files and no partitioning
> > > > > > > > > > > > > > > > > DOLFIN gets multiple files and each process reads one part. In this case, the mesh and all associated data is already partitioned. This should be very easy to fix since everything that is needed is already in place; we just need to fix the logic. In particular, the data section of each local mesh contains all auxiliary parallel data.
> > > > > > > > > > > > > > > > > This can be handled in two different ways. Either a user specifies the name of the file as "mesh*.xml", in which case DOLFIN appends say
> > > > > > > > > > > > > > > > > "_%d" % MPI::process_number()
> > > > > > > > > > > > > > > > > on each local process.
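A minimal sketch of the filename handling for the "mesh*.xml" variant quoted above (illustration only: the '*' convention and the helper name are not existing DOLFIN code, and the rank would come from MPI::process_number()):

  // Sketch: expand "mesh*.xml" to the per-process name "mesh_<p>.xml".
  #include <sstream>
  #include <string>

  std::string local_filename(const std::string& pattern, unsigned int process_number)
  {
    std::ostringstream name;
    const std::string::size_type pos = pattern.find('*');
    if (pos == std::string::npos)
    {
      // No wildcard: simply append "_<p>" to the given name
      name << pattern << "_" << process_number;
    }
    else
    {
      // Replace '*' by "_<p>", e.g. "mesh*.xml" -> "mesh_3.xml" on process 3
      name << pattern.substr(0, pos) << "_" << process_number
           << pattern.substr(pos + 1);
    }
    return name.str();
  }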
> > > > > > > > > > > > > > > > > The other way is to have a master file which lists all the other files. In this case, I don't see a need for process 0 to take any kind of responsibility for communicating file names. It would work fine for each process to read the master file and then check which file it should use. Each process could also check that the total number of processes matches the number of partitions in the file. We could let process 0 handle the parsing of the master file and then communicate the file names, but maybe that is an extra complication.
> > > > > > > > > > > > > > > > This fails when the number of files differs from the number of processes. It's very important to support m files on n processes. We've discussed this at length before.
> > > > > > > > > > > > > > > I don't remember. Can you remind me of what the reasons are?
> > > > > > > > > > > > > > I perform a simulation using m processes, and write the result to m files. Later I want to use the result in another computation using n processors.
> > > > > > > > > > > > > I've looked a little into parallel io, and looked at what Trilinos and PETSc do. Both support HDF5, and HDF5 has been developed to work in parallel. HDF5 does not advocate the one-file-per-process approach (too awkward and complicated, they say), but advocates a one-file approach. It has tools that allow different processes to write to different parts of the same file in parallel.
> > > > > > > > > > > > > From reading this, what I propose (for now) is:
> > > > > > > > > > > > > 1. We only ever write one XML file for a given object. This file can be read by different processes, with each reading in only a chunk.
> > > > > > > > > > > > > 2. We should add an XML format for partitioning data (Trilinos calls this a 'map'). If a map file is present, it is used to define the partitions. It may make sense to have a map file for each process (but no need for a 'master file').
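To make point 1 above concrete, the chunk each process reads from the single file could be a contiguous index range along these lines (a sketch assuming an even split by entity index; this is not how the current XML parser is organised):

  // Sketch: contiguous [begin, end) range of entities (vertices or cells)
  // that process p would read when the file is split evenly over processes.
  #include <algorithm>
  #include <cstddef>
  #include <utility>

  std::pair<std::size_t, std::size_t>
  local_chunk(std::size_t num_entities, unsigned int p, unsigned int num_processes)
  {
    const std::size_t n = num_entities / num_processes;
    const std::size_t r = num_entities % num_processes;
    // The first r processes take one extra entity each
    const std::size_t begin = p*n + std::min<std::size_t>(p, r);
    const std::size_t end = begin + n + (p < r ? 1 : 0);
    return std::make_pair(begin, end);
  }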
> > > > > > > > > > > > I suggest something slightly different. I'm ok with the one file approach, but it would be good to store that data in a partitioned way. Our current model for parallel computing is that each process has a Mesh and each process has the partitioning data it needs stored in the data section of the Mesh. So each process has just a regular mesh with some auxiliary data attached to it. That makes it easy to read and write using already existing code. (No need for a special parallel format.)
> > > > > > > > > > > > But we could easily throw all that data into one big file, something like this:
> > > > > > > > > > > > <distributed_mesh num_parts="16">
> > > > > > > > > > > > <mesh ...>
> > > > > > > > > > > > ...
> > > > > > > > > > > > </mesh>
> > > > > > > > > > > > <mesh ...>
> > > > > > > > > > > > ...
> > > > > > > > > > > > </mesh>
> > > > > > > > > > > > ...
> > > > > > > > > > > > </distributed_mesh>
> > > > > > > > > > > I would like to separate mesh and partitioning data. A partitioning of a given mesh is not unique to a mesh, so it should be separated. A partition could still go in the same XML file though.
> > > > > > > > > > It is separate from the mesh, but it is stored locally as part of the Mesh in MeshData. I think this has proved to be a very good way (efficient and simple) to store the data.
> > > > > > > > > This breaks the concept of using a given Mesh on a different number of processes. For a given mesh, I may want to store and use different partitions. The same applies to Vectors.
> > > > > > > > > > Or do you suggest that we store it differently on file than we do as part of the DOLFIN data structures?
> > > > > > > > > I suggest that we decouple the Mesh and the partition data for file output. It comes back to permitting a variable number of processes.
> > > > > > > > I thought that the storage output from m processes would be the mesh partitioned into m pieces with auxiliary partitioning information. This would then be read by n processes and repartitioned.
> > > > > > > > If you don't want the partitioning into a specific number of partitions (like m partitions above), then what extra partitioning data would be required? It would just be a regular mesh file.
> > > > > > > It would map entities to processes. It could also be used for Vectors.
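For what it's worth, the partitioning data ('map') being discussed can be viewed as nothing more than an entity-to-process array, and the same representation would cover the rows of a Vector. A sketch (illustrative only, not an existing DOLFIN class):

  // Sketch: a partition is just a map from global entity index (e.g. cell
  // index, or vector row) to the owning process.
  #include <algorithm>
  #include <vector>

  struct Partition
  {
    // owner[i] is the process that owns global entity i
    std::vector<unsigned int> owner;

    // The number of parts this partition was generated for; reading it on a
    // different number of processes would require repartitioning.
    unsigned int num_parts() const
    {
      if (owner.empty())
        return 0;
      return *std::max_element(owner.begin(), owner.end()) + 1;
    }
  };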
> > > > > > Then how is that not coupled to a specific number of processes?
> > > > > > And why would it be better than what we have today, which stores all the parallel data we need (not just that mapping but everything else that can and must be generated from it) as part of each mesh?
> > > > > > > > Partitioning data is always coupled to a specific number of processes as far as I can imagine.
> > > > > > > Naturally, and so it should be. But the mesh output doesn't have to be, which is why I suggest having a Mesh and partitioning data, e.g.
> > > > > > > // Let DOLFIN partition and distribute the mesh
> > > > > > > Mesh meshA("mesh.xml")
> > > > > > Is this just one regular mesh file?
> > > > > Yes, and ParMETIS would partition the mesh.
> > > > > > > // Partition the mesh according to my_partition.xml. Throw an
> > > > > > > // error if there is a mis-match between my_partition and the number
> > > > > > > // of processes
> > > > > > > Mesh meshB("mesh.xml", "my_partition.xml")
> > > > > > I don't understand this use-case. The first line above would already partition the mesh (using ParMETIS), and then you want to repartition it. Is it the case that my_partition.xml contains a better partitioning than what ParMETIS can compute?
> > > > > In the case "meshB("mesh.xml", "my_partition.xml")", ParMETIS would never be called. The mesh would be distributed according to "my_partition.xml".
> > > > I missed that you created two different meshes. ok, that sounds like a good thing to have.
> > > > But how would the mesh.xml be stored on file? Just one big file?
> > > Yes, just as we have now.
> > ok.
> > > > And how is the above example related to the storing of *partitioned* meshes?
> > > From reading around, I don't think that we should store partitioned meshes in the sense that each partition is a separate file. We should store an entire mesh in one file, and store the partitioning data if desired.
> > I don't understand why not. It's a good feature to have to be able to restart a simulation (on the same number of processors) and completely avoid the partitioning step. Moreover, everything we need is already implemented since we can read/write meshes including mesh data.
> It is a good feature. To do it, save the mesh and the partition, and then restart using
> Mesh mesh("mesh.xml", "partition.xml");
> This will not involve any partitioning.
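Put together, the two use cases quoted above would look as below. Only the one-argument constructor exists in DOLFIN today; the form taking a partition file is a proposal in this thread, so it is shown commented out:

  #include <dolfin.h>
  using namespace dolfin;

  int main()
  {
    // Read one mesh file; in parallel the mesh is partitioned (ParMETIS)
    // and redistributed automatically.
    Mesh meshA("mesh.xml");

    // Proposed: distribute according to a stored partition instead of
    // calling a partitioner; an error would be raised if the partition
    // does not match the number of processes.
    //Mesh meshB("mesh.xml", "my_partition.xml");

    return 0;
  }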
It's true that it wouldn't involve any partitioning (as in calling ParMETIS), but it would involve redistributing the mesh between all processes and computing all the auxiliary data that is needed, like the global vertex numbering and the mappings to neighboring processes. By storing one file per process (which is almost already supported), that would also be avoided. Each process would read a regular mesh file and be ready to go. No communication and no computation involved.

> > In summary, if I understand correctly that you want things stored in one file, we would have the following:
> > 1. Reading from one file and automatic partitioning/redistribution
> > This is already supported today, and it already supports the use case of first running on m processors and then n processors since in both cases the mesh will be read from one single file and partitioned.
> Yes.
> > 2. Reading from multiple files where the mesh has already been partitioned and m = n. This is almost supported today. We just need to decide on the logic for reading from multiple files.
> I advocate not supporting multiple files. Keeping to a one-file model makes things simple.

Multiple files would not be much of a complication since it is almost already in place. Each process just reads its corresponding file without any communication.

--
Anders
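For the record, the "logic for reading from multiple files" mentioned under point 2 could be as simple as the following sketch (names are illustrative; the number of parts on file would be read from a master file or inferred from the files present):

  // Sketch: pick the input for each process. If the stored number of parts
  // matches the number of processes, read the pre-partitioned piece directly
  // (no communication); otherwise fall back to the single file plus
  // partitioning/redistribution.
  #include <sstream>
  #include <string>

  std::string choose_input(const std::string& basename, unsigned int process_number,
                           unsigned int num_processes, unsigned int num_parts_on_file)
  {
    if (num_parts_on_file == num_processes)
    {
      std::ostringstream name;
      name << basename << "_" << process_number << ".xml";
      return name.str();
    }
    return basename + ".xml";
  }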