On Mon, Aug 09, 2010 at 02:12:49PM +0100, Garth N. Wells wrote:
> On Mon, 2010-08-09 at 15:05 +0200, Anders Logg wrote:
> > On Mon, Aug 09, 2010 at 01:54:04PM +0100, Garth N. Wells wrote:
> > > On Mon, 2010-08-09 at 14:46 +0200, Anders Logg wrote:
> > > > On Mon, Aug 09, 2010 at 01:09:47PM +0100, Garth N. Wells wrote:
> > > > > On Mon, 2010-08-09 at 13:53 +0200, Anders Logg wrote:
> > > > > > On Mon, Aug 09, 2010 at 12:47:10PM +0100, Garth N. Wells wrote:
> > > > > > > On Mon, 2010-08-09 at 13:37 +0200, Anders Logg wrote:
> > > > > > > > On Sat, Aug 07, 2010 at 01:24:44PM +0100, Garth N. Wells wrote:
> > > > > > > > > On Fri, 2010-08-06 at 19:55 +0100, Garth N. Wells wrote:
> > > > > > > > > > On Fri, 2010-08-06 at 20:53 +0200, Anders Logg wrote:
> > > > > > > > > > > On Fri, Aug 06, 2010 at 07:51:18PM +0100, Garth N. Wells 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > On Fri, 2010-08-06 at 20:36 +0200, Anders Logg wrote:
> > > > > > > > > > > > > On Fri, Aug 06, 2010 at 04:55:44PM +0100, Garth N. 
> > > > > > > > > > > > > Wells wrote:
> > > > > > > > > > > > > > On Fri, 2010-08-06 at 08:42 -0700, Johan Hake wrote:
> > > > > > > > > > > > > > > On Friday August 6 2010 08:16:26 you wrote:
> > > > > > > > > > > > > > > > ------------------------------------------------------------
> > > > > > > > > > > > > > > > revno: 4896
> > > > > > > > > > > > > > > > committer: Garth N. Wells <gn...@cam.ac.uk>
> > > > > > > > > > > > > > > > branch nick: dolfin-all
> > > > > > > > > > > > > > > > timestamp: Fri 2010-08-06 16:13:29 +0100
> > > > > > > > > > > > > > > > message:
> > > > > > > > > > > > > > > >   Add simple Stokes solver for parallel testing.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >   Other Stokes demos don't run in parallel 
> > > > > > > > > > > > > > > > because MeshFunction io is not
> > > > > > > > > > > > > > > >   supported in parallel.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Does anyone have an overview of what is needed 
> > > > > > > > > > > > > > > for this to be fixed? I
> > > > > > > > > > > > > > > couldn't find a blueprint on it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Here it is:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >     https://blueprints.launchpad.net/dolfin/+spec/parallel-io
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I am interested in getting this fixed :)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Me too! We need to look at all the io since much of 
> > > > > > > > > > > > > > it is broken in
> > > > > > > > > > > > > > parallel.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We need to settle on how to handle XML data. I 
> > > > > > > > > > > > > > favour (and I know Niclas
> > > > > > > > > > > > > > Jansson does too) the VTK approach in which we have 
> > > > > > > > > > > > > > a 'master file' that
> > > > > > > > > > > > > > points to other XML files which contain portions of 
> > > > > > > > > > > > > > the vector/mesh,
> > > > > > > > > > > > > > etc. Process zero can read the 'master file' and 
> > > > > > > > > > > > > > then instruct the other
> > > > > > > > > > > > > > processes on which file(s) they should read in.
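
For concreteness, a master file in this VTK-like style might look roughly
as follows (an illustrative sketch only; the element and attribute names
are not an agreed format):

  <distributed_data num_parts="4">
    <part process="0" file="mesh_0.xml"/>
    <part process="1" file="mesh_1.xml"/>
    <part process="2" file="mesh_2.xml"/>
    <part process="3" file="mesh_3.xml"/>
  </distributed_data>

The master file stays small, so every process (or process zero alone) can
parse it cheaply and map itself to one of the part files.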
> > > > > > > > > > > > >
> > > > > > > > > > > > > This only works if the data is already partitioned. 
> > > > > > > > > > > > > Most of our demos
> > > > > > > > > > > > > assume that we have the mesh in one single file which 
> > > > > > > > > > > > > is then
> > > > > > > > > > > > > partitioned on the fly.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The approach does work for data which is not 
> > > > > > > > > > > > partitioned. Just like with
> > > > > > > > > > > > VTK, one can read the 'master file' or the individual 
> > > > > > > > > > > > files.
> > > > > > > > > > > >
> > > > > > > > > > > > > The initial plan was to support two different ways of 
> > > > > > > > > > > > > reading data in parallel:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. One file and automatic partitioning
> > > > > > > > > > > > >
> > > > > > > > > > > > > DOLFIN gets one file "mesh.xml", each process reads 
> > > > > > > > > > > > > one part of it (just
> > > > > > > > > > > > > skipping other parts of the file), then the mesh is 
> > > > > > > > > > > > > partitioned and
> > > > > > > > > > > > > redistributed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. Several files and no partitioning
> > > > > > > > > > > > >
> > > > > > > > > > > > > DOLFIN gets multiple files and each process reads one 
> > > > > > > > > > > > > part. In this
> > > > > > > > > > > > > case, the mesh and all associated data is already 
> > > > > > > > > > > > > partitioned. This
> > > > > > > > > > > > > should be very easy to fix since everything that is 
> > > > > > > > > > > > > needed is already
> > > > > > > > > > > > > in place; we just need to fix the logic. In 
> > > > > > > > > > > > > particular, the data
> > > > > > > > > > > > > section of each local mesh contains all auxiliary 
> > > > > > > > > > > > > parallel data.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This can be handled in two different ways. Either a 
> > > > > > > > > > > > > user specifies the
> > > > > > > > > > > > > name of the file as "mesh*.xml", in which case DOLFIN 
> > > > > > > > > > > > > appends say
> > > > > > > > > > > > >
> > > > > > > > > > > > >   "_%d" % MPI::process_number()
> > > > > > > > > > > > >
> > > > > > > > > > > > > on each local process.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The other way is to have a master file which lists 
> > > > > > > > > > > > > all the other
> > > > > > > > > > > > > files. In this case, I don't see a need for process 0 
> > > > > > > > > > > > > to take any kind
> > > > > > > > > > > > > of responsibility for communicating file names. It 
> > > > > > > > > > > > > would work fine for
> > > > > > > > > > > > > each process to read the master file and then check 
> > > > > > > > > > > > > which file it
> > > > > > > > > > > > > should use. Each process could also check that the 
> > > > > > > > > > > > > total number of
> > > > > > > > > > > > > processes matches the number of partitions in the 
> > > > > > > > > > > > > file. We could let
> > > > > > > > > > > > > process 0 handle the parsing of the master file and 
> > > > > > > > > > > > > then communicate
> > > > > > > > > > > > > the file names but maybe that is an extra 
> > > > > > > > > > > > > complication.
> > > > > > > > > > > > >
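
As a rough sketch of the "mesh*.xml" convention, the name expansion could
be done as below (a hypothetical helper, not existing DOLFIN code; it
assumes a process number in the style of dolfin::MPI::process_number()):

  #include <sstream>
  #include <string>

  // Expand e.g. "mesh*.xml" to "mesh_3.xml" on process 3
  std::string local_filename(const std::string& pattern, unsigned int rank)
  {
    std::ostringstream suffix;
    suffix << "_" << rank;
    std::string name = pattern;
    const std::string::size_type pos = name.find('*');
    if (pos != std::string::npos)
      name.replace(pos, 1, suffix.str());
    return name;
  }

Each process would open local_filename("mesh*.xml", MPI::process_number())
and read only its own part.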
> > > > > > > > > > > >
> > > > > > > > > > > > This fails when the number of files differs from the 
> > > > > > > > > > > > number of
> > > > > > > > > > > > processes. It's very important to support m files on n 
> > > > > > > > > > > > processes. We've
> > > > > > > > > > > > discussed this at length before.
> > > > > > > > > > >
> > > > > > > > > > > I don't remember. Can you remind me of what the reasons 
> > > > > > > > > > > are?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I perform a simulation using m processes, and write the
> > > > > > > > > > result to m files. Later I want to use the result in
> > > > > > > > > > another computation using n processes.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I've looked a little into parallel io, including what
> > > > > > > > > Trilinos and PETSc do. Both support HDF5, and HDF5 has
> > > > > > > > > been developed to work in parallel. HDF5 does not
> > > > > > > > > advocate a one-file-per-process approach (too awkward
> > > > > > > > > and complicated, they say), but advocates a single-file
> > > > > > > > > approach. It has tools that allow different processes to
> > > > > > > > > write to different parts of the same file in parallel.
> > > > > > > > >
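
For reference, a minimal sketch of how HDF5's parallel mode opens one
shared file through MPI-IO (standard HDF5 C API, usable from C++; it
requires an MPI-enabled HDF5 build and is not DOLFIN code):

  #include <hdf5.h>
  #include <mpi.h>

  int main(int argc, char* argv[])
  {
    MPI_Init(&argc, &argv);

    // File access property list that routes I/O through MPI-IO
    hid_t plist = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(plist, MPI_COMM_WORLD, MPI_INFO_NULL);

    // All processes create/open the same file collectively
    hid_t file = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, plist);

    // ... each process writes its own hyperslab of a shared dataset ...

    H5Fclose(file);
    H5Pclose(plist);
    MPI_Finalize();
    return 0;
  }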
> > > > > > > > > From reading this, what I propose (for now) is:
> > > > > > > > >
> > > > > > > > > 1. We only ever write one XML file for a given object. This 
> > > > > > > > > file can be
> > > > > > > > > read by different processes, with each reading in only a 
> > > > > > > > > chunk.
> > > > > > > > >
> > > > > > > > > 2. We should add an XML format for partitioning data 
> > > > > > > > > (Trilinos calls
> > > > > > > > > this a 'map'). If a map file is present, it is used to define 
> > > > > > > > > the
> > > > > > > > > partitions. It may make sense to have a map file for each 
> > > > > > > > > process (but
> > > > > > > > > no need for a 'master file').
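
For point 1, each process presumably computes which chunk of the single
file it should read, e.g. along these lines (a sketch; the actual
chunking logic could differ):

  #include <utility>

  // Given N entities in the file, process p of P reads the half-open
  // range [first, last). The remainder N % P is spread over the first
  // few processes so chunk sizes differ by at most one.
  std::pair<unsigned int, unsigned int>
  local_range(unsigned int p, unsigned int N, unsigned int P)
  {
    const unsigned int n = N / P;  // base chunk size
    const unsigned int r = N % P;  // remainder
    if (p < r)
      return std::make_pair(p*(n + 1), (p + 1)*(n + 1));
    else
      return std::make_pair(r + p*n, r + (p + 1)*n);
  }

While parsing, a process simply skips every entity outside its range.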
> > > > > > > >
> > > > > > > > I suggest something slightly different. I'm ok with the one file
> > > > > > > > approach, but it would be good to store that data in a 
> > > > > > > > partitioned
> > > > > > > > way. Our current model for parallel computing is that each 
> > > > > > > > process has
> > > > > > > > a Mesh and each process has the partitioning data it needs 
> > > > > > > > stored in
> > > > > > > > the data section of the Mesh. So each process has just a 
> > > > > > > > regular mesh
> > > > > > > > with some auxiliary data attached to it. That makes it easy to 
> > > > > > > > read
> > > > > > > > and write using already existing code. (No need for a special 
> > > > > > > > parallel
> > > > > > > > format.)
> > > > > > > >
> > > > > > > > But we could easily throw all that data into one big file, 
> > > > > > > > something
> > > > > > > > like this:
> > > > > > > >
> > > > > > > > <distributed_mesh num_parts="16">
> > > > > > > >   <mesh ...>
> > > > > > > >     ...
> > > > > > > >   </mesh>
> > > > > > > >   <mesh ...>
> > > > > > > >     ...
> > > > > > > >   </mesh>
> > > > > > > >   ...
> > > > > > > > </distributed_mesh>
> > > > > > > >
> > > > > > >
> > > > > > > I would like to separate mesh and partitioning data. A given
> > > > > > > mesh does not have a unique partitioning, so the partition
> > > > > > > should be stored separately. A partition could still go in the
> > > > > > > same XML file though.
> > > > > >
> > > > > > It is separate from the mesh, but it is stored locally as part of 
> > > > > > the
> > > > > > Mesh in MeshData. I think this has proved to be a very good way
> > > > > > (efficient and simple) to store the data.
> > > > > >
> > > > >
> > > > > This breaks the concept of using a given Mesh on a different number of
> > > > > processes. For a given mesh, I may want to store and use different
> > > > > partitions.
> > > > >
> > > > > The same applies to Vectors.
> > > > >
> > > > > > Or do you suggest that we store it differently on file than we do as
> > > > > > part of the DOLFIN data structures?
> > > > > >
> > > > >
> > > > > I suggest that we decouple the Mesh and the partition data for file
> > > > > output. It comes back to permitting a variable number of processes.
> > > >
> > > > I thought that the storage output from m processes would be the mesh
> > > > partitioned into m pieces with auxiliary partitioning 
> > > > information. This would then be read by n processes and repartitioned.
> > > >
> > > > If you don't want the partitioning into a specific number of
> > > > partitions (like m partitions above), then what extra partitioning
> > > > data would be required? It would just be a regular mesh file.
> > > >
> > >
> > > It would map entities to processes. It could also be used for Vectors.
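
Something like this, perhaps (a purely illustrative sketch of such a
'map' file; no such format exists yet):

  <partition_map num_processes="4" size="1024">
    <!-- global entity index -> owning process -->
    <entity index="0" process="2"/>
    <entity index="1" process="0"/>
    ...
  </partition_map>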
> >
> > Then how is that not coupled to a specific number of processes?
> >
> > And why would it be better than what we have today, which stores all
> > the parallel data we need (not just that mapping but everything else
> > that can and must be generated from it) as part of each mesh?
> >
> > > > Partitioning data is always coupled to a specific number of processes
> > > > as far as I can imagine.
> > > >
> > >
> > > Naturally, and so it should be. But the mesh output doesn't have to be,
> > > which is why I suggest having a Mesh and partitioning data, e.g.
> > >
> > >   // Let DOLFIN partition and distribute the mesh
> > >   Mesh meshA("mesh.xml");
> >
> > Is this just one regular mesh file?
> >
>
> Yes, and ParMETIS would partition the mesh.
>
> > >   // Partition the mesh according to my_partition.xml. Throw an
> > >   // error if there is a mismatch between my_partition.xml and the
> > >   // number of processes
> > >   Mesh meshB("mesh.xml", "my_partition.xml");
> >
> > I don't understand this use-case. The first line above would already
> > partition the mesh (using ParMETIS), and then you want to repartition
> > it. Is it the case that my_partition.xml contains a better
> > partitioning than what ParMETIS can compute?
> >
>
> In the case "meshB("mesh.xml", "my_partition.xml")", ParMETIS would
> never be called. The mesh would be distributed according to
> "my_partition.xml".

I missed that you created two different meshes. OK, that sounds like a
good thing to have.

But how would mesh.xml be stored? Just one big file?

And how is the above example related to the storing of *partitioned*
meshes?

It looks to me like it's just an additional feature, which is to be
able to specify the partitioning and bypass ParMETIS.
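
If I understand the example, the constructor logic would be roughly as
follows (a sketch only; all the helper functions here are hypothetical
placeholders, not existing DOLFIN code):

  // Sketch of the two proposed construction paths
  Mesh::Mesh(std::string mesh_file, std::string partition_file)
  {
    // Read the full mesh description (same file on all processes)
    read_xml(mesh_file);

    if (partition_file.empty())
    {
      // No partition data given: partition on the fly with ParMETIS
      partition_with_parmetis();
    }
    else
    {
      // Use the stored partition; ParMETIS is never called
      read_partition(partition_file);
      if (num_partitions() != MPI::num_processes())
        error("Number of partitions does not match number of processes.");
    }

    // Redistribute local data according to the chosen partition
    distribute();
  }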

--
Anders
