On Sun, Mar 14, 2010 at 04:06:10PM +0000, Garth N. Wells wrote:
>
>
> Anders Logg wrote:
> > On Sun, Mar 14, 2010 at 03:42:29PM +0000, Garth N. Wells wrote:
> >>
> >> Anders Logg wrote:
> >>> On Sun, Mar 14, 2010 at 08:45:39AM +0000, Garth N. Wells wrote:
> >>>> Anders Logg wrote:
> >>>>> On Sun, Mar 14, 2010 at 08:35:32AM +0000, Garth N. Wells wrote:
> >>>>>> Anders Logg wrote:
> >>>>>>> On Sun, Mar 14, 2010 at 07:39:45AM +0000, Garth N. Wells wrote:
> >>>>>>>> Anders Logg wrote:
> >>>>>>>>> On Fri, Mar 12, 2010 at 06:58:22PM -0000, [email protected]
> >>>>>>>>> wrote:
> >>>>>>>>>> ------------------------------------------------------------
> >>>>>>>>>> revno: 4635
> >>>>>>>>>> committer: Garth N. Wells <[email protected]>
> >>>>>>>>>> branch nick: dolfin-all
> >>>>>>>>>> timestamp: Fri 2010-03-12 18:53:05 +0000
> >>>>>>>>>> message:
> >>>>>>>>>> Work on reading Vectors in parallel. Some issues to resolve
> >>>>>>>>>> still.
> >>>>>>>>>>
> >>>>>>>>>> Some issues:
> >>>>>>>>>> - How should files be named when in parallel?
> >>>>>>>>>> - Should we have a 'master' xml file which points to the files
> >>>>>>>>>>   from different processes?
> >>>>>>>>> I think this should be done in the same way as for Meshes. We
> >>>>>>>>> discussed the following design:
> >>>>>>>>>
> >>>>>>>>> 1. Reading a single file "foo.xml" results in each process
> >>>>>>>>> reading the entire file but skipping data located on another
> >>>>>>>>> process as determined by local_range. This is what is
> >>>>>>>>> implemented now for meshes (followed by communication and mesh
> >>>>>>>>> partitioning). The difference for vectors would be that no
> >>>>>>>>> extra communication is necessary.
> >>>>>>>>>
> >>>>>>>> OK.
> >>>>>>>>
> >>>>>>>>> 2. Reading a set of files "foo*.xml" results in each process
> >>>>>>>>> reading its portion stored in "foo%d.xml" % p. The File
> >>>>>>>>> interface then needs to check for the occurrence of '*' and
> >>>>>>>>> figure out the correct file name based on its process number.
> >>>>>>>>>
> >>>>>>>> I think there are a number of advantages to having a single .xml
> >>>>>>>> that points to the 'sub-files'. An obvious advantage is that we
> >>>>>>>> won't need to distinguish between cases 1 and 2 when reading in
> >>>>>>>> a vector.
> >>>>>>>>
> >>>>>>>> Garth
> >>>>>>> I don't feel strongly about either option, but if we go for the
> >>>>>>> master-file/sub-file design I think we should do the same for vectors
> >>>>>>> and meshes.
> >>>>>>>
> >>>>>>> The master file could look something like this for vectors:
> >>>>>>>
> >>>>>>> <distributed_vector size="1024" num_partitions="16">
> >>>>>>>   <sub_vector partition="0" file="foo_0.xml" offset="0"/>
> >>>>>>>   <sub_vector partition="1" file="foo_1.xml" offset="64"/>
> >>>>>>>   <sub_vector partition="2" file="foo_2.xml" offset="128"/>
> >>>>>>>   ...
> >>>>>>> </distributed_vector>
> >>>>>>>
> >>>>>> Looks good, except 'offset' should be 'size', or 'local_size'.
> >>>>> Yes, but then maybe it's not needed since the local size will be
> >>>>> available in the local files (which can be standard XML vector data).
> >>>>>
> >>>>> But then won't the master files always be trivial? The only extra
> >>>>> information that is contained in the master file is the total size,
> >>>>> and the number of partitions (which will only be used to check that it
> >>>>> matches the actual number of processes).
> >>>>>
> >>>> The master file is the definitive file. Say a program is run with 4
> >>>> processes, and then with 2. The files vector_0.xml, vector_1.xml,
> >>>> vector_2.xml and vector_3.xml will be floating around, but which files
> >>>> make up the vector? The master file will point to vector_0.xml and
> >>>> vector_1.xml.
> >>> I don't understand how that would work. Would it repartition the
> >>> entire vector or just use the first two?
> >>>
> >> It would read the first two. What the program does with them from that
> >> point onwards is a separate issue.
> >
> > That seems like a strange situation. Will that ever happen? (Storing
> > data from n processes and then reading back a subset on m < n
> > processes.)
> >
>
> It could very well happen, for example reading data in on one process to
> manipulate it, restart a computation with a different number of
> processes, etc.

It sounds strange. If one process should read in some specific data,
then it can just access foo_p.xml directly (without working through
some master file).

> >>>> Also, there should be no need to check that the number of 'partitions'
> >>>> matches the number of processes.
> >>> That seems to be the only real use of having a master file, at least
> >>> the only extra information contained in the master file and not
> >>> contained in the local files.
> >>>
> >> The master file *defines* which files are the sub files. For example, a
> >> collection of .xml files could be read by a single process program, just
> >> like ParaView does.
> >
> > Yes, but those files will most likely always have the same numbering
> > scheme (if stored from DOLFIN), something like foo_1.xml, foo_2.xml
> > etc. Then we might as well do "foo_*.xml".
> >
>
> That's not my point. If I have a directory full of foo_*.xml how can I
> know which ones make up the vector? It is precisely analogous to VTK. My
> directory can be full of .vtu files, but by opening .pvd I can always
> correctly visualise a result.

Yes, that's a good point.

--
Anders
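
To make the master-file idea from this thread concrete, here is a minimal
sketch of how such a file might be read. It is not DOLFIN code. The element
and attribute names follow the example quoted above, with the 'offset'
attribute left out since the thread suggests it may not be needed. The
function name read_master and the three-partition sample file are
assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical master file in the layout proposed in the thread
# (sizes and file names here are illustrative only).
MASTER_XML = """\
<distributed_vector size="192" num_partitions="3">
  <sub_vector partition="0" file="foo_0.xml"/>
  <sub_vector partition="1" file="foo_1.xml"/>
  <sub_vector partition="2" file="foo_2.xml"/>
</distributed_vector>
"""


def read_master(xml_text):
    """Return (global_size, sub-file names ordered by partition number)."""
    root = ET.fromstring(xml_text)
    if root.tag != "distributed_vector":
        raise ValueError("not a distributed_vector master file")
    size = int(root.get("size"))
    # Sort by partition number so the result does not depend on file order.
    subs = sorted(root.findall("sub_vector"),
                  key=lambda e: int(e.get("partition")))
    files = [e.get("file") for e in subs]
    return size, files


size, files = read_master(MASTER_XML)
print(size, files)
```

With this layout, a stale foo_3.xml left over from an earlier run is simply
never listed, so a reader never picks it up. That is the same role the .pvd
file plays for a directory of .vtu files.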

