On 25/08/11 03:33, Anders Logg wrote:
> On Wed, Aug 24, 2011 at 09:14:55AM -0400, Garth N. Wells wrote:
>>
>>
>> On 24/08/11 08:55, Anders Logg wrote:
>>> On Wed, Aug 24, 2011 at 08:43:40AM -0400, Garth N. Wells wrote:
>>>>
>>>>
>>>> On 24/08/11 08:07, Anders Logg wrote:
>>>>> On Wed, Aug 24, 2011 at 07:22:53AM -0400, Garth N. Wells wrote:
>>>>>>
>>>>>>
>>>>>> On 24/08/11 03:50, Anders Logg wrote:
>>>>>>> What is the plan for XMLLocalMeshData (using the DOM interface) vs
>>>>>>> XMLLocalMeshDataDistributed (using the SAX interface)?
>>>>>>>
>>>>>>
>>>>>> Both for the time being.
>>>>>>
>>>>>>> Reading boundary indicators is currently failing with
>>>>>>>
>>>>>>> RuntimeError: *** Error: Inconsistent state in XML reader: 6.
>>>>>>>
>>>>>>> Should this be fixed in XMLLocalMeshDataDistributed or is the plan to
>>>>>>> replace it with XMLLocalMeshData?
>>>>>>>
>>>>>>
>>>>>> In XMLLocalMeshDataDistributed.
>>>>>
>>>>> Could you elaborate? The functionality for reading and distributing
>>>>> boundary markers (in parallel) is currently broken and we want to fix
>>>>> it. But we need to know more about the design. I don't want to fix
>>>>> something if you decide to break it 5 min later.
>>>>>
>>>>
>>>> It never 'properly' worked in parallel. There were some messy ad hoc
>>>> changes made on top of functions that were planned for overhaul. I made
>>>> clear before this that parallel functionality was being sorted out (it's
>>>> not just in io, but also partitioning, etc), so the fact that it's not
>>>> working now should not be a surprise.
>>>
>>> It came as a surprise since there was a unit test for it and the unit
>>> test was removed. But nevermind, the important thing now is to get it
>>> working again.
>>>
>>>> There is a lot of missing functionality in parallel, so patience is
>>>> required to get things done properly.
>>>>
>>>>> Should we continue to use libxml2? Why not use the DOM parsing all the
>>>>> way?
>>>>>
>>>>
>>>> Because I'm inclined to keep SAX parsing for meshes since meshes are the
>>>> most likely to be created externally, and need to be scalable for
>>>> reading. Other objects (e.g., vectors) are likely to be created and
>>>> written by DOLFIN, so will eventually use parallel HDF5 for scalable
>>>> parallel io.
>>>
>>> The reason I once chose to implement the XML parsing using SAX, and
>>> why Ola decided to use SAX in his rewrite 2 years back, is exactly
>>> that: scalability and efficiency. I don't see why it should be
>>> different for meshes than any other objects. Other objects can also be
>>> large. It seems messy to use both.
>>>
>>
>> XML (be it with SAX or DOM parsers) is not a scalable or efficient
>> solution. The scalable and efficient solution is binary + MPI. This will
>> appear when time permits.
>
> Sure, but I would claim SAX scales better.

In terms of memory, yes. It is not sufficiently scalable to be a total
solution. Plus it's too slow.

> Wouldn't it be better to
> just use one of DOM or SAX?

Maybe. A SAX implementation is considerably more complex. The new
implementation reserves this complexity for a possibly critical case and
localises the complexity of the code. The old code was very complex and
less localised. The locality means that it's no big deal to have a simple
DOM implementation for the majority of cases next to a more complex SAX
implementation for special cases. There is no point in the size and
complexity of a SAX parser for simple cases, e.g. reading parameter files.

> Either we use SAX all the way if it gives
> better performance than DOM,

It doesn't give better performance. We discussed this before. Without
checking the archive, I recall that the DOM implementation was about 50
times faster for large data sets than the old SAX implementation.

> or we use DOM all the way as a solution
> for medium sized problems and complement with HDF5 for large scale
> problems. Having DOM + SAX + HDF5 seems messy.
>

This may happen, but the fact is that we don't have HDF5 in place yet.

>>>>> What is the difference between XMLLocalMeshData and
>>>>> XMLLocalMeshDataDistributed etc.
>>>>>
>>>>
>>>> Initially I planned to use DOM for all, but as outlined above decided
>>>> after some testing to retain SAX for meshes (but update to SAX2, since
>>>> the libxml2 SAX parser is deprecated and has memory leaks). Hence,
>>>> XMLLocalMeshData uses DOM and XMLLocalMeshDataDistributed uses SAX. So
>>>> far I've kept the DOM version since it's easy to code and could be
>>>> useful when reading non-distributed meshes on each process (which may
>>>> differ on different processes).
>>>
>>> I don't understand the difference between XMLLocalMeshData and
>>> XMLLocalMeshDataDistributed. Is XMLLocalMeshDataDistributed doing now
>>> what XMLLocalMeshData did before?
>>>
>>
>> Yes, but updated to SAX2 (which was very painful).
>>
>> The 'new' XMLLocalMeshData is a DOM version. It could be removed.
>
> Or kept if we will add HDF5 anyway as a more scalable solution.
>

Again, it may be desirable to keep a SAX parser for reading meshes in
parallel since a mesh is the most likely large data structure to be
created externally, and the most complex. HDF5 would require a user to
supply a binary mesh file rather than an XML file. Most other large data
sets are created internally, and then read and written. In this case, HDF5
will be fine.

Garth

> --
> Anders

_______________________________________________
Mailing list: https://launchpad.net/~dolfin
Post to     : dolfin@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dolfin
More help   : https://help.launchpad.net/ListHelp
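
For readers following the DOM versus SAX trade-off discussed in this thread, here is a minimal sketch using only the Python standard library (xml.dom.minidom and xml.sax). It is not DOLFIN code and does not use libxml2. The small mesh document, the function names, and the `keep` predicate are all assumptions made up for illustration. The DOM reader builds the whole tree in memory and is short to write. The SAX reader sees one element at a time, needs a handler class, and can discard vertices a process does not want, which is roughly the motivation given above for keeping SAX for distributed mesh reading.

```python
import xml.sax
from xml.dom import minidom

# Tiny illustrative mesh file (assumed layout, for demonstration only).
MESH_XML = """<dolfin>
  <mesh celltype="triangle" dim="2">
    <vertices size="3">
      <vertex index="0" x="0.0" y="0.0"/>
      <vertex index="1" x="1.0" y="0.0"/>
      <vertex index="2" x="0.0" y="1.0"/>
    </vertices>
    <cells size="1">
      <triangle index="0" v0="0" v1="1" v2="2"/>
    </cells>
  </mesh>
</dolfin>
"""


def read_vertices_dom(text):
    """DOM: parse the whole document into a tree, then query it."""
    doc = minidom.parseString(text)
    return [
        (float(v.getAttribute("x")), float(v.getAttribute("y")))
        for v in doc.getElementsByTagName("vertex")
    ]


class VertexHandler(xml.sax.ContentHandler):
    """SAX: receive one element at a time; no tree is ever built."""

    def __init__(self, keep):
        super().__init__()
        self.keep = keep      # hypothetical per-process filter on vertex index
        self.vertices = []

    def startElement(self, name, attrs):
        if name == "vertex" and self.keep(int(attrs["index"])):
            self.vertices.append((float(attrs["x"]), float(attrs["y"])))


def read_vertices_sax(text, keep=lambda index: True):
    """Stream the document and store only vertices accepted by `keep`."""
    handler = VertexHandler(keep)
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.vertices
```

Calling `read_vertices_sax(MESH_XML, keep=lambda i: i % 2 == 0)` stores only the even-indexed vertices, while `read_vertices_dom(MESH_XML)` always holds the full tree first. The sketch says nothing about relative speed, which depends on the parser and implementation.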