Hi Leo,

just one quick, though rather uninformed, answer. The following points are things I have heard; I don't have much experience with xdmf myself.
From what I have heard, xdmf support in ParaView is not optimal and its maintenance is not the best. I was told that xdmf-v2 has some issues with memory leaks, while xdmf-v3 seems to have missing features and performance issues in ParaView, and apparently does not work well in parallel either. At least I saw that there are many open issues related to this, both in VTK and ParaView, and for several years already. For instance: https://gitlab.kitware.com/vtk/vtk/-/issues/17702 or https://gitlab.kitware.com/paraview/paraview/-/issues/17295

Also, I heard from several sides that people steer away from xdmf as it apparently is no longer actively maintained (see e.g. https://discourse.paraview.org/t/xdmf-hyperslab-with-timesteps/5174/2). The website (https://www.xdmf.org/index.php/Main_Page) seems not to have been touched since 2017, and the last commit in the repo was two years ago (https://gitlab.kitware.com/xdmf/xdmf).

That being said, it seems that you are successfully using the file format, so this may all not be so relevant. However, a benefit of the VTK-HDF format that I see is that you don't need a storage file plus a separate xml file which references into it; you have it all in one file. But I see the issue of not having newer ParaView versions on clusters… I just wanted to let you know about these latest developments at VTK that I heard of.

Cheers,
Dennis

> On 22. May 2023, at 13:26, [email protected] wrote:
>
> Hi Dennis and all,
>
> I don't know why the VTK people extended their format to VTK-HDF instead of using the existing XDMF/HDF5 format, which is already supported by ParaView. It looks like they reinvented the same things. However, it makes no difference to me, since converting between both formats should be easy. It is hard to have bleeding-edge software (latest ParaView, compilers) on our cluster, so switching to VTK-HDF5 will take a long time. As I said before, writing to one single file in parallel works great with HDF5 and the XDMF format; this should be the same for VTK-HDF5.
>
> Reading files and user data is always a pain. Before your post on this mailing list I had planned to develop a Python code for the parallel I/O tasks. My plan was a combination of embedded Python and a new module which uses h5py and mpi4py for different I/O patterns and file formats. I have already developed a Python script that calls the METIS library from Python, partitions a triangular mesh, writes parallel DGF files per rank and computes the overlap.
>
> The Dune philosophy of separating the grid and the user data is tricky. As the next step I plan to use the SciPy cKDTree to look up the position/globalId for each element/data item from the input file. This should also solve the overlap problem. In combination with a spatial interpolation it should also work for refined meshes. With this approach one can read the global XDMF (or VTK-HDF) input file with one rank per node and exchange the data with the other ranks of the node via mpi4py. Our newest cluster provides 128 ranks per node; depending on the model size we use 10-40 nodes. This pattern reduces the overall memory consumption during the init process.
>
> h5py in combination with mpi4py should allow writing the data of each rank in parallel to one single HDF5 file. Instead of sorting the elements in advance, I would resort the elements in the result file. This allows writing the data in blocks/hyperslabs from each rank. There should be a simple dictionary to convert between the original globalId and the globalId after the partitioning. As you already mentioned, HDF5 and especially XDMF allow reducing the stored data; this can be the grid or other spatial data which does not change over time (e.g. friction values). HDF5 also supports compression to reduce the data size.
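A minimal sketch of such a collective hyperslab write with h5py and mpi4py could look like the following (dataset and variable names are made up, and it assumes h5py was built against a parallel HDF5):

    # Each rank writes its own contiguous block of a shared dataset into one file.
    from mpi4py import MPI
    import h5py
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    n_local = 1000                                   # elements owned by this rank
    local_data = np.full(n_local, rank, dtype="f8")  # placeholder result data

    # Offsets follow from an exclusive scan of the local sizes.
    counts = comm.allgather(n_local)
    offset = sum(counts[:rank])
    n_global = sum(counts)

    with h5py.File("results.h5", "w", driver="mpio", comm=comm) as f:
        dset = f.create_dataset("waterDepth", shape=(n_global,), dtype="f8")
        dset[offset:offset + n_local] = local_data   # hyperslab of this rank

Compression could be enabled via the compression keyword of create_dataset, although, as far as I know, writing compressed datasets in parallel comes with some restrictions (collective mode, recent HDF5 versions).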
> It would be so much easier to simply use one file per rank. Then one just has to partition the data in advance and store the initial and result data in the local per-rank files. When the simulation has finished, one can collect the data and create a single result file. The main reason why I don't like this simple solution is the statement "Friends don't let friends use file-per-process" in some HDF5 slides (see https://www.hdfgroup.org/wp-content/uploads/2020/06/2020-06-26-Parallel-HDF5-Performance-Tuning.pdf). Nevertheless, I thought that this simple approach would be a starting point for the Python I/O library and an alternative when no mpi4py or no parallel h5py is available on a system.
>
> Cheers,
> Leo
>
>> Dennis Gläser <[email protected]> wrote on 22.05.2023 11:02 CEST:
>>
>> Hi Leo and all,
>>
>> maybe you find this interesting: VTK recently extended the VTK-HDF format to be able to describe transient simulations. So you can put the entire simulation (also parallel runs) with all time steps into one .hdf file which you can load in e.g. ParaView. Of course, depending on the size of your problem and the number of time steps, this file can get pretty large. But ParaView (as far as I can see) only ever loads the piece of the data needed to visualise the current time step. A benefit is that if the grid doesn't change, there is no need to add the grid information for each time step; you only need to save that data once in the file. So overall, you save some storage space...
>>
>> This is the related merge request:
>> https://gitlab.kitware.com/vtk/vtk/-/merge_requests/10094
>>
>> It's not yet part of a release of VTK and/or ParaView, and there may be some subtle changes to the format specification until the next VTK release. For now you can test the new format & readers with the latest ParaView master build available on their website.
>>
>> I already extended the implementation of the library used in the background of the newly proposed Dumux-VTKWriter to support this format; it didn't require many changes. It is still a draft implementation sitting on a merge request, but as soon as VTK releases the format & an associated reader (hopefully with the next release), I'll finish it up and make it available.
>>
>> Cheers,
>> Dennis
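To get an impression of what actually ends up in such a single file, one can simply walk it with h5py. This sketch does not assume any particular group or dataset names of the VTK-HDF layout, and the file name is made up:

    # Print every group/dataset in the file together with shapes and dtypes.
    import h5py

    with h5py.File("simulation.hdf", "r") as f:
        def describe(name, obj):
            if isinstance(obj, h5py.Dataset):
                print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
            else:
                print(f"{name}/ (group, attributes: {dict(obj.attrs)})")
        f.visititems(describe)

This also makes it easy to check that static information such as the grid is indeed stored only once.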
>>> On 17. May 2023, at 14:55, Dennis Gläser <[email protected]> wrote:
>>>
>>> Hi Leo,
>>>
>>> thanks for your reply!
>>>
>>>> The actual VTK solution with one file per rank and time step was no option for our workflow.
>>>
>>> Yes, this is a bit cumbersome. Although I was just thinking that it could be less painful if a folder was created for each time step with all the pieces in it. So you'd have "file_timestep_x.pvtu" and a folder "file_timestep_x" with all the pieces in it, referenced by the pvtu file. My main problem with this so far has been the flood of files you get in one folder... With subfolders it is a bit easier to navigate, I think (our writers currently don't create subfolders for the files).
>>>
>>>> I like to share some experiences and problems we had so far with HDF5 and parallel I/O. The standard DUNE workflow of reading a mesh and user data from a single rank (e.g. rank 0) in combination with a loadBalance does not work well for large meshes with lots of user data, due to the huge memory consumption of rank 0 and the high network traffic when distributing the user data.
>>>>
>>>> Is it possible to start a simulation by reading the grid and user data from a single VTK-HDF5 file? I know that this is not easy, since the whole data may not fit into the memory of a single rank. A "single shared file for N ranks" might be an option to overcome this problem, since a single file shared by all ranks produces unwanted load on a parallel file system.
>>>
>>> So far I haven't looked into reading the files, just the output part. But regarding your issue: the VTK-HDF file format (for unstructured grids) contains information on which part of the mesh originally lived on which process. Thus, if you restart the simulation with the same number of processes, you could read in only the slices for the process at hand. The tricky part will be, however, to then tell Dune which vertices overlap. Not sure if/how this plays out.
>>>
>>>> Note, the HighFive template library is not thread-safe. Maybe this can be a problem.
>>>
>>> I know, but the writer does not use multithreading. Each slice (per MPI rank) is written sequentially.
>>>
>>> Cheers,
>>> Dennis
>>>
>>>> On 17. May 2023, at 14:36, [email protected] wrote:
>>>>
>>>> Hi Dennis,
>>>>
>>>> this sounds really great. In dumux-shallowwater we use XDMF in combination with HDF5 to save our simulation data into a single HDF5 file and an XDMF file (see https://www.xdmf.org/index.php/Main_Page). This works really well and reduces the number of output files when doing simulations with 1000+ MPI ranks. The actual VTK solution with one file per rank and time step was no option for our workflow.
>>>>
>>>> I like to share some experiences and problems we had so far with HDF5 and parallel I/O. The standard DUNE workflow of reading a mesh and user data from a single rank (e.g. rank 0) in combination with a loadBalance does not work well for large meshes with lots of user data, due to the huge memory consumption of rank 0 and the high network traffic when distributing the user data.
>>>>
>>>> Is it possible to start a simulation by reading the grid and user data from a single VTK-HDF5 file? I know that this is not easy, since the whole data may not fit into the memory of a single rank. A "single shared file for N ranks" might be an option to overcome this problem, since a single file shared by all ranks produces unwanted load on a parallel file system.
>>>>
>>>> Do you plan an option to partition the mesh in advance and use the parallel DGF format or the methods supported by ALUGrid to read a distributed grid? As an alternative, one can also pass the partition to UGGrid (rank list). Maybe these methods are not really needed if no user data is read and distributed from one single rank.
>>>>
>>>> I actually plan to change the HDF5 read/write code in dumux-shallowwater. However, switching to your code, converting our initial HDF5 file into a VTK-HDF5 file before a simulation, and reconverting the VTK-HDF5 files afterwards to HDF5/XDMF sounds like a much better solution for dumux-shallowwater.
>>>>
>>>> Best regards,
>>>> Leo
>>>>
>>>> P.S.
>>>> Note, the HighFive template library is not thread-safe. Maybe this can be a problem.
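The cKDTree lookup Leo mentioned further up (matching the locally owned elements against the global input file by their positions) could, as a rough sketch, look like this with SciPy; all arrays here are placeholders:

    # Map data from a global input file onto the locally owned elements by
    # matching cell-center coordinates with a k-d tree.
    import numpy as np
    from scipy.spatial import cKDTree

    # Cell centers and data as read from the global file (e.g. via h5py),
    # and the cell centers of the elements owned by this rank.
    global_centers = np.random.rand(100000, 2)      # placeholder
    global_friction = np.random.rand(100000)        # placeholder user data
    local_centers = global_centers[:2500]           # placeholder

    tree = cKDTree(global_centers)
    distances, indices = tree.query(local_centers)  # nearest global cell per element
    local_friction = global_friction[indices]

    # Large distances would indicate cells without an exact match, e.g. after
    # refinement, where one would fall back to spatial interpolation
    # (e.g. from the k nearest neighbours via tree.query(local_centers, k=4)).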
>>>>> Dennis Gläser <[email protected]> wrote on 17.05.2023 10:16 CEST:
>>>>>
>>>>> Dear DuMuX community,
>>>>>
>>>>> I recently started developing a new mechanism for writing out VTK files, since for a project we needed more space-efficient VTK flavours (for structured grids) and data compression to save even more space. I have started adding a wrapper for this to DuMuX, and it would be great if some of you could give this a try and give me feedback on what things (can) go wrong, how the API could be made more intuitive, or where we need to add more explanatory error messages (e.g. static_asserts over cryptic compiler errors), etc…
>>>>>
>>>>> The new writer is on the feature/generic-time-series-writer branch:
>>>>> https://git.iws.uni-stuttgart.de/dumux-repositories/dumux/-/commits/feature/generic-time-series-writer
>>>>>
>>>>> It uses a library under the hood which I added as a git submodule to dumux (for now). To pull it in you have to type
>>>>>
>>>>> "git submodule update --init --recursive"
>>>>>
>>>>> The "--recursive" brings in an additional dependency in case you want to write into the VTK-HDF file format; you may omit it if you don't want this. Running dunecontrol afterwards should configure everything you need. However, you need a relatively new compiler, as this requires C++20. I tested g++-12 and g++-13; I think g++-11 is still missing some C++20 features we need here. The newest clang compiler doesn't work yet, as its ranges support is still experimental...
>>>>>
>>>>> I added a test that shows how the writer can be used (although one has to destructure this a bit because it tests multiple configurations via function calls etc.):
>>>>> https://git.iws.uni-stuttgart.de/dumux-repositories/dumux/-/blob/447ffa9d051a0bb322236bc8a4d198b139c043cd/test/io/test_timeseries_writer.cc
>>>>>
>>>>> Some benefits:
>>>>> - for YaspGrid, it uses .vti or .vts per default, which saves you from wasting space on points, connectivity, etc.
>>>>> - allows you to use data compression with zlib, lz4 or lzma (if found on the system)
>>>>> - if a compressor is found, compression is enabled per default (it can be disabled; compressed output is slower)
>>>>> - allows you to add fields via lambdas, without the need to create containers for all fields. That is, you can simply add an analytical solution to your vtk output.
>>>>> - if libhdf5 is found on your system (apt install libhdf5-mpi-dev) and you added "--recursive" to the submodule update, you can use the vtk-hdf file format, which allows you to write parallel simulations into a single file per time step.
>>>>> - VTK supports adding time as metadata in xml files, so you can actually write a time series without a .pvd file, putting the time value in each individual time step file. An example is in the test; a rough sketch of the idea is also shown below.
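Regarding that last point: to my knowledge the VTK XML readers pick the time value up from a field-data array called "TimeValue". A hypothetical post-processing sketch that attaches such an array to an existing .vtu file (with inline ascii/base64 data; the file name and time value are made up) could look like this:

    # Attach a "TimeValue" field-data array to an existing .vtu file so that
    # readers can associate the file with a point in time. Files with appended
    # raw-binary data would need different handling.
    import xml.etree.ElementTree as ET

    def add_time_value(vtu_file, time):
        tree = ET.parse(vtu_file)
        grid = tree.getroot().find("UnstructuredGrid")
        field_data = grid.find("FieldData")
        if field_data is None:
            field_data = ET.Element("FieldData")
            grid.insert(0, field_data)  # field data goes before the Piece elements
        array = ET.SubElement(field_data, "DataArray", {
            "type": "Float64", "Name": "TimeValue",
            "NumberOfTuples": "1", "format": "ascii",
        })
        array.text = str(time)
        tree.write(vtu_file)

    add_time_value("file_timestep_42.vtu", 0.5)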
>>>>>
>>>>> Two caveats (I opened issues for both of these in VTK & ParaView; I believe they should be easy to fix):
>>>>> - ParaView's .pvd reader does not support the vtk-hdf file format yet, so a .pvd with one .hdf file per time step does not open in ParaView.
>>>>> - Time metadata is read from xml files, but not yet from vtk-hdf files. You can still display the time metadata in ParaView, but it is not propagated to the control section on top (the play/forward/backward buttons).
>>>>>
>>>>> Drawbacks:
>>>>> - no support for velocity output yet.
>>>>>
>>>>> Cheers,
>>>>> Dennis
>
> On behalf of
>
> Dr.-Ing. Leopold Stadler
>
> --
> Referat Numerische Verfahren im Wasserbau
> Abteilung Wasserbau im Binnenbereich
>
> Bundesanstalt für Wasserbau
> Federal Waterways Engineering and Research Institute
> Kußmaulstraße 17 | 76187 Karlsruhe
> E-Mail: [email protected]
>
> Tel.: +49 721 9726-3525
> Fax: +49 721 9726-4540
> https://www.baw.de
_______________________________________________
DuMux mailing list
[email protected]
https://listserv.uni-stuttgart.de/mailman/listinfo/dumux
