Hi Leo and all,

maybe you find this interesting: VTK recently extended the VTK-HDF format to
describe transient simulations. So you can put an entire simulation (also
parallel runs) with all time steps into a single .hdf file, which you can
then load in e.g. ParaView. Of course, depending on the size of your problem
and the number of time steps, this file can get pretty large. But ParaView
(as far as I can see) only ever loads the piece of the data needed to
visualise the current time step. Another benefit: if the grid doesn't change,
the grid information does not have to be repeated for each time step, but is
stored only once in the file. So overall, you save some storage space...
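As a rough illustration, such a file could be inspected with HighFive along
the following lines; note that the "VTKHDF/Steps" layout is taken from the
proposal in the merge request linked below and may still change before the
next VTK release:

    // Print how many time steps a transient VTK-HDF file contains.
    #include <iostream>
    #include <vector>
    #include <highfive/H5File.hpp>

    int main()
    {
        HighFive::File file("simulation.hdf", HighFive::File::ReadOnly);
        const auto root = file.getGroup("VTKHDF");
        if (root.exist("Steps")) // the "Steps" group is the transient extension
        {
            std::vector<double> times;
            root.getGroup("Steps").getDataSet("Values").read(times);
            std::cout << "file contains " << times.size() << " time steps\n";
        }
        return 0;
    }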
This is the related merge request:
https://gitlab.kitware.com/vtk/vtk/-/merge_requests/10094

It's not yet part of a release of VTK and/or ParaView, and there may still be
some subtle changes to the format specification before the next VTK release.
For now you can test the new format & readers with the latest ParaView master
build available on their website. I already extended the implementation of
the library used in the background of the newly proposed DuMux VTK writer to
support this format; it didn't require many changes. It's a draft
implementation that still lives in a merge request, but as soon as VTK
releases the format & an associated reader (hopefully with the next release),
I'll finish it up and make it available.

Cheers,
Dennis

> On 17. May 2023, at 14:55, Dennis Gläser <[email protected]> wrote:
>
> Hi Leo,
>
> thanks for your reply!
>
>> The actual VTK solution with one file per rank and time step was no
>> option for our workflow.
>
> Yes, this is a bit cumbersome. Although I was just thinking that it could
> be less painful if a folder was created for each time step with all the
> pieces in it. So you'd have "file_timestep_x.pvtu" and a folder
> "file_timestep_x" with all the pieces in it, referenced by the .pvtu file.
> My main problem with this so far has been the flood of files you get in
> one folder... With subfolders it is a bit easier to navigate, I think
> (our writers currently don't create subfolders for the files).
>
>> I'd like to share some experiences and problems we have had so far with
>> HDF5 and parallel I/O. The standard DUNE workflow of reading a mesh and
>> user data from a single rank (e.g. rank 0) in combination with a
>> loadbalance does not work well for large meshes with lots of user data,
>> due to the huge memory consumption of rank 0 and the high network
>> traffic when distributing the user data.
>>
>> Is it possible to start a simulation by reading the grid and user data
>> from a single VTK-HDF5 file? I know that this is not easy since the
>> whole data may not fit into the memory of a single rank. A "single
>> shared file per N ranks" might be an option to overcome this problem,
>> since one single shared file for all ranks produces unwanted load on a
>> parallel file system.
>
> So far I haven't looked into reading the files, just the output part. But
> regarding your issue: the VTK-HDF file format (for unstructured grids)
> contains information on which part of the mesh originally lived on which
> process. Thus, if you restart the simulation with the same number of
> processes, you could read in only the slices for the process at hand. The
> tricky part will be, however, to then tell Dune which vertices overlap.
> Not sure if/how this plays out.
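A minimal sketch of what such a per-process slice read could look like with
HighFive; the dataset path and the offset/count arguments are purely
illustrative and would have to be derived from the per-piece sizes stored in
the file:

    // Each rank reads only its own slice from the shared file via a
    // hyperslab selection instead of loading the full dataset.
    #include <string>
    #include <vector>
    #include <highfive/H5File.hpp>

    std::vector<double> readSlice(const std::string& fileName,
                                  std::size_t offset, std::size_t count)
    {
        HighFive::File file(fileName, HighFive::File::ReadOnly);
        // hypothetical field name; the real path depends on the output
        auto dataSet = file.getDataSet("VTKHDF/PointData/pressure");
        std::vector<double> values;
        dataSet.select({offset}, {count}).read(values); // only this slice
        return values;
    }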
>
>> Note, the HighFive template library is not thread-safe. Maybe this can
>> be a problem.
>
> I know, but the writer does not use multithreading. Each slice (per MPI
> rank) is written sequentially.
>
> Cheers,
> Dennis
>
>> On 17. May 2023, at 14:36, [email protected] wrote:
>>
>> Hi Dennis,
>>
>> this sounds really great. In dumux-shallowwater we use XDMF in
>> combination with HDF5 to save our simulation data into a single HDF5
>> file plus an XDMF file (see https://www.xdmf.org/index.php/Main_Page).
>> This works really well and reduces the number of output files when doing
>> simulations with 1000+ MPI ranks. The actual VTK solution with one file
>> per rank and time step was no option for our workflow.
>>
>> I'd like to share some experiences and problems we have had so far with
>> HDF5 and parallel I/O. The standard DUNE workflow of reading a mesh and
>> user data from a single rank (e.g. rank 0) in combination with a
>> loadbalance does not work well for large meshes with lots of user data,
>> due to the huge memory consumption of rank 0 and the high network
>> traffic when distributing the user data.
>>
>> Is it possible to start a simulation by reading the grid and user data
>> from a single VTK-HDF5 file? I know that this is not easy since the
>> whole data may not fit into the memory of a single rank. A "single
>> shared file per N ranks" might be an option to overcome this problem,
>> since one single shared file for all ranks produces unwanted load on a
>> parallel file system.
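For illustration, sharing one file among ranks with HDF5 usually means
opening the file collectively through the MPI-IO driver so that each rank
writes (or reads) only its own hyperslab. A minimal HighFive sketch, with
invented dataset name and sizes:

    // All ranks open one shared file collectively via HDF5's MPI-IO driver
    // (HighFive's MPIOFileDriver wraps H5Pset_fapl_mpio) and each rank
    // writes only its own slice of a global dataset.
    #include <mpi.h>
    #include <vector>
    #include <highfive/H5File.hpp>
    #include <highfive/H5FileDriver.hpp>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank, numRanks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &numRanks);

        const std::size_t localCount = 1000; // values owned by this rank
        std::vector<double> localData(localCount, static_cast<double>(rank));

        HighFive::File file("shared.h5", HighFive::File::Overwrite,
            HighFive::MPIOFileDriver(MPI_COMM_WORLD, MPI_INFO_NULL));
        auto dataSet = file.createDataSet<double>("data",
            HighFive::DataSpace({localCount*static_cast<std::size_t>(numRanks)}));
        // each rank writes its own hyperslab of the shared dataset
        dataSet.select({static_cast<std::size_t>(rank)*localCount}, {localCount})
               .write(localData);

        MPI_Finalize();
        return 0;
    }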
>>
>> Do you plan an option to partition the mesh in advance and use the
>> parallel dgf format or the methods supported by ALUGrid to read a
>> distributed grid? As an alternative, one can also pass the partition to
>> UGGrid (rank list). Maybe these methods are not really needed if no user
>> data is read and distributed from one single rank.
>>
>> I actually plan to change the HDF5 read/write code in dumux-shallowwater.
>> However, switching to your code, converting our initial HDF5 file into a
>> VTK-HDF5 file before a simulation and reconverting the VTK-HDF5 files
>> back to HDF5/XDMF afterwards sounds like a much better solution for
>> dumux-shallowwater.
>>
>> Best regards,
>> Leo
>>
>> P.S.
>> Note, the HighFive template library is not thread-safe. Maybe this can
>> be a problem.
>>
>>> Dennis Gläser <[email protected]> wrote on 17.05.2023
>>> 10:16 CEST:
>>>
>>> Dear DuMuX community,
>>>
>>> I recently started developing a new mechanism for writing out VTK
>>> files, since for a project we needed more space-efficient VTK flavours
>>> (for structured grids) and data compression to save even more space. I
>>> have started adding a wrapper for this to DuMuX, and it would be great
>>> if some of you could give this a try and give me feedback on what
>>> things (can) go wrong, how the API could be made more intuitive, or
>>> where we need to add more explanatory error messages (e.g.
>>> static_asserts over cryptic compiler errors), etc.
>>>
>>> The new writer is on the feature/generic-time-series-writer branch:
>>> https://git.iws.uni-stuttgart.de/dumux-repositories/dumux/-/commits/feature/generic-time-series-writer
>>>
>>> It uses a library under the hood, which I added as a git submodule to
>>> dumux (for now). To pull it in, you have to type
>>>
>>> "git submodule update --init --recursive"
>>>
>>> The "--recursive" brings in an additional dependency in case you want
>>> to write into the VTK-HDF file format; you may omit it if you don't
>>> need this. Running dunecontrol afterwards should configure everything
>>> you need. However, you need a relatively new compiler, as this requires
>>> C++20. I tested g++-12 and g++-13; I think g++-11 is still missing some
>>> C++20 features we need here. The newest clang compiler doesn't work
>>> yet, as its ranges support is still experimental.
>>>
>>> I added a test that shows how the writer can be used (although one has
>>> to destructure this a bit because it tests multiple configurations via
>>> function calls etc.):
>>> https://git.iws.uni-stuttgart.de/dumux-repositories/dumux/-/blob/447ffa9d051a0bb322236bc8a4d198b139c043cd/test/io/test_timeseries_writer.cc
>>>
>>> Some benefits:
>>> - for YaspGrid, it uses .vti or .vts by default, which saves you from
>>>   wasting space on points, connectivity, etc.
>>> - allows you to use data compression with zlib, lz4 or lzma (if found
>>>   on the system)
>>> - if a compressor is found, compression is enabled by default (it can
>>>   be disabled; compressed output is slower)
>>> - allows you to add fields via lambdas, without the need to create
>>>   containers for all fields; that is, you can simply add an analytical
>>>   solution to your VTK output (see the sketch after this list)
>>> - if libhdf5 is found on your system (apt install libhdf5-mpi-dev) and
>>>   you added "--recursive" to the submodule update, you can use the
>>>   VTK-HDF file format, which allows you to write parallel simulations
>>>   into a single file per time step
>>> - VTK supports adding time as metadata in XML files, so you can
>>>   actually write a time series without a .pvd file, putting the time
>>>   value in each individual time step file. An example is in the test.
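A minimal sketch of the lambda-field idea from the list above (the real API
is in the linked test; all names here are invented): the writer stores a
callable instead of a container of values and evaluates it only when the
output is actually written:

    // Fields can be attached as callables, so an analytical solution needs
    // no pre-filled value container.
    #include <cmath>
    #include <functional>
    #include <iostream>
    #include <vector>

    struct Point { double x, y; };

    int main()
    {
        const std::vector<Point> points{{0.0, 0.0}, {0.5, 0.5}, {1.0, 1.0}};
        // analytical field defined via a lambda instead of a value container
        const std::function<double(const Point&)> exact =
            [](const Point& p) { return std::sin(p.x)*std::cos(p.y); };
        for (const auto& p : points) // a writer would do this during output
            std::cout << exact(p) << "\n";
        return 0;
    }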
>>>
>>> Two caveats (I opened issues for both of these in VTK & ParaView; I
>>> believe they should be easy to fix):
>>> - ParaView's .pvd reader does not support the VTK-HDF file format yet,
>>>   so .pvd with .hdf per time step does not open in ParaView.
>>> - Time metadata is only read from XML files, not yet from VTK-HDF
>>>   files. You can still display the time metadata in ParaView, but it
>>>   is not propagated to the control section at the top (the play,
>>>   forward and backward buttons).
>>>
>>> Drawbacks:
>>> - no support for velocity output yet.
>>>
>>> Cheers,
>>> Dennis
>>
>> On behalf of
>>
>> Dr.-Ing. Leopold Stadler
>>
>> --
>> Referat Numerische Verfahren im Wasserbau
>> Abteilung Wasserbau im Binnenbereich
>>
>> Bundesanstalt für Wasserbau
>> Federal Waterways Engineering and Research Institute
>> Kußmaulstraße 17 | 76187 Karlsruhe
>> E-Mail: [email protected]
>>
>> Tel.: +49 721 9726-3525
>> Fax: +49 721 9726-4540
>> https://www.baw.de

_______________________________________________
DuMux mailing list
[email protected]
https://listserv.uni-stuttgart.de/mailman/listinfo/dumux
