On Thu, Oct 3, 2013 at 7:31 AM, Anders Logg <[email protected]> wrote:
> On Wed, Oct 02, 2013 at 06:38:25PM +0100, Garth N. Wells wrote:
> > My tests, based on modifying the code below, show that hashing can
> > take more than 10% of the time to write a Mesh to HDF5.
> >
> > What's required is the right abstraction for handling Functions and
> > files. I think the hashing approach is more of a hack. What about
> > something along the lines of:
> >
> > Function u(V);
> > Function w(V);
> >
> > HDF5Function hdf5_function_file("my_filename.h5", "w");
>
> HDF5FunctionFile ?
> ~~~~
>
I like File.
J
> --
> Anders
>
>
> > hdf5_function_file.register(u, "u_name");
> > hdf5_function_file.register(w, "w_name");
> >
> > hdf5_function_file.parameters["common_mesh"] = true;
> > hdf5_function_file.parameters["write_mesh_once"] = true;
> >
> > // Write all registered functions
> > hdf5_function_file.write();
> >
> > // Write all registered functions again
> > hdf5_function_file.write();
> >
> > // Write u only
> > hdf5_function_file.write("u_name");
> >
> > Some HDF5 trickery could be used to link and structure data in the file.
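For concreteness, the register/write behaviour proposed above could work along these lines. This is a minimal sketch only: plain Python dicts stand in for the HDF5 file, callables stand in for Functions, and every name here is illustrative, not actual DOLFIN API.

```python
# Hypothetical sketch of the proposed register/write API. A dict
# emulates the HDF5 file; all class and dataset names are made up.
class FunctionFileSketch:
    def __init__(self):
        self.store = {}        # dataset path -> data, emulating the file
        self.registered = {}   # name -> callable returning vector values
        self.mesh_written = False
        self.counter = 0       # snapshot counter

    def register(self, name, get_vector):
        self.registered[name] = get_vector

    def write(self, name=None):
        # "write_mesh_once" behaviour: shared mesh data on first write only.
        if not self.mesh_written:
            self.store["/mesh/cells"] = "cells written once"
            self.mesh_written = True
        names = [name] if name is not None else list(self.registered)
        for n in names:
            self.store["/%s/vector_%d" % (n, self.counter)] = self.registered[n]()
        self.counter += 1

f = FunctionFileSketch()
f.register("u_name", lambda: [1.0, 2.0])
f.register("w_name", lambda: [3.0])
f.write()           # writes mesh data plus both vectors (snapshot 0)
f.write()           # writes both vectors again (snapshot 1)
f.write("u_name")   # writes u only (snapshot 2)
```

The point of the sketch is the split of responsibilities: the mesh is written once per file, and subsequent write() calls only append vector snapshots for the registered functions.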
> >
> > Garth
> >
> > On 29 September 2013 18:07, Øyvind Evju <[email protected]> wrote:
> > > from dolfin import *
> > > set_log_level(10000)
> > > if MPI.process_number() == 0:
> > >     print "%10s --- %10s" % ("Cells", "Time (s)")
> > > for i in range(5, 250, 20):
> > >     mesh = UnitCubeMesh(i, i, i)
> > >     tic()
> > >     mesh.hash()
> > >     t = toc()
> > >     if MPI.process_number() == 0:
> > >         print "%10d --- %.10f" % (mesh.size_global(3), t)
> > >     del mesh
> > >
> > >
> > > 2013/9/29 Garth N. Wells <[email protected]>
> > >>
> > >> On 29 September 2013 17:46, Øyvind Evju <[email protected]> wrote:
> > >> > From a quad core @ 2.20 GHz, calling the mesh.hash() function.
> > >> >
> > >>
> > >> Please post test code.
> > >>
> > >> Garth
> > >>
> > >> > One process:
> > >> > Cells --- Time (s)
> > >> > 750 --- 0.0001020432
> > >> > 93750 --- 0.0019581318
> > >> > 546750 --- 0.0110230446
> > >> > 1647750 --- 0.0335328579
> > >> > 3684750 --- 0.0734529495
> > >> > 6945750 --- 0.1374619007
> > >> > 11718750 --- 0.2321729660
> > >> > 18291750 --- 0.3683109283
> > >> > 26952750 --- 0.5321540833
> > >> > 37989750 --- 0.7479040623
> > >> > 51690750 --- 1.0299670696
> > >> > 68343750 --- 1.3440520763
> > >> > 88236750 --- 1.7490680218
> > >> >
> > >> > Two processes:
> > >> > Cells --- Time (s)
> > >> > 750 --- 0.0002639294
> > >> > 93750 --- 0.0011038780
> > >> > 546750 --- 0.0128669739
> > >> > 1647750 --- 0.0124230385
> > >> > 3684750 --- 0.0274820328
> > >> > 6945750 --- 0.0780282021
> > >> > 11718750 --- 0.1386530399
> > >> > (Out of memory)
> > >> >
> > >> >
> > >> > -Øyvind
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > 2013/9/29 Garth N. Wells <[email protected]>
> > >> >
> > >> >> On 29 September 2013 17:12, Øyvind Evju <[email protected]> wrote:
> > >> >> > Wouldn't it be quite messy to suddenly have several vectors
> > >> >> > associated with a Function?
> > >> >> >
> > >> >>
> > >> >> No. It's very natural for a time-dependent Function.
> > >> >>
> > >> >> > If we create a hash of the mesh and finite element and store
> > >> >> > cells, cell_dofs and x_cell_dofs under it, we could keep the same
> > >> >> > structure for Functions as today, with links (instead of actual
> > >> >> > data sets) within each Function pointing to cells, cell_dofs and
> > >> >> > x_cell_dofs.
> > >> >> >
> > >> >> > When writing a Function, a check is done to see whether cells,
> > >> >> > cell_dofs and x_cell_dofs already exist under the relevant hash.
> > >> >> > If the hash (mesh, distribution or function space) changes, we
> > >> >> > need to write these data sets under the new hash.
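To make that layout concrete, here is a small sketch of the hash-keyed sharing described above. It is a simplification under stated assumptions: a dict stands in for the HDF5 file, hashlib stands in for DOLFIN's mesh/dofmap hashing, and all path and function names are hypothetical.

```python
# Hypothetical sketch: shared datasets live under a group named by a
# hash of the mesh/dofmap data; each Function group stores a link to
# them plus its own vector. Names and the "link -> ..." string are
# illustrative; real HDF5 would use actual link objects.
import hashlib

store = {}  # dataset path -> contents, emulating the HDF5 file

def write_function(name, mesh_bytes, vector):
    h = hashlib.sha1(mesh_bytes).hexdigest()[:8]
    shared = "/%s/cells" % h
    # Write the shared datasets only if this hash has not been seen.
    if shared not in store:
        store[shared] = "cells for hash " + h
    # The Function group holds a link (not a copy) plus its own vector.
    store["/%s/cells" % name] = "link -> " + shared
    store["/%s/vector" % name] = vector

write_function("u", b"mesh-A", [1.0, 2.0])
write_function("w", b"mesh-A", [3.0])   # same hash: cells are reused
```

With this scheme, two Functions on the same mesh and function space share one copy of cells/cell_dofs/x_cell_dofs, and a changed mesh or distribution simply produces a new hash group.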
> > >> >> >
> > >> >> > Have I misunderstood this hashing? It does seem to be very
> > >> >> > efficient, more efficient than rewriting those three datasets.
> > >> >> >
> > >> >>
> > >> >> Can you post a benchmark for testing the speed of hashing?
> > >> >>
> > >> >> Garth
> > >> >>
> > >> >> >
> > >> >> > -Øyvind
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > 2013/9/28 Chris Richardson <[email protected]>
> > >> >> >>
> > >> >> >> On 28/09/2013 13:29, Garth N. Wells wrote:
> > >> >> >>>
> > >> >> >>> On 28 September 2013 12:56, Chris Richardson
> > >> >> >>> <[email protected]> wrote:
> > >> >> >>>>
> > >> >> >>>> On 28/09/2013 11:31, Garth N. Wells wrote:
> > >> >> >>>>>
> > >> >> >>>>>
> > >> >> >>>>> On 28 September 2013 10:42, Chris Richardson
> > >> >> >>>>> <[email protected]>
> > >> >> >>>>> wrote:
> > >> >> >>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>> This is a continuation of the discussion at:
> > >> >> >>>>>>
> > >> >> >>>>>> https://bitbucket.org/fenics-project/dolfin/pull-request/52
> > >> >> >>>>>>
> > >> >> >>>>>> The question is how best to save a time series of a Function
> > >> >> >>>>>> in HDF5, when the cell and dof layout remains constant.
> > >> >> >>>>>>
> > >> >> >>>>>> It has been suggested to use:
> > >> >> >>>>>>
> > >> >> >>>>>> u = Function(V)
> > >> >> >>>>>> h0 = HDF5File('Timeseries_of_Function.h5', 'w')
> > >> >> >>>>>> h0.write(u, '/Function')
> > >> >> >>>>>> # Then later
> > >> >> >>>>>> h0.write(u.vector(), "/Vector/0")
> > >> >> >>>>>> h0.write(u.vector(), "/Vector/1")
> > >> >> >>>>>>
> > >> >> >>>>>
> > >> >> >>>>> Shouldn't this be
> > >> >> >>>>>
> > >> >> >>>>> h0.write(u.vector(), "/Function/Vector/0")
> > >> >> >>>>> h0.write(u.vector(), "/Function/Vector/1")
> > >> >> >>>>>
> > >> >> >>>>
> > >> >> >>>> In the HDF5File model, the user is free to put vectors etc.
> > >> >> >>>> wherever they want. There is no explicit meaning to dumping
> > >> >> >>>> extra vectors inside the "group" of a Function.
> > >> >> >>>>
> > >> >> >>>>
> > >> >> >>>>>
> > >> >> >>>>>> and to read back:
> > >> >> >>>>>>
> > >> >> >>>>>> u = Function(V)
> > >> >> >>>>>> h0 = HDF5File('Timeseries_of_Function.h5', 'r')
> > >> >> >>>>>> h0.read(u, "/Function")
> > >> >> >>>>>> h0.read(u.vector(), "/Function/vector")
> > >> >> >>>>>>
> > >> >> >>>>
> > >> >> >>>> OK, this probably should have been
> > >> >> >>>>
> > >> >> >>>> h0.read(u.vector(), "/Vector/1")
> > >> >> >>>>
> > >> >> >>>> When reading in a vector, it is just read directly, and not
> > >> >> >>>> reordered in any way. If the vector was saved from a different
> > >> >> >>>> set of processors, with different partitioning, the order could
> > >> >> >>>> be quite different.
> > >> >> >>>>
> > >> >> >>>> When reading a Function, the vector is reordered to take this
> > >> >> >>>> into account.
> > >> >> >>>>
> > >> >> >>>> If the vector is already associated with a Function (not all
> > >> >> >>>> vectors are) then it should be possible to reorder it when
> > >> >> >>>> reading... maybe that should be an option.
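The reordering step can be sketched as follows. This is a hypothetical simplification, not DOLFIN's actual implementation: it assumes the writer stored, alongside the values, the global dof index of each entry, so the reader can permute the values into its own ordering.

```python
# Sketch of reordering on read: values saved in the writer's dof
# order, together with their global indices, are permuted into the
# reader's order. All names are illustrative.
def reorder(saved_values, saved_global_indices, my_global_indices):
    by_index = dict(zip(saved_global_indices, saved_values))
    return [by_index[g] for g in my_global_indices]

# Saved with partitioning order [2, 0, 1]; the reader wants [0, 1, 2].
print(reorder([20.0, 0.0, 10.0], [2, 0, 1], [0, 1, 2]))  # [0.0, 10.0, 20.0]
```

Reading a bare Vector skips this permutation entirely, which is why a vector saved with a different partitioning comes back in a different order.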
> > >> >> >>>>
> > >> >> >>>
> > >> >> >>> A solution seems very simple: use the HDF5 hierarchical
> > >> >> >>> structure to associate Vectors with a Function. This is the
> > >> >> >>> advantage of using a hierarchical storage format.
> > >> >> >>>
> > >> >> >>> If a user reads a Vector that is not already associated with a
> > >> >> >>> Function, then it should be the user's responsibility to take
> > >> >> >>> care of things.
> > >> >> >>>
> > >> >> >>
> > >> >> >> It could work like this:
> > >> >> >>
> > >> >> >> At present, when writing a Function, it creates a group and
> > >> >> >> populates it with dofmap, cells, and vector. Writing again with
> > >> >> >> the same name will cause an error.
> > >> >> >> We could allow writes to the same name, but create more vectors
> > >> >> >> (maybe checking that the cells/dofs are still compatible) in the
> > >> >> >> same HDF5 group. Or, a user could just manually dump more vectors
> > >> >> >> in the group (as described above by Garth).
> > >> >> >>
> > >> >> >> For reading, a Function will still behave the same, but we could
> > >> >> >> have the additional option of reading a Function by just giving
> > >> >> >> the vector dataset name, assuming that the cell/dof information
> > >> >> >> exists in the same HDF5 group. This should be fairly easy to
> > >> >> >> implement.
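That path convention could look like this. It is a sketch using posixpath on HDF5-style paths; the companion dataset names (cells, cell_dofs, x_cell_dofs) follow the thread above, but the vector naming is an assumption.

```python
# Sketch of read-by-vector-name: given only the vector dataset path,
# assume the cell/dof datasets live in the same HDF5 group. The
# "vector_5" naming is illustrative.
import posixpath

def companion_datasets(vector_path):
    group = posixpath.dirname(vector_path)
    return [posixpath.join(group, d)
            for d in ("cells", "cell_dofs", "x_cell_dofs")]

print(companion_datasets("/Function/vector_5"))
# ['/Function/cells', '/Function/cell_dofs', '/Function/x_cell_dofs']
```

The reader then only needs one dataset name from the user; everything else is found by convention in the enclosing group.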
> > >> >> >>
> > >> >> >>
> > >> >> >> Chris
> > >> >> >>
> > >> >> >> _______________________________________________
> > >> >> >> fenics mailing list
> > >> >> >> [email protected]
> > >> >> >> http://fenicsproject.org/mailman/listinfo/fenics
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >
> > >
>