Re: [Hdf-forum] Performance query

Miller, Mark C. Mon, 08 Apr 2013 22:36:39 -0700

Hi Suman,

I think you are hinting at the fact that unstructured data is harder to manage 
with true concurrent parallel I/O to a single, shared file, right? Yeah, it is.


Honestly, I don't have much experience with that. My impression has always been 
that the more information you can give HDF5 in the H5Dwrite/H5Dread calls, the 
better it and the layers below it (MPI-IO, Lustre/GPFS, etc.) can take 
advantage of things and 'do the right/best thing'.

So, I guess one piece of advice is to try to characterize the application's I/O 
needs in requests that are 'as large as possible'. And, if it's practical to do 
so, try to allow processors to 'coordinate' their I/O by issuing collective 
instead of independent requests. If parts of some large array are being read 
onto many processors, let each processor define its piece(s) of that array 
using HDF5 data selections and then have all of them do a collective H5Dread. 
In theory, that lets MPI-IO and Lustre do whatever 'magic' is possible to 
aggregate all the data into a few I/O requests to the filesystem, then buffer 
and disperse it efficiently to whatever processors need it. At least that is 
how I think HDF5/MPI-IO/Lustre will operate in theory. There may be a lot of 
issues to making it efficient in practice.
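
To make the 'each processor defines its piece(s)' idea a bit more concrete, 
here is a minimal sketch in plain Python (no HDF5 or MPI needed to run it). 
The names block_selection and coalesce_runs, and the even block partition, are 
assumptions for illustration only, not anything HDF5 provides. The (start, 
count) pairs they return are the kind of values one would pass to a hyperslab 
selection (H5Sselect_hyperslab) before the collective H5Dread.

```python
def block_selection(n_elements, n_ranks, rank):
    """Return (start, count) of the contiguous block assigned to `rank`.

    Assumes a simple block partition (hypothetical layout): the first
    `n_elements % n_ranks` ranks each get one extra element.
    """
    base, extra = divmod(n_elements, n_ranks)
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count


def coalesce_runs(indices):
    """Merge element indices into contiguous (start, count) runs.

    For unstructured access, fewer and longer runs give the I/O layers
    larger pieces to work with than one request per element.
    """
    runs = []
    for i in sorted(set(indices)):
        if runs and i == runs[-1][0] + runs[-1][1]:
            # Index extends the current run by one element.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            # Gap found: start a new run.
            runs.append((i, 1))
    return runs
```

For example, a rank that needs elements 3, 4, 5, 10, 11 and 20 would describe 
three runs rather than six single-element requests. Whether that actually helps 
depends on the file layout and the filesystem.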

Good luck.

Mark

--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
mille...@llnl.gov      urgent: markcmille...@gmail.com
T:8-6 (925)-423-5901    M/W/Th:7-12,2-7 (530)-753-8511

From: Suman Vajjala <suman.g...@gmail.com>
Date: Monday, April 8, 2013 10:25 PM
To: HDF Users Discussion List <hdf-forum@hdfgroup.org>, "Miller, Mark C." 
<mille...@llnl.gov>
Subject: Re: [Hdf-forum] Performance query

Hi Mark,

     Thank you for answering my question. You are right about the practicality 
issue. This isn't a serious issue if the data is stored in a distributed sense 
with some indicators specifying which processor has the data. This data is 
treated as a background database and it doesn't move during the transfer 
process.

     The reason I asked this question is that I've seen HDF5 taking quite 
some time to read data in an unstructured manner while the MPI calls just zip 
through. On another note, are there any guidelines for parallel I/O in the case 
of unstructured data?

Regards
Suman


On Tue, Apr 9, 2013 at 10:39 AM, Miller, Mark C. <mille...@llnl.gov> wrote:
Hmm. If I understand the question, I really cannot imagine a scenario where 
parallel I/O would be "faster" than MPI send/recv calls.

However, there may be a practicality issue here. It may be the case that the 
data processor k needs is on processor j but processor k doesn't know that 
processor j has it and processor j doesn't know that processor k needs it. So, 
there has to be some communication for the processors to learn that.
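
As a rough illustration of that bookkeeping, here is a minimal sketch in plain 
Python. It assumes a block distribution that every processor already knows, 
which is an assumption for this example and not something stated in the 
thread; owner_of and requests_by_owner are hypothetical names. Under that 
assumption processor k can work out on its own which processor j holds each 
element it needs, but processor j still has to be told about the request 
(for instance through an all-to-all exchange).

```python
def owner_of(index, n_elements, n_ranks):
    """Return the rank holding global element `index`.

    Assumes a block partition where the first `n_elements % n_ranks`
    ranks each hold one extra element (hypothetical layout).
    """
    if not 0 <= index < n_elements:
        raise IndexError("index out of range")
    base, extra = divmod(n_elements, n_ranks)
    boundary = extra * (base + 1)  # first index held by a 'small' block
    if index < boundary:
        return index // (base + 1)
    return extra + (index - boundary) // base


def requests_by_owner(indices, n_elements, n_ranks):
    """Group the indices this rank needs by the rank that holds them."""
    grouped = {}
    for i in indices:
        grouped.setdefault(owner_of(i, n_elements, n_ranks), []).append(i)
    return grouped
```

The grouped lists are only the requesting side's view; turning them into actual 
send/recv pairs is the part that gets hard when the data layout is irregular.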

And, if those processors are 'somewhere else' in their execution, then you have 
a significant issue in programming to take advantage of the fact that you could 
use MPI send/recv to move the data anyway. In the end, it just might be more 
practical to just read data from the file, even if it is quite a bit slower.

I am not sure I answered the question you asked though ;)

Mark

From: Suman Vajjala <suman.g...@gmail.com>
Reply-To: HDF Users Discussion List <hdf-forum@hdfgroup.org>
Date: Monday, April 8, 2013 9:38 PM
To: HDF Users Discussion List <hdf-forum@hdfgroup.org>
Subject: [Hdf-forum] Performance query

Hi,

     I have a question regarding the performance of parallel I/O vs MPI 
communication based calls. I have data which needs to be accessed by different 
processors. If the data is in memory then MPI calls (Send/Recv) do the job. 
In another scenario, the data is written to an H5 file and different 
processors access their respective data using parallel I/O. Would MPI calls be 
faster than HDF5 parallel I/O? (data access could be unstructured)

Regards
Suman Vajjala

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
