Thanks a lot, Mark. Collective communication may not always be possible, as two processors may need access to the same data. Are you suggesting that one needs to separate I/O requests into collective and independent modes? Can one do collective I/O even when the data is non-contiguous?
Regards
Suman

On Tue, Apr 9, 2013 at 11:06 AM, Miller, Mark C. <[email protected]> wrote:

> Hi Suman,
>
> I think you are hinting at the fact that unstructured data is harder to manage with true concurrent parallel I/O to a single, shared file, right? Yeah, it is.
>
> Honestly, I don't have much experience with that. My impression has always been that the more information you can give HDF5 in the H5Dwrite/H5Dread calls, the better it and the layers below it (MPI-IO, Lustre/GPFS, etc.) can take advantage of things and 'do the right/best thing'.
>
> So, I guess one piece of advice is to try to characterize the application's I/O needs in requests that are 'as large as possible'. And, if it's practical to do so, try to allow processors to 'coordinate' their I/O by issuing collective instead of independent requests. If parts of some large array are being read onto many processors, let each processor define its piece(s) of that array using HDF5 data selections and then have all of them do a collective H5Dread. In theory, that lets MPI-IO and Lustre do whatever 'magic' is possible to aggregate all the data into a few I/O requests to the filesystem but buffer and disperse it efficiently to whatever processors need it. At least that is how I think HDF5/MPI-IO/Lustre will operate in theory. There may be a lot of issues to making it efficient in practice.
>
> Good luck.
>
> Mark
>
> --
> Mark C. Miller, Lawrence Livermore National Laboratory
> ================!!LLNL BUSINESS ONLY!!================
> [email protected] urgent: [email protected]
> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
>
> From: Suman Vajjala <[email protected]>
> Date: Monday, April 8, 2013 10:25 PM
> To: HDF Users Discussion List <[email protected]>, "Miller, Mark C." <[email protected]>
> Subject: Re: [Hdf-forum] Performance query
>
> Hi Mark,
>
> Thank you for answering my question. You are right about the practicality issue. This isn't a serious issue if the data is stored in a distributed sense with some indicators specifying which processor has the data. This data is treated as a background database and it doesn't move during the transfer process.
>
> The reason I asked this question is that I've seen HDF5 take quite some time to read data in an unstructured manner while the MPI calls just zip through. On another note, are there any guidelines for parallel I/O in the case of unstructured data?
>
> Regards
> Suman
>
>
> On Tue, Apr 9, 2013 at 10:39 AM, Miller, Mark C. <[email protected]> wrote:
>
>> Hmm. If I understand the question, I really cannot imagine a scenario where parallel I/O would be "faster" than MPI send/recv calls.
>>
>> However, there may be a practicality issue here. It may be the case that the data processor k needs is on processor j, but processor k doesn't know that processor j has it and processor j doesn't know that processor k needs it. So, there has to be some communication for the processors to learn that.
>>
>> And, if those processors are 'somewhere else' in their execution, then you have a significant issue in programming to take advantage of the fact that you could use MPI send/recv to move the data anyway. In the end, it just might be more practical to read the data from the file, even if it is quite a bit slower.
>>
>> I am not sure I answered the question you asked, though ;)
>>
>> Mark
>>
>> --
>> Mark C. Miller, Lawrence Livermore National Laboratory
>> ================!!LLNL BUSINESS ONLY!!================
>> [email protected] urgent: [email protected]
>> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
>>
>> From: Suman Vajjala <[email protected]>
>> Reply-To: HDF Users Discussion List <[email protected]>
>> Date: Monday, April 8, 2013 9:38 PM
>> To: HDF Users Discussion List <[email protected]>
>> Subject: [Hdf-forum] Performance query
>>
>> Hi,
>>
>> I have a question regarding the performance of parallel I/O vs MPI communication based calls. I have data which needs to be accessed by different processors. If the data is in memory, then MPI calls (Send/Recv) do the job. In another scenario the data is written to an H5 file and different processors access the respective data using parallel I/O. Would MPI calls be faster than HDF5 parallel I/O? (Data access could be unstructured.)
>>
>> Regards
>> Suman Vajjala
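
To make Mark's suggestion above concrete (per-rank HDF5 data selections plus a single collective H5Dread), here is a minimal C sketch against the HDF5 1.8 API. The file name mesh.h5, the dataset name /pressure, and the block size are made-up placeholders, and the dataset is assumed to be 1-D with at least 2 * nprocs * 1000 doubles. Each rank ORs two disjoint hyperslabs into its file selection, i.e. a non-contiguous access pattern, and all ranks then issue one collective read so that MPI-IO and the file system can try to aggregate the requests.

/* Minimal sketch: each rank selects two non-contiguous blocks of a
 * 1-D dataset and all ranks read them with one collective H5Dread. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Open the file through the MPI-IO driver. "mesh.h5" and "/pressure"
     * are hypothetical names; the dataset is assumed to hold at least
     * 2 * nprocs * BLOCK doubles. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fopen("mesh.h5", H5F_ACC_RDONLY, fapl);
    hid_t dset = H5Dopen2(file, "/pressure", H5P_DEFAULT);

    /* Non-contiguous file selection: one block in the first half of the
     * dataset and one in the second half, OR'ed together. */
    const hsize_t BLOCK = 1000;
    hsize_t count[1]  = { BLOCK };
    hsize_t start1[1] = { (hsize_t)rank * BLOCK };
    hsize_t start2[1] = { (hsize_t)(nprocs + rank) * BLOCK };
    hid_t fspace = H5Dget_space(dset);
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start1, NULL, count, NULL);
    H5Sselect_hyperslab(fspace, H5S_SELECT_OR,  start2, NULL, count, NULL);

    /* Contiguous memory buffer and matching memory dataspace. */
    hsize_t nelem = 2 * BLOCK;
    hid_t mspace = H5Screate_simple(1, &nelem, NULL);
    double *buf = (double *)malloc(nelem * sizeof(double));

    /* Request a collective transfer so MPI-IO can aggregate the I/O. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    herr_t status = H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, buf);
    if (status < 0)
        fprintf(stderr, "rank %d: collective H5Dread failed\n", rank);

    free(buf);
    H5Pclose(dxpl);
    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Dclose(dset);
    H5Pclose(fapl);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}

Whether the MPI-IO layer actually manages to aggregate such irregular selections depends on the HDF5 and MPI-IO versions and on file-system hints; recent 1.8 releases (1.8.10 and later, if I remember right) also let you query what happened after the read via H5Pget_mpio_actual_io_mode on the transfer property list.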
