Have you tried using a benchmark like IOR to stress the NFS file
system? Maybe it is a problem with NFS and not the underlying file
system or HDF5.

Mark
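[For readers unfamiliar with IOR: a run against the NFS mount might look like the sketch below. The mount point, process count, and block/transfer sizes are placeholders, not from this thread.]

```shell
# File-per-process POSIX writes and reads against the NFS mount:
mpirun -np 8 ior -a POSIX -F -w -r -b 64m -t 4m -o /mnt/nfs/iortest/testfile

# Shared-file MPI-IO with collective calls, closer to what
# collective-mode HDF5 does underneath:
mpirun -np 8 ior -a MPIIO -c -w -r -b 64m -t 4m -o /mnt/nfs/iortest/testfile

# IOR can also drive the HDF5 API itself:
mpirun -np 8 ior -a HDF5 -w -r -b 64m -t 4m -o /mnt/nfs/iortest/testfile
```

If the POSIX run is fine but the MPI-IO or HDF5 runs hang, that points at the MPI-IO layer over NFS rather than at HDF5 or the local file system.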

On Tue, Oct 26, 2010 at 7:39 PM, Dave Wade-Stein <[email protected]> wrote:
> As to MPI, we're both using Open MPI 1.4.1.
>
> We're both using NFS file systems, which are formatted as XFS. As I mentioned, 
> we had problems with ext3 file systems, which were alleviated when we 
> reformatted as XFS. Unfortunately, that didn't work for the customer.
>
> Thanks,
> Dave
>
> On Oct 26, 2010, at 5:36 PM, Mark Howison wrote:
>
>> I guess it could depend on the MPI library, but most likely not. What
>> parallel file system is used on the customer's machine? Mark
>>
>> On Tue, Oct 26, 2010 at 7:25 PM, Dave Wade-Stein <[email protected]> wrote:
>>> Mark,
>>>
>>> The same code hangs on the customer machine, but works fine on our 
>>> clusters. Could that be the cause if only a subset of the ranks were 
>>> participating in the I/O?
>>>
>>> Thanks,
>>> Dave
>>>
>>> On Oct 26, 2010, at 5:14 PM, Mark Howison wrote:
>>>
>>>> Hi Dave,
>>>>
>>>> One common hang with collective-mode parallel I/O in HDF5 is when only
>>>> a subset of processes are participating in the I/O, but the other
>>>> processes haven't made an empty selection (to say that they are not
>>>> participating) using H5Sselect_none(). Also, have you tried
>>>> experimenting with collective vs. independent mode?
>>>>
>>>> Mark
>>>>
>>>> On Tue, Oct 26, 2010 at 6:52 PM, Dave Wade-Stein <[email protected]> wrote:
>>>>> We use HDF5 for parallel I/O in VORPAL, our laser plasma simulation code. 
>>>>> For the most part it works fine, but on certain machines (e.g., early 
>>>>> Cray and BG/P) and certain types of file systems, we've noticed that 
>>>>> parallel I/O hangs, so we added a -id (individual dump) option that 
>>>>> causes each MPI rank to dump its own HDF5 file; once the simulation 
>>>>> is complete, we merge the individual dump files.
>>>>>
>>>>> We have a customer for whom parallel I/O is hanging, and they are using 
>>>>> -id as described above. We're trying to pinpoint why parallel I/O is not 
>>>>> working on their system, which is a CentOS 5.5 cluster.
>>>>>
>>>>> In the past we ourselves have had problems with parallel I/O failing on 
>>>>> ext3 filesystems, so we reformatted as XFS and the problem went away. Our 
>>>>> customer did this, but the problem still persists.
>>>>>
>>>>> Anyone have any words of wisdom as to what other things could cause 
>>>>> parallel I/O to hang?
>>>>>
>>>>> Thanks for any help!
>>>>> Dave
>>>
>>> _______________________________________________
>>> Hdf-forum is for HDF software users discussion.
>>> [email protected]
>>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>>
>>
>
