Hi Leigh,

Just to echo Quincey here, you will see optimal performance when the
chunk dimensions evenly divide the dataset dimensions, so that there
are no "partial" chunks. That said, some work was done recently on
the HDF5 library to better detect chunk decomposition; Quincey can
speak more to that.
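To make the "no partial chunks" condition concrete, here is a small sketch that counts how many chunks a layout produces and how many of those are partial (cut off at the dataset edge in at least one dimension). The function name and the example dataset extents are hypothetical. Leigh gave only the chunk shape (28x21x30, chunked in Z only), so the Z extents of 300 and 310 below are made up for illustration.

```python
def partial_chunk_report(dataset_dims, chunk_dims):
    """Return (total_chunks, partial_chunks) for an N-D chunked layout.

    Hypothetical helper for illustration only; it does pure arithmetic
    and does not call HDF5.
    """
    if len(dataset_dims) != len(chunk_dims):
        raise ValueError("dataset and chunk rank must match")
    total = 1  # chunks needed to cover the dataset (ceil per dimension)
    full = 1   # chunks lying entirely inside the dataset (floor per dimension)
    for n, c in zip(dataset_dims, chunk_dims):
        if c <= 0 or n < 0:
            raise ValueError("chunk dims must be positive, dataset dims non-negative")
        total *= -(-n // c)  # integer ceiling division
        full *= n // c
    return total, total - full


# Hypothetical 28x21xNZ dataset chunked in Z only with 28x21x30 chunks:
print(partial_chunk_report((28, 21, 300), (28, 21, 30)))  # 30 divides 300: no partial chunks
print(partial_chunk_report((28, 21, 310), (28, 21, 30)))  # one partial chunk at the top in Z
```

A nonzero second value means some chunks are only partly filled with real data, which is the case to avoid when picking chunk dimensions.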

So the ideal chunk dimensions would be ones that evenly divide the
dataset and whose total size is close to a multiple of 1MB (so that
they need minimal padding when aligned to the Lustre stripe width).
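As a rough sketch of that rule of thumb, the code below computes a chunk's size in bytes, the padding needed to reach the next 1 MiB boundary, and, for a Z-only chunking like Leigh's, picks the Z extent that evenly divides the dataset and wastes the smallest fraction of its aligned footprint. Several assumptions are built in: the 1 MiB alignment unit stands in for the Lustre stripe size, element size is passed explicitly, and all function names and example extents are hypothetical. In HDF5 itself, the alignment would typically be requested with `H5Pset_alignment` on the file access property list; this sketch only does the arithmetic.

```python
MIB = 1024 * 1024  # assumed alignment unit, e.g. a 1 MiB Lustre stripe size


def chunk_bytes(chunk_dims, elem_size):
    """Size of one chunk in bytes (product of extents times element size)."""
    nbytes = elem_size
    for c in chunk_dims:
        nbytes *= c
    return nbytes


def padding_to_alignment(nbytes, align=MIB):
    """Bytes needed to pad nbytes up to the next multiple of align."""
    return (-nbytes) % align


def best_z_chunk(nx, ny, nz, elem_size, align=MIB):
    """Hypothetical helper: among Z extents that evenly divide nz (so there
    are no partial chunks), return the one whose nx x ny x cz chunk wastes
    the smallest fraction of its aligned footprint. Ties go to the smaller cz."""
    best_cz, best_waste = None, None
    for cz in range(1, nz + 1):
        if nz % cz:
            continue  # would leave a partial chunk in Z
        nbytes = chunk_bytes((nx, ny, cz), elem_size)
        pad = padding_to_alignment(nbytes, align)
        waste = pad / (nbytes + pad)
        if best_waste is None or waste < best_waste:
            best_cz, best_waste = cz, waste
    return best_cz


# A 28x21x30 chunk of 4-byte floats is only ~69 KiB, far below 1 MiB:
print(chunk_bytes((28, 21, 30), 4), padding_to_alignment(chunk_bytes((28, 21, 30), 4)))
# Hypothetical 256x256x64 float32 dataset: each Z level is 256 KiB, so 4 levels fill 1 MiB.
print(best_z_chunk(256, 256, 64, 4))
```

The waste-fraction criterion is just one way to express "close to a multiple of 1MB"; favoring larger chunks or matching the application's read pattern are equally reasonable tie-breakers.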

Mark

On Mon, Mar 7, 2011 at 10:56 PM, Quincey Koziol <[email protected]> wrote:
> Hi Leigh,
>
> On Mar 7, 2011, at 3:01 PM, Leigh Orf wrote:
>
>> On Mon, Mar 7, 2011 at 9:16 AM, Quincey Koziol <[email protected]> wrote:
>>> Hi Leigh,
>>>
>>>>
>>>> Chunk in Z only, so my chunk dimensions would be something like
>>>> 28x21x30 (it's never been clear to me what chunk size to pick to
>>>> optimize I/O).
>>>>
>>>> And keep the other parameters the same (1 stripe, and 3,000 files per
>>>> directory).
>>>>
>>>> I guess what I'm mostly looking for is assurance that I will get
>>>> faster I/O going down this kind of route than the current way I am
>>>> doing unformatted I/O.
>>>
>>>        This looks like a fruitful direction to go in.  Do you really need 
>>> chunking though?
>>
>> Not sure. It's never been super clear to me what chunking gets you
>> beyond (1) the ability to do compression (2) faster seeking through
>> large datasets when you want to access space towards the end of the
>> file. I may just forego chunking and see where that gets me first.
>
>        Chunking is required if you want to have unlimited dimensions on your 
> dataset's dataspace.  I would rephrase (2) above as "faster I/O when your 
> selection is a good match for the chunk size", which could be an exact match 
> for the chunk size, or a selection with a well-aligned, good multiple or 
> fraction of the chunk size.  If you aren't using compression, don't need 
> unlimited dimensions and aren't performing I/O on selections of the dataset, 
> contiguous storage is probably a better fit.
>
>        Quincey
>
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>
