On 05.06.2013 10:31, Andreas Hilboll wrote:
> On 05.06.2013 03:29, Tim Burgess wrote:
>> I was playing around with in-memory HDF5 prior to the 3.0 release.
>> Here's an example based on what I was doing.
>> I looked over the docs, and they do mention an option to throw away
>> the 'file' rather than write it to disk; I've sketched that after the
>> example below, though I can't actually think of a use case where I
>> would want to :-)
>>
>> And beware: it's H5FD_CORE, not H5DF_CORE.
>>
>>
>> On Jun 05, 2013, at 08:38 AM, Anthony Scopatz <scop...@gmail.com> wrote:
>>>
>>> I think that you want to set parameters.DRIVER to H5DF_CORE [1].  I
>>> haven't ever used this personally, but it would be great to have an
>>> example script, if someone wants to write one ;)
>>>
>>  
>>
>> import numpy as np
>> import tables
>>
>> CHUNKY = 30 
>> CHUNKX = 8640
>>
>> if __name__ == '__main__':
>>
>>     # create dataset and add global attrs
>>
>>     file_path = 'demofile_chunk%dx%d.h5' % (CHUNKY, CHUNKX)
>>
>>     with tables.open_file(file_path, 'w',
>>                           title='PyTables HDF5 In-memory example',
>>                           driver='H5FD_CORE') as h5f:
>>
>>         # dummy some data
>>         lats = np.empty([4320])
>>         lons = np.empty([8640])
>>
>>         # create some simple arrays
>>         lat_node = h5f.create_array('/', 'lat', lats, title='latitude')
>>         lon_node = h5f.create_array('/', 'lon', lons, title='longitude')
>>
>>         # create a 365 x 4320 x 8640 CArray of 32-bit floats
>>         shape = (365, 4320, 8640)
>>         atom = tables.Float32Atom(dflt=np.nan)
>>
>>         # chunk into daily slices, then sub-chunk each day
>>         sst_node = h5f.create_carray(h5f.root, 'sst', atom, shape,
>>                                      chunkshape=(1, CHUNKY, CHUNKX))
>>
>>         # dummy up an ndarray
>>         sst = np.empty([4320, 8640], dtype=np.float32)
>>         sst.fill(30.0)
>>
>>         # write ndarray to a 2D plane in the HDF5
>>         sst_node[0] = sst
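>>
>> And, based on the docs, something like this should keep the whole
>> file in memory and discard it on close instead of writing it out
>> (an untested sketch; the file name then serves only as a label):
>>
>> import tables
>>
>> # DRIVER_CORE_BACKING_STORE=0 disables the on-disk backing store,
>> # so closing the file throws it away rather than flushing to disk
>> with tables.open_file('scratch.h5', 'w',
>>                       driver='H5FD_CORE',
>>                       driver_core_backing_store=0) as h5f:
>>     h5f.create_array('/', 'x', [1, 2, 3], title='throwaway data')
>> # nothing has been written to scratch.h5 at this point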
> 
> Thanks Tim,
> 
> I adapted your example for my use case (I'm using the EArray class,
> because I need to continuously update my database; see the sketch
> below), and it works well.
> 
> However, when I use it with my own data (creating the arrays the same
> way you did), I run into errors like "Could not wait on barrier".
> It seems the HDF5 library is spawning several threads.
> 
> Any idea what's going wrong? Can I somehow avoid HDF5 multithreading at
> runtime?
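> 
> For reference, the relevant part of my adaptation looks roughly like
> this (a sketch; the file and node names are made up):
> 
> import numpy as np
> import tables
> 
> with tables.open_file('sst_db.h5', 'a', driver='H5FD_CORE') as h5f:
>     atom = tables.Float32Atom(dflt=np.nan)
>     # extendable along the first (time) axis
>     sst_node = h5f.create_earray(h5f.root, 'sst', atom,
>                                  (0, 4320, 8640),
>                                  chunkshape=(1, 30, 8640))
>     # each update appends one daily slice
>     # (note the leading length-1 axis)
>     sst = np.empty((1, 4320, 8640), dtype=np.float32)
>     sst.fill(30.0)
>     sst_node.append(sst)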

Update:

When I set max_blosc_threads=2 and max_numexpr_threads=2, everything
works as expected (if a bit on the slow side ...). With
max_blosc_threads=4, the "Could not wait on barrier" error comes back.
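
In case it helps others, this is how I'm capping the thread counts (a
sketch; I'm assuming open_file accepts the tables.parameters overrides
as lowercase keyword arguments, and 'mydata.h5' is a placeholder):

import tables

# cap Blosc and numexpr at two threads each to avoid the barrier error
h5f = tables.open_file('mydata.h5', 'a',
                       max_blosc_threads=2,
                       max_numexpr_threads=2)

# alternatively, set the defaults globally before opening any file:
# tables.parameters.MAX_BLOSC_THREADS = 2
# tables.parameters.MAX_NUMEXPR_THREADS = 2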

Cheers, Andreas.

