Another option is to create a Python object - dict, list, or whatever works
- containing the metadata and then store a pickled version of it in a
PyTables array. It's nice for this sort of thing because you have the full
flexibility of Python's data containers.
For example, if the Python object
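A minimal sketch of this approach (file and node names here are assumed; PyTables 2.x-style API, as used elsewhere in this thread):

import tables

# Store arbitrary Python metadata in a VLArray of ObjectAtom;
# PyTables pickles each object on append and unpickles it on read.
meta = {'run': 42, 'tags': ['raw', 'calibrated'], 'scale': 0.5}

h = tables.openFile('meta.h5', mode='w')
vla = h.createVLArray(h.root, 'metadata', tables.ObjectAtom())
vla.append(meta)                 # pickled transparently
h.close()

h = tables.openFile('meta.h5', mode='r')
restored = h.root.metadata[0]    # unpickled on read
h.close()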
David,
The change in issue 27 was only for iteration over a tables.Column
instance. To use it, tweak Anthony's code as follows. This will iterate
over the element column, as in your original example.
Note also that this will only work with the development version of PyTables
available on
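A sketch of the feature itself (file and column names assumed):

import tables

# With the issue 27 change, a tables.Column instance is directly
# iterable; each pass yields one cell of the 'element' column.
h = tables.openFile('data.h5', mode='r')
table = h.root.mytable
for value in table.cols.element:
    print(value)
h.close()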
Nested Iteration of HDF5 using PyTables (Josh Ayers)
--
Message: 1
Date: Thu, 3 Jan 2013 10:29:33 -0800
From: Josh Ayers josh.ay...@gmail.com
Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
Jennifer,
When adding a Python object to a VLArray, PyTables first pickles the
object. It looks like you're trying to add something that can't be
pickled. Check the type of the 'state' variable in the first line of the
stack trace and make sure it's something that can be pickled. See [1] for
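A quick way to check picklability up front (pure standard library):

import pickle
import threading

# pickle.dumps raises if an object can't be pickled, e.g. locks,
# open file handles, or other objects tied to OS resources.
picklable = {'step': 3, 'values': [1.0, 2.5]}
unpicklable = threading.Lock()

for obj in (picklable, unpicklable):
    try:
        pickle.dumps(obj)
        print(type(obj).__name__, 'is picklable')
    except (pickle.PicklingError, TypeError) as exc:
        print(type(obj).__name__, 'is not picklable:', exc)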
With threads, no copying would be needed since their memory is shared,
which should make it faster than the multi-process techniques.
Hope that helps.
Josh Ayers
[1]: http://www.hdfgroup.org/hdf5-quest.html#mthread
[2]: https://visitbugs.ornl.gov/projects/8/wiki/Multi-threaded_cores_and_HPC-HDF5
[3]: https
Depending on your use case, you may be able to get around this by storing
each column in its own table. That will effectively store the data in
column-first order. Instead of creating a table, you would create a group,
which then contains a separate table for each column.
If you want, you can
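A sketch of that layout (group and table names assumed):

import numpy as np
import tables

# One single-column table per field, all under one group, so each
# column is stored contiguously on disk (column-first order).
h = tables.openFile('columns.h5', mode='w')
group = h.createGroup(h.root, 'mydata')
tx = h.createTable(group, 'x', {'x': tables.Float64Col()})
ty = h.createTable(group, 'y', {'y': tables.Int32Col()})

rows = np.zeros(10, dtype=[('x', 'f8')])
rows['x'] = np.linspace(0.0, 1.0, 10)
tx.append(rows)

rows = np.zeros(10, dtype=[('y', 'i4')])
rows['y'] = np.arange(10)
ty.append(rows)
h.close()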
My first instinct would be to handle all access (read and write) to
that file from a single process. You could create two
multiprocessing.Queue objects, one for data to write and one for read
requests. Then the process would check the queues in a loop and
handle each request serially. The data
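A rough sketch of that pattern (all names are assumed, and ordering between the two queues isn't guaranteed, so treat it as illustrative only):

import multiprocessing as mp
import numpy as np
import tables

try:
    from queue import Empty           # Python 3
except ImportError:
    from Queue import Empty           # Python 2

def file_server(path, write_q, read_q, result_q):
    # Sole owner of the HDF5 file; every read and write request is
    # handled serially inside this one process.
    h = tables.openFile(path, mode='w')
    arr = h.createEArray(h.root, 'data', tables.Float64Atom(), shape=(0,))
    done = False
    while not done:
        try:
            chunk = write_q.get(timeout=0.1)
            if chunk is None:         # sentinel: shut down
                done = True
            else:
                arr.append(np.asarray(chunk, dtype='f8'))
        except Empty:
            pass
        try:
            start, stop = read_q.get_nowait()
            result_q.put(arr[start:stop])
        except Empty:
            pass
    h.close()

if __name__ == '__main__':
    write_q, read_q, result_q = mp.Queue(), mp.Queue(), mp.Queue()
    p = mp.Process(target=file_server,
                   args=('shared.h5', write_q, read_q, result_q))
    p.start()
    write_q.put([1.0, 2.0, 3.0])
    read_q.put((0, 2))
    print(result_q.get())             # first two appended values
    write_q.put(None)
    p.join()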
Here's an alternative method that uses the built-in search capabilities in
PyTables in place of the itertools library.
Using readWhere as shown below will return a NumPy ndarray of the data that
matches the query. I think that answers your question #4. There are
similar methods - where and
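For instance (table and column names assumed):

import tables

# readWhere runs the query in-kernel and returns a NumPy record
# array holding only the matching rows.
h = tables.openFile('data.h5', mode='r')
table = h.root.mytable
hits = table.readWhere('(element == 42) & (value > 0.5)')
print(hits['value'])                  # field access on the result
h.close()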
it should be no greater than O(n).
The strange thing is that my iter0() is really fast but all other
versions are really slow. Maybe iter0() is only reading the fields I
access whereas the other versions read the whole records into memory.
Thanks,
Geoffrey
On Wed, Jun 29, 2011 at 9:51 AM, Josh
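If only a few fields are needed, reading a single column explicitly may avoid materializing full records (a sketch, names assumed):

import tables

# Read one field as a plain 1-D array, or iterate over a single
# column lazily instead of over whole rows.
h = tables.openFile('data.h5', mode='r')
table = h.root.mytable
xs = table.read(field='x')    # ndarray of just the 'x' column
for x in table.cols.x:        # row-by-row column access
    pass
h.close()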
Tables show a similarly inconsistent behavior, which I've had to work around
in a few places as well. See the following example code, which is very
similar to Mario's. Here slice1 is of type numpy.void, while slice2 is of
type numpy.ndarray.
h = tables.openFile('test.h5', mode='w')
dtype =
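A self-contained version of that example (the dtype and names are assumed):

import numpy
import tables

# A scalar index returns a single record (numpy.void), while a
# length-1 slice returns a record array (numpy.ndarray).
h = tables.openFile('test.h5', mode='w')
dtype = numpy.dtype([('x', 'f8'), ('y', 'i4')])
t = h.createTable(h.root, 'tbl', dtype)
t.append([(1.0, 1), (2.0, 2)])

slice1 = t[0]      # numpy.void
slice2 = t[0:1]    # numpy.ndarray
print(type(slice1), type(slice2))
h.close()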
on my machine. I also copied the error message.
Any ideas on a cause and a solution? Is there a hard limit on the maximum
number of columns in a table?
Thanks for your help,
Josh Ayers
tables.test() output:
PyTables version: 2.2
HDF5 version: 1.8.5
NumPy version: 1.5.0b2
Numexpr
Here's a simpler code snippet to reproduce the error. It appears there is a
maximum number of columns in a table, and it depends on the data type in an
unusual way (at least to me). All floats have one limit and all integers
have another limit, regardless of the bit size. I didn't test strings.
Either way, it's a bug in PyTables.
Thanks,
Josh Ayers
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version: 2.2
HDF5 version: 1.8.5
NumPy version: 1.5.0b2
Numexpr version: 1.4 (not using Intel's VML/MKL)
Zlib version: 1.2.3 (in Python interpreter
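A hypothetical reproduction along the lines described (the column count is illustrative, not the actual limit from the report):

import tables

# Build a table with a very large number of columns; past some
# dtype-dependent count, table creation fails.
ncols = 5000                   # illustrative, not the real limit
desc = dict(('c%d' % i, tables.Float64Col(pos=i)) for i in range(ncols))
h = tables.openFile('manycols.h5', mode='w')
h.createTable(h.root, 'wide', desc)
h.close()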