Re: [Pytables-users] Speed of in-kernel Full-Table Search

2013-06-25 Thread Wagner Sebastian
Hi Anthony and Antonio,

Thanks for your fast responses. It's great to hear that all features are now 
free to use, though it took me a week and a half to find that out.

The first reference I read to learn PyTables was Hints for SQL Users [1], 
which states several times, for example in the section 'Creating an index':
 Indexing is supported in the commercial version of PyTables (PyTablesPro).
I would suggest updating these texts.
Having read this so often, I was convinced that indexing was only available 
in the Pro version, so I also overlooked the warning on the PyTables Pro 
page [2]. (As I was only interested in the features not available in the free 
version, I scrolled down immediately and just skimmed the page.) So my next 
suggestion is to give the warning text there a color :)

[1]
http://www.pytables.org/moin/HintsForSQLUsers#Creatinganindex
http://www.pytables.org/moin/HintsForSQLUsers#Selectingdata
[2]
http://www.pytables.org/moin/PyTablesPro

regards,
Sebastian

On Mon, Jun 24, 2013 at 4:25 AM, Wagner Sebastian  
sebastian.wagner...@ait.ac.at wrote:

  Dear PyTables-Users,

 For testing purposes I use a PyTables DB with 4 columns (1x UInt8 and
 3x Float) with 750k rows; the total file size is about 90MB. As the free 
 version does not support indexing, I thought that a full-table search 
 on this database would take at least one or two seconds, because the 
 file has to be loaded first (I/O bottleneck), and then the search 
 over ~20k rows can begin. But PyTables took only 0.05 seconds for a 
 full-table search (in-kernel, so near C speed, but nevertheless a full 
 table scan), while my bisecting algorithm with a precomputed sorted list 
 wrapped around PyTables (but saved in there) took about 0.5 
 seconds.

 So the thing I don't understand: How can PyTables be so fast without 
 any indexing?


Hi Sebastian,

First, there is no longer a non-free version of PyTables and v3.0 *does* have 
indexing capabilities.  However, you have to enable them so you probably 
weren't using them.

PyTables is fast because HDF5 is a binary format, it uses pthreads under the 
covers to parallelize some tasks, and it uses numexpr (which is also
parallel) to evaluate many expressions.  All of these things help make PyTables 
great!
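
To make the distinction between an unindexed in-kernel scan and an indexed query concrete, here is a minimal sketch. The column names, the condition, and the random data are assumptions chosen only to resemble the table described in the question; it is not the poster's actual setup.

```python
import numpy as np
import tables

# Hypothetical layout mirroring the post: one UInt8 column and three floats.
ROW_DTYPE = np.dtype([("flag", "u1"), ("x", "f8"), ("y", "f8"), ("z", "f8")])
CONDITION = "(x > 0.25) & (x < 0.26)"


def query_with_and_without_index(path, nrows=750_000):
    """Run the same in-kernel query before and after indexing column x."""
    rng = np.random.default_rng(0)
    data = np.zeros(nrows, dtype=ROW_DTYPE)
    for name in ("x", "y", "z"):
        data[name] = rng.random(nrows)

    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "samples", description=ROW_DTYPE)
        table.append(data)
        table.flush()

        # No index yet: the condition string is evaluated over every row.
        full_scan = table.read_where(CONDITION)

        # Indexing is opt-in and is created per column.
        table.cols.x.create_index()
        indexed = table.read_where(CONDITION)

    return full_scan, indexed
```

Both calls should select the same rows; only the amount of data that has to be examined differs. Timing the two `read_where` calls on your own file is the simplest way to see whether an index pays off for a given query.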

Be Well
Anthony


On 24/06/2013 11:25, Wagner Sebastian wrote:
 Dear PyTables-Users,
 
 For testing purposes I use a PyTables DB with 4 columns (1x UInt8 and 
 3x Float) with 750k rows; the total file size is about 90MB. As the free 
 version does not support indexing, I thought that a full-table search on this 
 database would take at least one or two seconds, because the file has to be 
 loaded first (I/O bottleneck), and then the search over ~20k rows can 
 begin. But PyTables took only 0.05 seconds for a full-table search 
 (in-kernel, so near C speed, but nevertheless a full table scan), while my 
 bisecting algorithm with a precomputed sorted list wrapped around PyTables 
 (but saved in there) took about 0.5 seconds.
 
 So the thing I don't understand: How can PyTables be so fast without any 
 Indexing?
 
 I'm using 3.0.0rc2 coming with WinPython
 
 Regards,
 Sebastian

The indexing features of PyTables Pro have been available in the open-source 
version of PyTables since version 2.3 (please see [1]).



[1]
http://pytables.github.io/release-notes/RELEASE_NOTES_v2.3.x.html#changes-from-2-2-1-to-2-3

ciao

--
Antonio Valentino

--
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


[Pytables-users] writing metadata

2013-06-25 Thread Andre' Walker-Loud
Dear PyTables users,

I am trying to figure out the best way to write some metadata into some files I 
have.

The hdf5 file looks like

/root/data_1/stat
/root/data_1/sys

where stat and sys are Arrays containing statistical and systematic 
fluctuations of numerical fits to some data I have.  What I would like to do is 
add another object

/root/data_1/fit

where fit is just a metadata key that describes all the choices I made in 
performing the fit, such as seed for the random number generator, and many 
choices for fitting options, like initial guess values of parameters, fitting 
range, etc.

I began to follow the example in the PyTables manual, in Section 1.2 The 
Object Tree, where first a class is defined 

class Particle(tables.IsDescription):
    identity = tables.StringCol(itemsize=22, dflt=" ", pos=0)
    ...

and then this class is used to populate a table.

In my case, I won't have a table, but really just want a single object 
containing my metadata.  I am wondering if there is a recommended way to do 
this?  The Table does not seem optimal, but I don't see what else I would use.


Thanks,

Andre





Re: [Pytables-users] writing metadata

2013-06-25 Thread Josh Ayers
Another option is to create a Python object - dict, list, or whatever works
- containing the metadata and then store a pickled version of it in a
PyTables array.  It's nice for this sort of thing because you have the full
flexibility of Python's data containers.

For example, if the Python object is called 'fit', then
numpy.frombuffer(pickle.dumps(fit), 'u1') will pickle it and convert the
result to a NumPy array of unsigned bytes.  It can be stored in a PyTables
array using a UInt8Atom.  To retrieve the Python object, just use
pickle.loads(hdf5_file.root.data_1.fit[:]).
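
A minimal sketch of that round trip follows. The helper names `save_pickled` and `load_pickled` are hypothetical, and the `.copy()` is a defensive choice of mine (`frombuffer` returns a read-only view), not part of the original suggestion.

```python
import pickle

import numpy as np
import tables


def save_pickled(h5, where, name, obj):
    """Pickle obj and store the bytes as a fixed-size UInt8 Array node."""
    # frombuffer returns a read-only view; copy() makes an ordinary array.
    payload = np.frombuffer(pickle.dumps(obj), dtype="u1").copy()
    return h5.create_array(where, name, obj=payload, createparents=True)


def load_pickled(node):
    """Read every byte of the node and unpickle the result."""
    return pickle.loads(node[:].tobytes())
```

With an open file `h5`, `save_pickled(h5, "/data_1", "fit", fit)` writes the node and `load_pickled(h5.root.data_1.fit)` should give back an equal object.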

It gets a little more complicated if you want to be able to modify the
Python object, because the length of the pickle will change.  In that case,
you can use an EArray (for the case when the pickle grows), and store the
number of bytes as an attribute.  Storing the number of bytes handles the
case when the pickle shrinks and doesn't use the full length of the on-disk
array.  To load it, use
pickle.loads(hdf5_file.root.data_1.fit[:num_bytes]), where num_bytes is the
previously stored attribute.  To modify it, just overwrite the array with
the new version, expanding if necessary, then update the num_bytes
attribute.
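
The update procedure described above could look roughly like this. It is only a sketch: `store_pickle` and `load_pickle` are hypothetical helper names, and the attribute name `num_bytes` is taken from the description.

```python
import pickle

import numpy as np
import tables


def store_pickle(h5, where, name, obj):
    """Write (or overwrite) a pickled object in an enlargeable UInt8 EArray."""
    payload = np.frombuffer(pickle.dumps(obj), dtype="u1")
    path = f"{where.rstrip('/')}/{name}"
    if path in h5:
        node = h5.get_node(path)
    else:
        node = h5.create_earray(where, name, atom=tables.UInt8Atom(),
                                shape=(0,), createparents=True)

    # Grow the on-disk array if the new pickle is longer than what is there.
    extra = len(payload) - node.nrows
    if extra > 0:
        node.append(np.zeros(extra, dtype="u1"))

    node[:len(payload)] = payload
    # Remember the valid length; the array may be longer after a shrink.
    node.attrs.num_bytes = len(payload)
    return node


def load_pickle(node):
    """Unpickle only the first num_bytes bytes of the array."""
    num_bytes = int(node.attrs.num_bytes)
    return pickle.loads(node[:num_bytes].tobytes())
```

Note that the array never shrinks in this sketch; stale bytes past `num_bytes` stay on disk and are simply ignored on load.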

Using a PyTables VLArray with an 'object' atom uses a similar technique
under the hood, so that may be easier.  It doesn't allow resizing though.
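
For comparison, a sketch of the VLArray variant, where PyTables does the pickling itself. The node path `/data_1/fit` follows the layout in the question; the function names are hypothetical.

```python
import tables


def write_fit_object(path, fit):
    """Store one Python object as the first row of an object VLArray."""
    with tables.open_file(path, mode="a") as h5:
        vlarray = h5.create_vlarray("/data_1", "fit",
                                    atom=tables.ObjectAtom(),
                                    createparents=True)
        vlarray.append(fit)


def read_fit_object(path):
    """Return the object stored in row 0; PyTables unpickles it on read."""
    with tables.open_file(path, mode="r") as h5:
        return h5.root.data_1.fit[0]
```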

Hope that helps,
Josh



On Tue, Jun 25, 2013 at 1:33 AM, Andreas Hilboll li...@hilboll.de wrote:

 For complex information I'd probably indeed use a table object. It
 doesn't matter if the table only has one row, but still you have all the
 information there nicely structured.
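
 A one-row table along those lines might be sketched as follows. The column names and types in `FitSettings` are invented for illustration; the real fit options would dictate the description.

```python
import tables


class FitSettings(tables.IsDescription):
    # Hypothetical columns; replace with the options actually used in the fit.
    seed = tables.Int64Col(pos=0)
    fit_min = tables.Float64Col(pos=1)
    fit_max = tables.Float64Col(pos=2)
    optimizer = tables.StringCol(itemsize=16, pos=3)


def write_fit_row(h5, where, settings):
    """Create a 'fit' table under `where` holding a single row of settings."""
    table = h5.create_table(where, "fit", FitSettings,
                            title="fit settings", createparents=True)
    row = table.row
    for name, value in settings.items():
        row[name] = value
    row.append()
    table.flush()
    return table
```

 Reading it back is then just `h5.root.data_1.fit[0]`, which returns the row as a NumPy record with one field per column.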

 -- Andreas.





Re: [Pytables-users] writing metadata

2013-06-25 Thread Anthony Scopatz
Also, depending on how much meta data you really needed to store you could
just use attributes.  That is what they are there for.
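
As a rough illustration of the attribute route, the sketch below attaches a few settings directly to an existing array. The attribute names and values are made up; the placeholder `stat` array stands in for the real data.

```python
import numpy as np
import tables


def annotate(path):
    """Create a placeholder 'stat' array and hang fit metadata on it."""
    with tables.open_file(path, mode="w") as h5:
        stat = h5.create_array("/data_1", "stat", obj=np.zeros(5),
                               createparents=True)
        stat.attrs.seed = 1234
        # Values without a native HDF5 representation are pickled by PyTables.
        stat.attrs.fit_range = (2, 12)
        stat.attrs.initial_guess = {"A": 1.0, "E": 0.5}


def read_annotations(path):
    """Collect all user-defined attributes of the 'stat' array in a dict."""
    with tables.open_file(path, mode="r") as h5:
        attrs = h5.root.data_1.stat.attrs
        return {name: getattr(attrs, name) for name in attrs._f_list("user")}
```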







Re: [Pytables-users] Speed of in-kernel Full-Table Search

2013-06-25 Thread Antonio Valentino
Hi Sebastian,

On 25/06/2013 09:36, Wagner Sebastian wrote:
 Hi Anthony and Antonio,
 
 Thanks for your fast responses. It's great to hear that all features are now 
 free to use, though it took me a week and a half to find that out.
 
 The first reference I read to learn PyTables was Hints for SQL Users [1], 
 which states several times, for example in the section 'Creating an index':
 Indexing is supported in the commercial version of PyTables (PyTablesPro).
 I would suggest updating these texts.
 Having read this so often, I was convinced that indexing was only available 
 in the Pro version, so I also overlooked the warning on the PyTables Pro 
 page [2]. (As I was only interested in the features not available in the free 
 version, I scrolled down immediately and just skimmed the page.) So my next 
 suggestion is to give the warning text there a color :)
 
 [1]
 http://www.pytables.org/moin/HintsForSQLUsers#Creatinganindex
 http://www.pytables.org/moin/HintsForSQLUsers#Selectingdata
 [2]
 http://www.pytables.org/moin/PyTablesPro
 
 regards,
 Sebastian
 

Thank you for reporting the issue; I will fix it ASAP.
The same problem also affects the corresponding cookbook page [1].

Anyway, please feel free to update the wiki if you find outdated material.


[1] http://pytables.github.io/cookbook/hints_for_sql_users.html


-- 
Antonio Valentino



Re: [Pytables-users] writing metadata

2013-06-25 Thread Andre' Walker-Loud
Hi Andreas, Josh, Anthony and Antonio,

Thanks for your help.


Andre





On Jun 26, 2013, at 2:48 AM, Antonio Valentino wrote:

 Hi Andre',
 
 For leaf nodes (Table, Array, etc.) you can use the attrs attribute
 set [1] as described in [2].
 For group objects (e.g. root) you can use the set_node_attr
 method [3] of File objects, or _v_attrs.
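
 For the group case, a small sketch of both spellings follows. The group name `data_1` comes from the question; the attribute names and values are placeholders.

```python
import tables


def tag_group(path):
    """Set attributes on a group via File.set_node_attr and via _v_attrs."""
    with tables.open_file(path, mode="w") as h5:
        group = h5.create_group("/", "data_1")

        # Through the File object, addressing the node by path.
        h5.set_node_attr("/data_1", "seed", 1234)
        # Through the group's own attribute set.
        group._v_attrs.optimizer = "least squares"

        return (
            int(h5.get_node_attr("/data_1", "seed")),
            str(group._v_attrs.optimizer),
            sorted(group._v_attrs._f_list("user")),
        )
```

 Both routes end up in the same attribute set, so which one to use is mostly a matter of whether you already hold the group object or only its path.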
 
 
 cheers
 
 [1]
 http://pytables.github.io/usersguide/libref/declarative_classes.html#attributesetclassdescr
 [2]
 http://pytables.github.io/usersguide/tutorials.html#setting-and-getting-user-attributes
 [3]
 http://pytables.github.io/usersguide/libref/file_class.html#tables.File.set_node_attr
 
 
 -- 
 Antonio Valentino
 