[Pytables-users] searching for group names

2013-08-05 Thread Nyirő Gergő
Hello,


We are developing a measurement evaluation tool, and we'd like to use
PyTables/HDF5 as a middle layer for signal access.

We have to deal with the silly structure of the recorder device
measurement format.



The signals can be accessed via two identifiers:

* device name: source of the signal-channel of the
message-another tag-yet another tag

* signal name



The first identifier says the source information of the signal, which
can be quite long.

Therefore I grouped the device name into two layers:

/source of the signal
    /channel of the message...
        /signal name



So if you have the same message from two channels, then you will get

/foo-device-name
    /channel-1
        /bar
        /baz
    /channel-2
        /bar
        /baz



Besides signal loading, we have to search for signal names as fast as
possible, and return the shortest unique part of the device name together
with the signal name.

Using the structure above, iterating over the group names is quite
slow. So I built a table of device and signal names.

As far as I know, PyTables queries do not support string searching
(e.g. startswith, *foo[0-9]ch*, etc.), so searching this table led us
to a pure Python loop, which is slow again.

Therefore I build a Python dictionary from the table, which provides
faster iteration than the table, but the init time increased from 100
ms to 3-4 sec (we have more than 40 000 signals).



Do you have any advice on how to search for group names in HDF5 with
PyTables in an efficient way?

PS: I would be most happy with a glob interface.



Thanks in advance for your advice,

gergo
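
One way to get a glob interface like the one asked for above is to keep the
group paths in a plain Python list and match them with the standard-library
fnmatch module. This is only a minimal sketch, assuming paths of the form
/device/channel/signal; the sample paths and the helper name glob_signals are
made up for illustration, and in a real file the list would have to be
collected once from the HDF5 hierarchy (or read back from the device/signal
table).

```python
import fnmatch

# Hypothetical paths in the /device/channel/signal layout described above.
PATHS = [
    "/foo-device-name/channel-1/bar",
    "/foo-device-name/channel-1/baz",
    "/foo-device-name/channel-2/bar",
    "/foo-device-name/channel-2/baz",
    "/other-device/channel-1/bar",
]


def glob_signals(paths, signal_pattern, device_pattern="*"):
    """Return (device_part, signal_name) pairs whose parts match the globs."""
    hits = []
    for path in paths:
        # Split off the last path component as the signal name.
        device_part, _, signal_name = path.rpartition("/")
        if (fnmatch.fnmatchcase(signal_name, signal_pattern)
                and fnmatch.fnmatchcase(device_part, device_pattern)):
            hits.append((device_part, signal_name))
    return hits


# Signals named "bar" or "baz" whose device part ends in "channel-2".
matches = glob_signals(PATHS, "ba[rz]", "*channel-2")
```

Patterns such as *foo[0-9]ch* work the same way, since fnmatch supports
*, ? and character classes.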

--
Get your SQL database under version control now!
Version control is standard for application code, but databases havent 
caught up. So what steps can you take to put your SQL databases under 
version control? Why should you start doing it? Read more to find out.
http://pubads.g.doubleclick.net/gampad/clk?id=49501711iu=/4140/ostg.clktrk
___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


[Pytables-users] dates and space

2013-08-05 Thread Oleksandr Huziy
Hi Pytables users and developers:

I have a few questions to which I could not find the answer in the
documentation. Thank you in advance for any help.

1. If I store dates in Pytables, does it mean I could write queries like
table.where('date.month == 5')? Is there a common way to pass from python's
datetime to pytable's datetime and inversely?

2. I have several variables stored in the same file in a separate table for
each variable. And I use separate columns year, month, day, hour, minute,
second  - to mark the time for a record (the records are not necessarily
ordered in time) and this is for each variable. I was thinking to put all
the variables in the same table and put missing values for the variables
which do not have outputs for a given time step. Is it possible to put None
as a default value into a table (so I could easily filter dummy rows).
But then again the data comes in chunks, does this mean I would have to
check if a row with the same date already exist for a different variable?


I don't really like the ideas in 2, which are intended to save space, but
maybe all I need is a good compression level? Can somebody advise me on
this?



Cheers
--
Oleksandr (Sasha) Huziy


Re: [Pytables-users] dates and space

2013-08-05 Thread Anthony Scopatz
On Mon, Aug 5, 2013 at 1:38 PM, Oleksandr Huziy guziy.sa...@gmail.com wrote:

 Hi Pytables users and developers:

 I have a few questions to which I could not find the answer in the
 documentation. Thank you in advance for any help.

 1. If I store dates in Pytables, does it mean I could write queries like
 table.where('date.month == 5')? Is there a common way to pass from python's
 datetime to pytable's datetime and inversely?


Hello Sasha,

PyTables times are actually based on C time, not Python's datetimes.
This is because they use the HDF5 time types.  So unfortunately you can't
write queries like the one above.  (You'd need to talk to numexpr about
getting that kind of query implemented ~_~.)

Instead I would suggest that you store your times as Float64Atoms and
Float64Cols and then use arithmetic to figure out the query:

table.where('(x / 3600 / 24) % 12 == 5')

This is not perfect...
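
The modular arithmetic above only approximates calendar months. An
alternative sketch, assuming the times are stored as float seconds since the
Unix epoch in UTC in a column called x (as in the expression above), is to
convert with the standard library and turn "month == 5" into explicit time
ranges joined into one condition string. The helper names and the list of
years are assumptions made for illustration.

```python
import calendar
from datetime import datetime, timezone


def to_epoch(dt):
    """Convert a naive UTC datetime to float seconds since the epoch."""
    return float(calendar.timegm(dt.timetuple())) + dt.microsecond / 1e6


def from_epoch(ts):
    """Convert float seconds since the epoch back to a naive UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).replace(tzinfo=None)


def month_condition(month, years, col="x"):
    """Build a condition string selecting `month` in each of `years`."""
    parts = []
    for year in years:
        start = to_epoch(datetime(year, month, 1))
        # First instant of the following month (December wraps to January).
        stop = to_epoch(datetime(year + month // 12, month % 12 + 1, 1))
        parts.append("((%s >= %r) & (%s < %r))" % (col, start, col, stop))
    return " | ".join(parts)


cond = month_condition(5, [2000, 2001])
```

The resulting string could then be passed as the condition to table.where().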


 2. I have several variables stored in the same file in a separate table
 for each variable. And I use separate columns year, month, day, hour,
 minute, second  - to mark the time for a record (the records are not
 necessarily ordered in time) and this is for each variable. I was thinking
 to put all the variables in the same table and put missing values for the
 variables which do not have outputs for a given time step. Is it possible
 to put None as a default value into a table (so I could easily filter dummy
 rows).


It is not possible to use None since that is a Python object of a
different type than the other integers you are trying to stick in the
column.  I would suggest that you use values with no actual meaning.  If
you are using normal ints you can use -1 to represent missing values.  If
you are using unsigned ints you have to pick other values, like 13 for
month on the Julian calendar.


 But then again the data comes in chunks, does this mean I would have to
 check if a row with the same date already exist for a different variable?


No, you wouldn't; you can store the same date multiple times in different
rows.
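
For comparison, the single-table layout with a sentinel for missing values
can be sketched in plain Python. This is only an illustration under the
assumption that -1 is never a valid measurement; the variable names and the
helper merge_chunk are hypothetical, and this variant does look up existing
rows by timestamp, which is exactly the check that storing duplicate dates
avoids.

```python
MISSING = -1  # sentinel; assumes -1 is never a valid measurement


def merge_chunk(rows_by_time, variable, chunk, variables):
    """Merge a {timestamp: value} chunk for one variable into the rows."""
    for timestamp, value in chunk.items():
        # Create the row with all variables missing if it is not there yet.
        row = rows_by_time.setdefault(
            timestamp, {name: MISSING for name in variables})
        row[variable] = value
    return rows_by_time


variables = ("temp", "precip")
rows_by_time = {}
merge_chunk(rows_by_time, "temp", {0: 12, 3600: 13}, variables)
merge_chunk(rows_by_time, "precip", {3600: 4}, variables)

# Keep only rows where every variable has a real value.
complete = {t: r for t, r in rows_by_time.items()
            if MISSING not in r.values()}
```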


 I don't really like the ideas in 2, which are intended to save space, but
 maybe all I need is a good compression level? Can somebody advise me on
 this?


Compression would definitely help here since the date numbers are all
fairly similar.  Probably even a compression level of 1 would work.  Keep
in mind that sometimes using compression actually speeds things up (see the
starving CPU problem).  You might just need to experiment with a few
different compression levels to see how things go.  0, 1, 5, 9 gives you a
good spread.
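
To get a feel for how the level affects size on date-like data, the
comparison can be tried outside PyTables with the standard-library zlib
module, which implements the same deflate algorithm as the zlib compressor
available in PyTables. A rough sketch, with made-up hourly timestamps
standing in for the real columns; actual HDF5 file sizes will differ because
of chunking and file overhead.

```python
import struct
import zlib

# Made-up data: 10 000 hourly timestamps as 64-bit floats (seconds since epoch).
timestamps = [946684800.0 + 3600.0 * i for i in range(10000)]
raw = struct.pack("<%dd" % len(timestamps), *timestamps)

# Compressed size in bytes for each of the levels suggested above.
sizes = {level: len(zlib.compress(raw, level)) for level in (0, 1, 5, 9)}

for level in sorted(sizes):
    percent = 100.0 * sizes[level] / len(raw)
    print("level %d: %d bytes (%.0f%% of raw)" % (level, sizes[level], percent))
```

In PyTables itself the level is chosen with tables.Filters(complevel=...)
when the table is created.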

Be Well
Anthony





 Cheers
 --
 Oleksandr (Sasha) Huziy




Re: [Pytables-users] dates and space

2013-08-05 Thread Jeff Reback
Here is a pandas solution for doing just this (which uses PyTables under the 
hood):

# create a frame
In [45]: df = DataFrame(randn(1000,2),index=date_range('20000101',periods=1000))

In [53]: df
Out[53]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1000 entries, 2000-01-01 00:00:00 to 2002-09-26 00:00:00
Freq: D
Data columns (total 2 columns):
0    1000  non-null values
1    1000  non-null values
dtypes: float64(2)

# store it as a table
In [46]: store = pd.HDFStore('test.h5',mode='w')

In [47]: store.append('df',df)

# select out the index (a datetimeindex in this case)
In [48]: c = store.select_column('df','index')

# get the coordinates of matching index
In [49]: coords = c[pd.DatetimeIndex(c).month==5]

# select those rows
In [51]: from pandas.io.pytables import Coordinates

In [50]: store.select('df',where=Coordinates(coords.index,None,None))
Out[50]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 93 entries, 2000-05-01 00:00:00 to 2002-05-31 00:00:00
Data columns (total 2 columns):
0    93  non-null values
1    93  non-null values
dtypes: float64(2)





Re: [Pytables-users] Clear chunks from CArray

2013-08-05 Thread Anthony Scopatz
Hello Giovanni,

I think you may need to del that slice and then possibly repack.  Hope this
helps.

Be Well
Anthony


On Mon, Aug 5, 2013 at 2:09 PM, Giovanni Luca Ciampaglia 
glciamp...@gmail.com wrote:

 Hello all,

 is there a way to clear out a chunk from a CArray? I noticed that setting the
 data to zero actually takes disk space, i.e.

 ***
 from tables import open_file, BoolAtom

 h5f = open_file('test.h5', 'w')
 ca = h5f.create_carray(h5f.root, 'carray', BoolAtom(), shape=(1000,1000),
 chunkshape=(1,1000))
 ca[:,:] = False
 h5f.close()
 ***

 The resulting file takes 249K ...

 Best,

 --
 Giovanni Luca Ciampaglia

 Postdoctoral fellow
 Center for Complex Networks and Systems Research
 Indiana University

 ✎ 910 E 10th St ∙ Bloomington ∙ IN 47408
 ☞ http://cnets.indiana.edu/
 ✉ gciam...@indiana.edu




Re: [Pytables-users] Clear chunks from CArray

2013-08-05 Thread Giovanni Luca Ciampaglia
Hi Anthony,

what do you mean precisely? I tried

del ca[:,:]

but CArray does not support __delitem__. Looking in the documentation I could
only find a method called remove_rows, but it's in Table, not CArray. Maybe I
am missing something?

Thanks,

Giovanni


-- 
Giovanni Luca Ciampaglia

Postdoctoral fellow
Center for Complex Networks and Systems Research
Indiana University

✎ 910 E 10th St ∙ Bloomington ∙ IN 47408
☞ http://cnets.indiana.edu/
✉ gciam...@indiana.edu




Re: [Pytables-users] Clear chunks from CArray

2013-08-05 Thread Anthony Scopatz
On Mon, Aug 5, 2013 at 3:14 PM, Giovanni Luca Ciampaglia 
glciamp...@gmail.com wrote:

 Hi Anthony,

 what do you mean precisely? I tried

 del ca[:,:]

 but CArray does not support __delitem__. Looking in the documentation I could
 only find a method called remove_rows, but it's in Table, not CArray. Maybe I
 am missing something?


Huh, it should...  This is definitely an oversight on our part.  If you
could please open an issue for this -- or better yet -- write a pull
request that implements delitem, that'd be great!

So I think you are right that there is no current way to delete rows from a
CArray.  Oops!  (Of course, I may still be missing something as well).

It looks like EArray has this problem too; otherwise I would just tell
you to use that.

Be Well
Anthony



 Thank,

 Giovanni

