On Wed, Aug 7, 2013 at 4:39 AM, Gabriel J.L. Beckers <
pytables-u...@gbeckers.nl> wrote:

> Hi,
>
> I don't know if this is related in any way to Gergo's problem, but I
> see slow responses when querying which children a group contains, if
> that group contains large leaves. I am using PyTables 2.5 and HDF5 1.8.9
> on 64-bit Linux.
>
> Specifically, I found that using the _g_get_objinfo method (which is
> used by other methods that I use) is slow when used on a large leaf.
> The slowness is proportional to the size of the leaf. It is almost as
> if some process is actually reading the data instead of just info on
> the type of data. I am noticing this because my data is on an external
> USB 3 disk. To give you an idea: that method takes almost 80 seconds to
> return the string 'Leaf' when used on a 5 GB EArray. That should
> roughly correspond to reading the complete disk-based array. The info
> is cached somehow, because if I run the method a second time in the
> same python session it is very fast.
>
> If I copy my HDF5 file to my SSD, things are much faster, but
> running the method still takes 2 seconds or so on a 5 GB leaf.
>
> Is this expected behavior and should I just avoid this method in my
> applications, or is something wrong?
>

Hi Gabriel,

Are you using compression on this EArray?  This method is basically a thin
wrapper over some HDF5 functions. I think that the data that you are asking
for (inadvertently, maybe) is just expensive to get.
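
As a rough sketch of how one might check this (not a definitive recipe), the snippet below reports the compression filters and chunk shape of a leaf and times how long it takes to list the children of a group. The helper names `describe_leaf` and `time_child_listing`, the file `demo.h5` and the node `/big` are made up for illustration. It assumes the PyTables 3.x function names; in 2.5 the equivalents are camelCase (`openFile`, `createEArray`, `getNode`).

```python
import os
import tempfile
import time

import numpy as np
import tables  # assumes PyTables 3.x naming (open_file, create_earray, ...)


def describe_leaf(h5path, nodepath):
    """Return the storage settings of a leaf without slicing its data."""
    with tables.open_file(h5path, mode="r") as f:
        leaf = f.get_node(nodepath)
        flt = leaf.filters
        return {
            "shape": leaf.shape,
            "chunkshape": leaf.chunkshape,
            "complevel": flt.complevel,  # 0 means no compression
            "complib": flt.complib,
        }


def time_child_listing(h5path, grouppath="/"):
    """Time how long listing the children of a group takes."""
    with tables.open_file(h5path, mode="r") as f:
        start = time.perf_counter()
        names = sorted(f.get_node(grouppath)._v_children)
        elapsed = time.perf_counter() - start
    return names, elapsed


# Build a small throwaway file so the sketch runs end to end.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with tables.open_file(demo_path, mode="w") as f:
    big = f.create_earray(
        "/", "big",
        atom=tables.Float64Atom(),
        shape=(0,),
        filters=tables.Filters(complevel=5, complib="zlib"),
    )
    big.append(np.arange(100000, dtype="float64"))

info = describe_leaf(demo_path, "/big")
names, elapsed = time_child_listing(demo_path)
print(info, names, elapsed)
```

Running the two helpers against the real file on the USB disk and on the SSD, in a fresh Python session each time so nothing is cached, should show whether the delay scales with the size of the leaf.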

Be Well
Anthony


>
> Best, Gabriel
>
> Anthony Scopatz <scop...@gmail.com> wrote:
>
> > On Mon, Aug 5, 2013 at 4:11 AM, Nyirő Gergő <gergo.ny...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >>
> >> We develop a measurement evaluation tool, and we'd like to use
> >> pytables/hdf5 as a middle layer for signal accessing.
> >>
> >> We have to deal with the silly structure of the recorder device
> >> measurement format.
> >>
> >>
> >>
> >> The signals can be accessed via two identifiers:
> >>
> >> * device name: <source of the signal>-<channel of the
> >> message>-<another tag>-<yet another tag>
> >>
> >> * signal name
> >>
> >>
> >>
> >> The first identifier describes the source of the signal and
> >> can be quite long.
> >>
> >> Therefore I grouped the device name into two layers:
> >>
> >> /<source of the signal>
> >>     /<channel of the message>...
> >>         /<signal name>
> >>
> >>
> >>
> >> So if you have the same message from two channels, then you will get:
> >>
> >> /foo-device-name
> >>     /channel-1
> >>         /bar
> >>         /baz
> >>     /channel-2
> >>         /bar
> >>         /baz
> >>
> >>
> >>
> >> Besides signal loading, we have to search for signal names as fast as
> >> possible, and return the shortest unique part of the device name
> >> together with the signal name.
> >>
> >> Using the structure above, iterating over the group names is quite
> >> slow, so I built a table of device and signal names.
> >>
> >> As far as I know, PyTables queries do not support string searching
> >> (e.g. startswith, *foo[0-9]ch*, etc.), so searching this table led us
> >> to a pure Python loop, which is slow again.
> >>
> >> Therefore I built a Python dictionary from the table, which is much
> >> faster to iterate over than the table, but the init time increased
> >> from 100 ms to 3-4 sec (we have more than 40 000 signals).
> >>
> >>
> >>
> >> Do you have any advice on how to search for group names in HDF5 with
> >> PyTables in an efficient way?
> >>
> >
> > Hi Gergő,
> >
> > Searching through group names, like accessing all HDF5 metadata, is slow.
> >  For group names this is because rather than searching through a list you
> > are traversing a B-tree, IIRC.  So you have to use the couple of tricks
> > that you used: 1) have another Table / Array of all table names, 2) read
> > this in once to a native Python data structure (dict here).
> >
> > However, 4 sec to read in this table seems excessive for data of this
> > size. You are probably not reading this in properly. You should be using:
> >
> > raw_grps = f.root.grp_names[:]
> >
> > or similar.
> >
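
To illustrate the second trick together with the glob-style search asked for in this thread, here is a minimal sketch using only the standard library. The `rows` list is a made-up stand-in for what a single bulk read such as `f.root.grp_names[:]` might return, and `build_index` and `glob_signals` are hypothetical helper names, not part of PyTables.

```python
import fnmatch
from collections import defaultdict

# Stand-in for rows read in one bulk call; each row is
# (device name, signal name). These values are invented.
rows = [
    ("foo-device-name-channel-1", "bar"),
    ("foo-device-name-channel-1", "baz"),
    ("foo-device-name-channel-2", "bar"),
    ("qux-device-name-channel-1", "speed"),
]


def build_index(rows):
    """Map each signal name to the list of devices that provide it."""
    index = defaultdict(list)
    for device, signal in rows:
        index[signal].append(device)
    return dict(index)


def glob_signals(index, pattern):
    """Return {signal: [devices]} for signal names matching a glob."""
    return {
        name: devices
        for name, devices in index.items()
        if fnmatch.fnmatchcase(name, pattern)  # case-sensitive match
    }


index = build_index(rows)
matches = glob_signals(index, "ba*")
print(matches)
```

If the names come from a PyTables string column under Python 3 they may arrive as bytes, in which case they would need decoding before matching.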
> > Maybe other people have some other ideas.
> >
> > Be Well
> > Anthony
> >
> >
> >>
> >> PS: I would be most happy with a glob interface.
> >>
> >>
> >>
> >> Thanks in advance for your advice,
> >>
> >> gergo
> >>
> >>
> >>
> >> _______________________________________________
> >> Pytables-users mailing list
> >> Pytables-users@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/pytables-users
> >>
