Re: [Pytables-users] Multithreaded decompress unexpectedly does not help

2012-12-07 Thread Alvaro Tejero Cantero
December 2012 12:47, Francesc Alted fal...@gmail.com wrote: On 12/6/12 1:42 PM, Alvaro Tejero Cantero wrote: Thank you for the comprehensive round-up. I have some ideas and reports below. What about ctables? The documentation says that it is specifically column-access optimized, which

Re: [Pytables-users] Multithreaded decompress unexpectedly does not help

2012-12-06 Thread Alvaro Tejero Cantero
Thank you for the comprehensive round-up. I have some ideas and reports below. What about ctables? The documentation says that it is specifically column-access optimized, which is what I need in this scenario (sometimes sequential, sometimes random). Unfortunately I could not get the rootdir

Re: [Pytables-users] Multithreaded decompress unexpectedly does not help

2012-12-06 Thread Alvaro Tejero Cantero
I'll answer myself on the size-checking: the right attributes are Leaf.size_in_memory and Leaf.size_on_disk (per http://pytables.github.com/usersguide/libref/hierarchy_classes.html) -á. On 6 December 2012 12:42, Alvaro Tejero Cantero alv...@minin.es wrote: Thank you for the comprehensive

[Pytables-users] Multithreaded decompress unexpectedly does not help

2012-12-05 Thread Alvaro Tejero Cantero
My system was benched for reads and writes with Blosc[1]: with pt.openFile(paths.braw(block), 'r') as handle: pt.setBloscMaxThreads(1) %timeit a = handle.root.raw.c042[:] pt.setBloscMaxThreads(6) %timeit a = handle.root.raw.c042[:] pt.setBloscMaxThreads(11) %timeit a =
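The comparison in the message above (the same read timed under 1, 6 and 11 Blosc threads) can be sketched as a small harness. This is only a sketch: `bench_thread_counts` and its parameters are hypothetical names, and the stand-in callables below replace the real HDF5 read so that the block runs without a data file.

```python
import time


def bench_thread_counts(set_threads, read, thread_counts, repeats=3):
    """Return {n_threads: best wall-clock seconds} for calling read().

    set_threads and read are supplied by the caller (assumption: the
    setter takes a single integer, as in the session quoted above).
    """
    results = {}
    for n in thread_counts:
        set_threads(n)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            read()
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results


# Stand-ins so the sketch runs on its own: the "setter" just records
# the requested thread counts and the "read" is a dummy computation.
calls = []
timings = bench_thread_counts(calls.append, lambda: sum(range(10_000)), [1, 6, 11])
for n, seconds in timings.items():
    print(f"{n:2d} threads: {seconds:.6f} s")
```

In the session quoted above, `set_threads` would be `pt.setBloscMaxThreads` and `read` would be `lambda: handle.root.raw.c042[:]`, called inside the `with pt.openFile(...)` block.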

Re: [Pytables-users] Optimizing pytables for reading entire columns at a time

2012-09-21 Thread Alvaro Tejero Cantero
Hi! You may want to have a look | reuse | combine your approach with that implemented in pandas (pandas.io.pytables.HDFStore) https://github.com/pydata/pandas/blob/master/pandas/io/pytables.py (see _write_array method) A certain liberality in Pandas with dtypes (partly induced by the missing

Re: [Pytables-users] Use of recarrays as representation for Tables in memory

2012-06-28 Thread Alvaro Tejero Cantero
Alvaro, I think if you save the table as a record array, it should return you a record array.  Or does it return a structured array?  Have you tried this? Be Well Anthony On Thu, Jun 28, 2012 at 11:22 AM, Alvaro Tejero Cantero alv...@minin.es wrote: Hi, I've noticed that tables are loaded

Re: [Pytables-users] Use of recarrays as representation for Tables in memory

2012-06-28 Thread Alvaro Tejero Cantero
Thank you Josh, that is representative enough. In my system the speedup of structured arrays is ~30x. A copy of the whole array is still ~6x faster. -á. On Thu, Jun 28, 2012 at 10:13 PM, Josh Ayers josh.ay...@gmail.com wrote: import time import numpy as np dtype = np.format_parser(['i4',

Re: [Pytables-users] New talk about PyTables

2012-05-10 Thread Alvaro Tejero Cantero
The graphical explanation of the different containers is masterly, and I believe, supersedes the table that we had talked about for the documentation. I think the schematics deserve a prominent place in the web page. They are a very good symbolic explanation of the basics of PyTables. As for

Re: [Pytables-users] Column gets updated but table does not reflect

2012-05-01 Thread Alvaro Tejero Cantero
/145 In-memory assignments can shadow access to the object in the file. IMHO this should not be allowed (in fact, why not make the first assignment behave like the second?). -á. On Mon, Apr 30, 2012 at 20:24, Alvaro Tejero Cantero alv...@minin.es wrote: I am now on another computer (no access

Re: [Pytables-users] Column gets updated but table does not reflect

2012-04-30 Thread Alvaro Tejero Cantero
wrote: On 4/30/12 12:08 PM, Alvaro Tejero Cantero wrote: Hi all, I created a table: joins.createTable('/','spikes',{'t20k':pt.Int32Col(),'tetrode':pt.UInt8Col(), 'unit':pt.UInt8Col()},'Spike times') I populated it joins.root.spikes.append(zip(np.arange(100),np.zeros(100), 3*np.ones(100

[Pytables-users] Design questions

2012-04-28 Thread Alvaro Tejero Cantero
Hi, There are two things about the design of the PyTables API that I don't understand: a) what is the reason to bind methods such as createTable and so on to the File object instead of putting the respective functions on the tables module? rationale: tables.createTable(where*, ...) could do the

Re: [Pytables-users] Table.where and conditions across tables

2012-04-26 Thread Alvaro Tejero Cantero
On Thu, Apr 26, 2012 at 04:07, Francesc Alted fal...@pytables.org wrote: On 4/25/12 7:05 AM, Alvaro Tejero Cantero wrote: Hi, a minor update on this thread * a bool array of 10**8 elements with True in two separate slices of length 10**6 each compresses by ~350. Using .wheretrue to obtain

Re: [Pytables-users] Performance of tables vs. arrays (out vs in core?)

2012-04-26 Thread Alvaro Tejero Cantero
Alted fal...@pytables.org wrote: On 4/25/12 6:13 AM, Alvaro Tejero Cantero wrote: Hi, Thanks for the clarification. I retried today both with a normal and a completely sorted index on a blosc-compressed table (complevel 5) and could not reproduce the putative bug either. So

Re: [Pytables-users] Main differences between PyTables and Relational

2012-04-26 Thread Alvaro Tejero Cantero
* play nicely together, but rather you have to understand how they do. Thanks again. Be Well Anthony On Wed, Apr 25, 2012 at 4:41 PM, Alvaro Tejero Cantero alv...@minin.es wrote: * Hello list, The relational model has a strong foundation and I have spent a few hours thinking about what

Re: [Pytables-users] Table.where and conditions across tables

2012-04-25 Thread Alvaro Tejero Cantero
Hi, a minor update on this thread * a bool array of 10**8 elements with True in two separate slices of length 10**6 each compresses by ~350. Using .wheretrue to obtain indices is faster by a factor of 2 to 3 than np.nonzero(normal numpy array). The resulting filesize is 248kb, still far from
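The observation above (a mostly-False boolean mask with two contiguous True runs compresses very well) can be reproduced in miniature. This is only a sketch under stated assumptions: it uses the standard library's `zlib` instead of the PyTables filters from the thread, the sizes are scaled down from 10**8 / 10**6 to 10**6 / 10**4, and the run positions are arbitrary.

```python
import zlib

n = 10**6            # total elements (scaled down from 10**8)
run = 10**4          # length of each True run (scaled down from 10**6)
mask = bytearray(n)  # one byte per element, initially all False (0)
for start in (100_000, 600_000):  # arbitrary, non-overlapping positions
    mask[start:start + run] = b"\x01" * run

# Compress the raw bytes and compare sizes.
compressed = zlib.compress(bytes(mask), 5)
ratio = len(mask) / len(compressed)

# Plain-Python counterpart of asking for the indices of the True entries.
true_indices = [i for i, flag in enumerate(mask) if flag]

print(f"compression ratio ~{ratio:.0f}, {len(true_indices)} True entries")
```

The ratio obtained here is not comparable to the ~350 quoted in the message, since the compressor, element size and array length all differ.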

[Pytables-users] Main differences between PyTables and Relational

2012-04-25 Thread Alvaro Tejero Cantero
* Hello list, The relational model has a strong foundation and I have spent a few hours thinking about what in PyTables is structurally different from it. Here are my thoughts. I would be delighted if you could add/comment/correct on these ideas. This could eventually help people with a

Re: [Pytables-users] Performance of tables vs. arrays (out vs in core?)

2012-04-19 Thread Alvaro Tejero Cantero
where will give me an iterator over the /values/; in this case I wanted the indexes. Plus, it will give me an iterator, so it will be trivially fast. Are you interested in the timings of where + building a list? or where + building an array? -á. On Wed, Apr 18, 2012 at 19:02, Anthony Scopatz

Re: [Pytables-users] Performance of tables vs. arrays (out vs in core?)

2012-04-19 Thread Alvaro Tejero Cantero
)) 'test' description := { val: Int16Col(shape=(), dflt=0, pos=0)} byteorder := 'little' chunkshape := (32768,) autoIndex := True colindexes := { val: Index(9, full, shuffle, zlib(1)).is_CSI=True} On Thu, Apr 19, 2012 at 12:46, Alvaro Tejero Cantero alv...@minin.es wrote: where

Re: [Pytables-users] Performance of tables vs. arrays (out vs in core?)

2012-04-19 Thread Alvaro Tejero Cantero
:= 'little' chunkshape := None On Thu, Apr 19, 2012 at 15:33, Anthony Scopatz scop...@gmail.com wrote: I was interested in how long it takes to iterate, since this is arguably where the majority of the time is spent. On Thu, Apr 19, 2012 at 8:43 AM, Alvaro Tejero Cantero alv...@minin.es wrote

[Pytables-users] Performance of tables vs. arrays (out vs in core?)

2012-04-18 Thread Alvaro Tejero Cantero
A single array with 312 000 000 int16 values. Two (uncompressed) ways to store it: * Array wa02[:10] array([306, 345, 353, 335, 345, 345, 356, 341, 338, 357], dtype=int16) * Table wtab02 (single column, named 'val') wtab02[:10] array([(306,), (345,), (353,), (335,), (345,), (345,), (356,),

Re: [Pytables-users] SQLite Virtual Tables

2012-04-16 Thread Alvaro Tejero Cantero
I'm continuing this thread on the dev list. -á. On Fri, Apr 13, 2012 at 21:17, Anthony Scopatz scop...@gmail.com wrote: On Fri, Apr 13, 2012 at 12:30 PM, Alvaro Tejero Cantero alv...@minin.es wrote: Hi Anthony, How does hierarchical help here? do you create a 'singer_name'/song

Re: [Pytables-users] SQLite Virtual Tables

2012-04-13 Thread Alvaro Tejero Cantero
%20and%20Presentations/folk_HDF5_databases_pres.pdf [5] https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py#L826 Be Well Anthony On Thu, Apr 12, 2012 at 11:03 AM, Alvaro Tejero Cantero alv...@minin.es wrote: Hi, The topic of introducing some kind of relational management

Re: [Pytables-users] SQLite Virtual Tables

2012-04-13 Thread Alvaro Tejero Cantero
%20Papers%20and%20Presentations/folk_HDF5_databases_pres.pdf [5] https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py#L826 Be Well Anthony On Thu, Apr 12, 2012 at 11:03 AM, Alvaro Tejero Cantero alv...@minin.es wrote: Hi, The topic of introducing some kind

[Pytables-users] SQLite Virtual Tables

2012-04-12 Thread Alvaro Tejero Cantero
Hi, The topic of introducing some kind of relational management in PyTables comes up with certain frequency. Would it be possible to combine the virtues of RDBMS and hdf5's speed via a mechanism such as SQLite Virtual Tables? http://www.sqlite.org/vtab.html I wonder if the required x*

[Pytables-users] flush on __exit__

2012-04-02 Thread Alvaro Tejero Cantero
Hi, should PyTables flush on __exit__ ? https://github.com/PyTables/PyTables/blob/master/tables/file.py#L2164 it is not clear to me if a File.close() call results in automatic flushing of all the nodes, since Node()._f_close() promises only "On nodes with data, it may be flushed to disk."
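The behaviour asked about above can be illustrated with a generic context manager that always flushes before closing. `FlushingFile` is a hypothetical stand-in written for this sketch; it says nothing about what PyTables' own `File.__exit__` or `File.close()` actually do.

```python
class FlushingFile:
    """Hypothetical file-like handle that records the order of operations."""

    def __init__(self):
        self.events = []
        self.is_open = True

    def flush(self):
        self.events.append("flush")

    def close(self):
        self.events.append("close")
        self.is_open = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Flush explicitly, and close even if the flush itself fails.
        try:
            self.flush()
        finally:
            self.close()
        return False  # do not swallow exceptions raised inside the with-block


with FlushingFile() as fh:
    fh.events.append("write")

print(fh.events)
```

With a handle whose `close()` does not guarantee a flush, the same effect can be had by calling `flush()` as the last statement inside the `with` block.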

Re: [Pytables-users] Determining effect of compression

2012-03-29 Thread Alvaro Tejero Cantero
PM, Alvaro Tejero Cantero wrote: Hi, Trying to evaluate compression filters, I was looking for a call in PyTables to get the size of a dataset (in bytes). As I didn't find it I remembered the many benchmarks and found instead [1] that the way to do it is to create single-dataset files

Re: [Pytables-users] Table.where and conditions across tables

2012-03-28 Thread Alvaro Tejero Cantero
Francesc On 3/26/12 12:43 PM, Alvaro Tejero Cantero wrote: Would it be an option to have * raw data on one table * all imaginable columns used for query conditions in another table (but how to grow it in columns without deleting and recreating?) and fetch indexes for the first based on .whereList

Re: [Pytables-users] Ref to region

2012-03-22 Thread Alvaro Tejero Cantero
It seems that refs were proposed in the past, even with an implementation. Maybe this could be a starting point: http://www.mail-archive.com/pytables-users@lists.sourceforge.net/msg01374.html -á. On Thu, Mar 15, 2012 at 12:56, Alvaro Tejero Cantero alv...@minin.es wrote: Does PyTables

Re: [Pytables-users] Advice for new user

2012-03-16 Thread Alvaro Tejero Cantero
Thanks Francesc, we're getting there :). Some more precise questions below. Here is how you can do that in PyTables: my_condition = '(col10.5) (col224) (col3 == novel)' mycol4_values = [ r['col4'] for r in tbl.where(my_condition) ] ok, but having data upon which I want to operate also

Re: [Pytables-users] Advice for new user

2012-03-16 Thread Alvaro Tejero Cantero
Thank you for these e-mails with so many useful tips! This is definitely a start. I will report what I find! Cheers, -á. On Fri, Mar 16, 2012 at 15:00, Francesc Alted fal...@gmail.com wrote: On Mar 16, 2012, at 1:55 AM, Alvaro Tejero Cantero wrote: Thanks Francesc, we're getting

[Pytables-users] Advice for new user

2012-03-15 Thread Alvaro Tejero Cantero
Hi everybody! I plan to start using PyTables for an application at the University of Oxford where data is collected in sessions of 2Gb Int16 data organized as 64 parallel time series (64 detectors), each holding 15 million points (15M). I could handle these sessions separately, but ideally I

[Pytables-users] Ref to region

2012-03-15 Thread Alvaro Tejero Cantero
Does PyTables support object region references[1]? When using soft links to other files, is a performance penalty incurred? I like the idea of having the raw data, that never changes, referenced from another file that is read-only. How do you guys normally deal with this scenario? Álvaro. [1] I

[Pytables-users] Decorators to track who wrote what

2012-03-15 Thread Alvaro Tejero Cantero
Hi, Here's my last question for today (I sent them separately because they are quite unrelated). I am thinking of writing a python decorator that for any processing function (e.g. band-pass filter of median of data[0:3,:]) logs to the attributes of the target HDF5 column * the name of the
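The decorator idea described above might look roughly like the following. This is only a sketch: `logs_provenance`, the attribute keys and the choice of what to record (function name, arguments, timestamp) are assumptions, and a plain dict stands in for the attribute set of the target HDF5 node.

```python
import functools
import time


def logs_provenance(attrs):
    """Decorator factory: record which function ran into a mapping.

    attrs is any dict-like object (assumption: a node's attribute set
    could be wrapped to behave this way).
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # Record only after the function succeeded.
            attrs["processed_by"] = func.__name__
            attrs["processed_args"] = repr((args, kwargs))
            attrs["processed_at"] = time.strftime("%Y-%m-%dT%H:%M:%S")
            return result
        return wrapper
    return decorator


column_attrs = {}  # stand-in for the attributes of the target column


@logs_provenance(column_attrs)
def scale(values, factor):
    """Toy processing step used only to exercise the decorator."""
    return [v * factor for v in values]


scaled = scale([1, 2, 3], factor=2)
print(scaled, column_attrs["processed_by"], column_attrs["processed_args"])
```

Writing the record after the call means a failed processing step leaves the previous provenance untouched, which is one possible design choice among several.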

Re: [Pytables-users] Advice for new user

2012-03-15 Thread Alvaro Tejero Cantero
...@gmail.com wrote: Hello Alvaro, Thanks for your excitement! On Thu, Mar 15, 2012 at 7:52 AM, Alvaro Tejero Cantero alv...@minin.es wrote: Hi everybody! I plan to start using PyTables for an application at the University of Oxford where data is collected in sessions of 2Gb Int16 data organized

Re: [Pytables-users] Ref to region

2012-03-15 Thread Alvaro Tejero Cantero
AM, Alvaro Tejero Cantero alv...@minin.es wrote: Does PyTables support object region references[1]? When using soft links to other files, is a performance penalty incurred? I like the idea of having the raw data, that never changes, referenced from another file that is read-only. How do you

Re: [Pytables-users] Advice for new user

2012-03-15 Thread Alvaro Tejero Cantero
:20 PM, Alvaro Tejero Cantero alv...@minin.es wrote: Hi! Thanks for the prompt answer. Actually I am not clear about switching from NxM array to N columns (64 in my case). How do I make a rectangular selection with columns? With an NxM array I just have to do arr[1:2,1:4] to select
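For the question above, one way to picture a rectangular selection over separately stored columns is to slice the list of columns first and then slice each chosen column. The sketch uses plain Python lists; `rect_select` is a hypothetical helper, and it assumes the first index of `arr[rows, cols]` addresses samples (rows) and the second addresses detectors (columns).

```python
def rect_select(columns, row_slice, col_slice):
    """Return the rows x cols block taken from a list of equal-length columns."""
    picked = [col[row_slice] for col in columns[col_slice]]
    # Transpose so the result is row-major, like arr[row_slice, col_slice].
    return [list(row) for row in zip(*picked)]


# Six toy columns of five samples each; the value encodes 10*column + row.
columns = [[10 * c + r for r in range(5)] for c in range(6)]

block = rect_select(columns, slice(1, 3), slice(1, 4))
for row in block:
    print(row)
```

The selection `arr[1:2, 1:4]` from the message corresponds to `rect_select(columns, slice(1, 2), slice(1, 4))` under the stated assumption.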