from:"Alvaro Tejero Cantero"

Re: [Pytables-users] Multithreaded decompress unexpectedly does not help

2012-12-07 Thread Alvaro Tejero Cantero

December 2012 12:47, Francesc Alted fal...@gmail.com wrote: On 12/6/12 1:42 PM, Alvaro Tejero Cantero wrote: Thank you for the comprehensive round-up. I have some ideas and reports below. What about ctables? The documentation says that it is specifically column-access optimized, which

Re: [Pytables-users] Multithreaded decompress unexpectedly does not help

2012-12-06 Thread Alvaro Tejero Cantero

Thank you for the comprehensive round-up. I have some ideas and reports below. What about ctables? The documentation says that it is specifically column-access optimized, which is what I need in this scenario (sometimes sequential, sometimes random). Unfortunately I could not get the rootdir

Re: [Pytables-users] Multithreaded decompress unexpectedly does not help

2012-12-06 Thread Alvaro Tejero Cantero

I'll answer myself on the size-checking: the right attributes are Leaf.size_in_memory and Leaf.size_on_disk (per http://pytables.github.com/usersguide/libref/hierarchy_classes.html) -á. On 6 December 2012 12:42, Alvaro Tejero Cantero alv...@minin.es wrote: Thank you for the comprehensive

[Pytables-users] Multithreaded decompress unexpectedly does not help

2012-12-05 Thread Alvaro Tejero Cantero

My system was benched for reads and writes with Blosc[1]: with pt.openFile(paths.braw(block), 'r') as handle: pt.setBloscMaxThreads(1) %timeit a = handle.root.raw.c042[:] pt.setBloscMaxThreads(6) %timeit a = handle.root.raw.c042[:] pt.setBloscMaxThreads(11) %timeit a =

Re: [Pytables-users] Optimizing pytables for reading entire columns at a time

2012-09-21 Thread Alvaro Tejero Cantero

Hi! You may want to have a look | reuse | combine your approach with that implemented in pandas (pandas.io.pytables.HDFStore) https://github.com/pydata/pandas/blob/master/pandas/io/pytables.py (see _write_array method) A certain liberality in Pandas with dtypes (partly induced by the missing

Re: [Pytables-users] Use of recarrays as representation for Tables in memory

2012-06-28 Thread Alvaro Tejero Cantero

Alvaro, I think if you save the table as a record array, it should return you a record array. Or does it return a structured array? Have you tried this? Be Well Anthony On Thu, Jun 28, 2012 at 11:22 AM, Alvaro Tejero Cantero alv...@minin.es wrote: Hi, I've noticed that tables are loaded

Re: [Pytables-users] Use of recarrays as representation for Tables in memory

2012-06-28 Thread Alvaro Tejero Cantero

Thank you Josh, that is representative enough. In my system the speedup of structured arrays is ~30x. A copy of the whole array is still ~6x faster. -á. On Thu, Jun 28, 2012 at 10:13 PM, Josh Ayers josh.ay...@gmail.com wrote: import time import numpy as np dtype = np.format_parser(['i4',
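The comparison discussed in this thread can be approximated with a small timing script. This is only a sketch: the field names and sizes are hypothetical (the dtype in the quoted message is truncated), and the measured ratio depends on the NumPy version and machine, so it will not necessarily match the figures quoted above.

```python
import timeit

import numpy as np

# Hypothetical two-field layout; the dtype in the quoted message is truncated.
dtype = np.dtype([("a", "i4"), ("b", "f8")])
structured = np.zeros(100_000, dtype=dtype)
structured["a"] = np.arange(structured.size)

# View the same memory as a record array (attribute-style field access).
records = structured.view(np.recarray)

# Time field access through each interface.
t_structured = timeit.timeit(lambda: structured["a"], number=10_000)
t_records = timeit.timeit(lambda: records.a, number=10_000)

print(f"structured['a']: {t_structured:.4f} s")
print(f"records.a:       {t_records:.4f} s")
```

Both objects share one buffer, so any difference in the timings comes from the access path rather than from the data itself.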

Re: [Pytables-users] New talk about PyTables

2012-05-10 Thread Alvaro Tejero Cantero

The graphical explanation of the different containers is masterly, and I believe, supersedes the table that we had talked about for the documentation. I think the schematics deserve a prominent place in the web page. They are a very good symbolic explanation of the basics of PyTables. As for

Re: [Pytables-users] Column gets updated but table does not reflect

2012-05-01 Thread Alvaro Tejero Cantero

/145 In-memory assignments can shadow access to the object in the file. IMHO this should not be allowed (in fact, why not make the first assignment behave like the second?). -á. On Mon, Apr 30, 2012 at 20:24, Alvaro Tejero Cantero alv...@minin.es wrote: I am now on another computer (no access

Re: [Pytables-users] Column gets updated but table does not reflect

2012-04-30 Thread Alvaro Tejero Cantero

wrote: On 4/30/12 12:08 PM, Alvaro Tejero Cantero wrote: Hi all, I created a table: joins.createTable('/','spikes',{'t20k':pt.Int32Col(),'tetrode':pt.UInt8Col(), 'unit':pt.UInt8Col()},'Spike times') I populated it joins.root.spikes.append(zip(np.arange(100),np.zeros(100), 3*np.ones(100

[Pytables-users] Design questions

2012-04-28 Thread Alvaro Tejero Cantero

Hi, There are two things about the design of the PyTables API that I don't understand: a) what is the reason to bind methods such as createTable and so on to the File object instead of putting the respective functions on the tables module? rationale: tables.createTable(where*, ...) could do the

Re: [Pytables-users] Table.where and conditions across tables

2012-04-26 Thread Alvaro Tejero Cantero

On Thu, Apr 26, 2012 at 04:07, Francesc Alted fal...@pytables.org wrote: On 4/25/12 7:05 AM, Alvaro Tejero Cantero wrote: Hi, a minor update on this thread * a bool array of 10**8 elements with True in two separate slices of length 10**6 each compresses by ~350. Using .wheretrue to obtain

Re: [Pytables-users] Performance of tables vs. arrays (out vs in core?)

2012-04-26 Thread Alvaro Tejero Cantero

Alted fal...@pytables.org wrote: On 4/25/12 6:13 AM, Alvaro Tejero Cantero wrote: Hi, Thanks for the clarification. I retried today both with a normal and a completely sorted index on a blosc-compressed table (complevel 5) and could not reproduce the putative bug either. So

Re: [Pytables-users] Main differences between PyTables and Relational

2012-04-26 Thread Alvaro Tejero Cantero

* play nicely together, but rather you have to understand how they do. Thanks again. Be Well Anthony On Wed, Apr 25, 2012 at 4:41 PM, Alvaro Tejero Cantero alv...@minin.es wrote: * Hello list, The relational model has a strong foundation and I have spent a few hours thinking about what

Re: [Pytables-users] Table.where and conditions across tables

2012-04-25 Thread Alvaro Tejero Cantero

Hi, a minor update on this thread * a bool array of 10**8 elements with True in two separate slices of length 10**6 each compresses by ~350. Using .wheretrue to obtain indices is faster by a factor of 2 to 3 than np.nonzero(normal numpy array). The resulting filesize is 248kb, still far from
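The setup described in this message can be reproduced on a smaller scale with plain NumPy. The sketch below is an assumption-laden stand-in: it uses 10**6 elements instead of 10**8, `np.flatnonzero` instead of `.wheretrue`, and `zlib` instead of whatever compressor the thread used, so the ratio it prints is not comparable to the ~350 reported above.

```python
import zlib

import numpy as np

n = 10**6    # scaled down from the 10**8 elements in the message
run = 10**4  # scaled down from the 10**6-element slices

# Boolean mask that is True only in two separate slices.
mask = np.zeros(n, dtype=bool)
mask[100_000:100_000 + run] = True
mask[700_000:700_000 + run] = True

# Indices of the True entries, in ascending order.
indices = np.flatnonzero(mask)

# Long constant runs compress very well; zlib is a stand-in compressor here.
compressed = zlib.compress(mask.tobytes())
ratio = mask.nbytes / len(compressed)

print(f"{indices.size} indices, compression ratio about {ratio:.0f}")
```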

[Pytables-users] Main differences between PyTables and Relational

2012-04-25 Thread Alvaro Tejero Cantero

* Hello list, The relational model has a strong foundation and I have spent a few hours thinking about what in PyTables is structurally different from it. Here are my thoughts. I would be delighted if you could add/comment/correct on these ideas. This could eventually help people with a

Re: [Pytables-users] Performance of tables vs. arrays (out vs in core?)

2012-04-19 Thread Alvaro Tejero Cantero

where will give me an iterator over the /values/; in this case I wanted the indexes. Plus, it will give me an iterator, so it will be trivially fast. Are you interested in the timings of where + building a list? or where + building an array? -á. On Wed, Apr 18, 2012 at 19:02, Anthony Scopatz

Re: [Pytables-users] Performance of tables vs. arrays (out vs in core?)

2012-04-19 Thread Alvaro Tejero Cantero

)) 'test' description := { val: Int16Col(shape=(), dflt=0, pos=0)} byteorder := 'little' chunkshape := (32768,) autoIndex := True colindexes := { val: Index(9, full, shuffle, zlib(1)).is_CSI=True} On Thu, Apr 19, 2012 at 12:46, Alvaro Tejero Cantero alv...@minin.es wrote: where

Re: [Pytables-users] Performance of tables vs. arrays (out vs in core?)

2012-04-19 Thread Alvaro Tejero Cantero

:= 'little' chunkshape := None On Thu, Apr 19, 2012 at 15:33, Anthony Scopatz scop...@gmail.com wrote: I was interested in how long it takes to iterate, since this is arguably where the majority of the time is spent. On Thu, Apr 19, 2012 at 8:43 AM, Alvaro Tejero Cantero alv...@minin.es wrote

[Pytables-users] Performance of tables vs. arrays (out vs in core?)

2012-04-18 Thread Alvaro Tejero Cantero

A single array with 312 000 000 int16 values. Two (uncompressed) ways to store it: * Array wa02[:10] array([306, 345, 353, 335, 345, 345, 356, 341, 338, 357], dtype=int16) * Table wtab02 (single column, named 'val') wtab02[:10] array([(306,), (345,), (353,), (335,), (345,), (345,), (356,),

Re: [Pytables-users] SQLite Virtual Tables

2012-04-16 Thread Alvaro Tejero Cantero

I'm continuing this thread on the dev list. -á. On Fri, Apr 13, 2012 at 21:17, Anthony Scopatz scop...@gmail.com wrote: On Fri, Apr 13, 2012 at 12:30 PM, Alvaro Tejero Cantero alv...@minin.es wrote: Hi Anthony, How does hierarchical help here? do you create a 'singer_name'/song

Re: [Pytables-users] SQLite Virtual Tables

2012-04-13 Thread Alvaro Tejero Cantero

%20and%20Presentations/folk_HDF5_databases_pres.pdf [5] https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py#L826 Be Well Anthony On Thu, Apr 12, 2012 at 11:03 AM, Alvaro Tejero Cantero alv...@minin.es wrote: Hi, The topic of introducing some kind of relational management

Re: [Pytables-users] SQLite Virtual Tables

2012-04-13 Thread Alvaro Tejero Cantero

%20Papers%20and%20Presentations/folk_HDF5_databases_pres.pdf [5] https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py#L826 Be Well Anthony On Thu, Apr 12, 2012 at 11:03 AM, Alvaro Tejero Cantero alv...@minin.es wrote: Hi, The topic of introducing some kind

[Pytables-users] SQLite Virtual Tables

2012-04-12 Thread Alvaro Tejero Cantero

Hi, The topic of introducing some kind of relational management in PyTables comes up with certain frequency. Would it be possible to combine the virtues of RDBMS and hdf5's speed via a mechanism such as SQLite Virtual Tables? http://www.sqlite.org/vtab.html I wonder if the required x*

[Pytables-users] flush on exit

2012-04-02 Thread Alvaro Tejero Cantero

Hi, should PyTables flush on __exit__ ? https://github.com/PyTables/PyTables/blob/master/tables/file.py#L2164 it is not clear to me if a File.close() call results in automatic flushing of all the nodes, since Node()._f_close() promises only "On nodes with data, it may be flushed to disk."

Re: [Pytables-users] Determining effect of compression

2012-03-29 Thread Alvaro Tejero Cantero

PM, Alvaro Tejero Cantero wrote: Hi, Trying to evaluate compression filters, I was looking for a call in PyTables to get the size of a dataset (in bytes). As I didn't find it I remembered the many benchmarks and found instead [1] that the way to do it is to create single-dataset files

Re: [Pytables-users] Table.where and conditions across tables

2012-03-28 Thread Alvaro Tejero Cantero

Francesc On 3/26/12 12:43 PM, Alvaro Tejero Cantero wrote: Would it be an option to have * raw data on one table * all imaginable columns used for query conditions in another table (but how to grow it in columns without deleting and recreating?) and fetch indexes for the first based on .whereList

Re: [Pytables-users] Ref to region

2012-03-22 Thread Alvaro Tejero Cantero

It seems that refs were proposed in the past, even with an implementation. Maybe this could be a starting point: http://www.mail-archive.com/pytables-users@lists.sourceforge.net/msg01374.html -á. On Thu, Mar 15, 2012 at 12:56, Alvaro Tejero Cantero alv...@minin.es wrote: Does PyTables

Re: [Pytables-users] Advice for new user

2012-03-16 Thread Alvaro Tejero Cantero

Thanks Francesc, we're getting there :). Some more precise questions below. Here is how you can do that in PyTables: my_condition = '(col10.5) (col224) (col3 == novel)' mycol4_values = [ r['col4'] for r in tbl.where(my_condition) ] ok, but having data upon which I want to operate also

Re: [Pytables-users] Advice for new user

2012-03-16 Thread Alvaro Tejero Cantero

Thank you for these e-mails with so many useful tips! This is definitely a start. I will report what I find! Cheers, -á. On Fri, Mar 16, 2012 at 15:00, Francesc Alted fal...@gmail.com wrote: On Mar 16, 2012, at 1:55 AM, Alvaro Tejero Cantero wrote: Thanks Francesc, we're getting

[Pytables-users] Advice for new user

2012-03-15 Thread Alvaro Tejero Cantero

Hi everybody! I plan to start using PyTables for an application at the University of Oxford where data is collected in sessions of 2Gb Int16 data organized as 64 parallel time series (64 detectors), each holding 15 million points (15M). I could handle these sessions separately, but ideally I

[Pytables-users] Ref to region

2012-03-15 Thread Alvaro Tejero Cantero

Does PyTables support object region references[1]? When using soft links to other files, is a performance penalty incurred? I like the idea of having the raw data, that never changes, referenced from another file that is read-only. How do you guys normally deal with this scenario? Álvaro. [1] I

[Pytables-users] Decorators to track who wrote what

2012-03-15 Thread Alvaro Tejero Cantero

Hi, Here's my last question for today (I sent them separately because they are quite unrelated). I am thinking of writing a python decorator that for any processing function (e.g. band-pass filter of median of data[0:3,:]) logs to the attributes of the target HDF5 column * the name of the
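A decorator of the kind described here might look roughly like the following. This is a minimal sketch, not the author's design: a plain dict stands in for the attribute set of the target HDF5 node, and the attribute names (`written_by`, `written_at`, `arguments`) as well as the helper `column_medians` are hypothetical.

```python
import functools
import statistics
import time


def track_provenance(func):
    """Record which function produced a result, with what arguments, and when."""

    @functools.wraps(func)
    def wrapper(target_attrs, *args, **kwargs):
        result = func(*args, **kwargs)
        # target_attrs is a dict here; with PyTables it would be the
        # attribute set of the node that receives the result (assumption).
        target_attrs["written_by"] = func.__name__
        target_attrs["written_at"] = time.strftime("%Y-%m-%dT%H:%M:%S")
        target_attrs["arguments"] = repr((args, kwargs))
        return result

    return wrapper


@track_provenance
def column_medians(rows):
    """Median of each column over the given rows."""
    return [statistics.median(col) for col in zip(*rows)]


attrs = {}
medians = column_medians(attrs, [[1, 2], [3, 4], [5, 6]])
print(medians, attrs["written_by"])
```

Passing the attribute container as the first argument keeps the processing function itself free of any storage concerns.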

Re: [Pytables-users] Advice for new user

2012-03-15 Thread Alvaro Tejero Cantero

...@gmail.com wrote: Hello Alvaro, Thanks for your excitement! On Thu, Mar 15, 2012 at 7:52 AM, Alvaro Tejero Cantero alv...@minin.es wrote: Hi everybody! I plan to start using PyTables for an application at the University of Oxford where data is collected in sessions of 2Gb Int16 data organized

Re: [Pytables-users] Ref to region

2012-03-15 Thread Alvaro Tejero Cantero

AM, Alvaro Tejero Cantero alv...@minin.es wrote: Does PyTables support object region references[1]? When using soft links to other files, is a performance penalty incurred? I like the idea of having the raw data, that never changes, referenced from another file that is read-only. How do you

Re: [Pytables-users] Advice for new user

2012-03-15 Thread Alvaro Tejero Cantero

:20 PM, Alvaro Tejero Cantero alv...@minin.es wrote: Hi! Thanks for the prompt answer. Actually I am not clear about switching from NxM array to N columns (64 in my case). How do I make a rectangular selection with columns? With an NxM array I just have to do arr[1:2,1:4] to select
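The difference between the two layouts raised in this question can be illustrated with NumPy alone. In this sketch the sizes are hypothetical and a dict of 1-D arrays stands in for per-channel columns; the `c042`-style names only mirror the naming that appears in a later thread.

```python
import numpy as np

n_samples, n_channels = 100, 64  # hypothetical, small enough for int16

# N x M layout: one 2-D array, rows are samples, columns are channels.
arr = np.arange(n_samples * n_channels, dtype=np.int16).reshape(n_samples, n_channels)
block = arr[1:2, 1:4]  # rectangular selection in a single slicing step

# Column layout: one 1-D array per channel.
columns = {f"c{j:03d}": arr[:, j].copy() for j in range(n_channels)}

# The same rectangle has to be assembled column by column.
from_columns = np.column_stack([columns[f"c{j:03d}"][1:2] for j in range(1, 4)])

print(block.shape, np.array_equal(block, from_columns))
```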

36 matches