December 2012 12:47, Francesc Alted fal...@gmail.com wrote:
On 12/6/12 1:42 PM, Alvaro Tejero Cantero wrote:
Thank you for the comprehensive round-up. I have some ideas and reports
below.
What about ctables? The documentation says that it is specifically
column-access optimized, which is what I need in this scenario (sometimes
sequential, sometimes random).
Unfortunately I could not get the rootdir
I'll answer myself on the size-checking: the right attributes are
Leaf.size_in_memory and Leaf.size_on_disk (per
http://pytables.github.com/usersguide/libref/hierarchy_classes.html)
-á.
On 6 December 2012 12:42, Alvaro Tejero Cantero alv...@minin.es wrote:
My system was benched for reads and writes with Blosc[1]:
with pt.openFile(paths.braw(block), 'r') as handle:
pt.setBloscMaxThreads(1)
%timeit a = handle.root.raw.c042[:]
pt.setBloscMaxThreads(6)
%timeit a = handle.root.raw.c042[:]
pt.setBloscMaxThreads(11)
%timeit a = handle.root.raw.c042[:]
Hi!
You may want to have a look at / reuse / combine your approach with the
one implemented in pandas (pandas.io.pytables.HDFStore):
https://github.com/pydata/pandas/blob/master/pandas/io/pytables.py
(see _write_array method)
A certain liberality in Pandas with dtypes (partly induced by the
missing
Alvaro,
I think if you save the table as a record array, it should return you a
record array. Or does it return a structured array? Have you tried this?
Be Well
Anthony
On Thu, Jun 28, 2012 at 11:22 AM, Alvaro Tejero Cantero alv...@minin.es
wrote:
Hi,
I've noticed that tables are loaded
Thank you Josh, that is representative enough. In my system the
speedup of structured arrays is ~30x. A copy of the whole array is
still ~6x faster.
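The speedup reported above can be sketched in plain NumPy: pulling one field out of a structured array is a view (no copy of the other bytes), while materializing an independent copy costs a full pass. The array here is just a stand-in for a table read back from PyTables.

```python
import numpy as np

# Stand-in for a table read back from PyTables: a structured array with
# one int16 column, like the single-column 'val' tables in this thread.
n = 1_000_000
recs = np.zeros(n, dtype=[('val', 'i2')])
recs['val'] = np.arange(n) % 1000

col = recs['val']           # field access: a view, shares memory
assert np.shares_memory(col, recs)

plain = recs['val'].copy()  # explicit copy: independent buffer
assert not np.shares_memory(plain, recs)
```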
-á.
On Thu, Jun 28, 2012 at 10:13 PM, Josh Ayers josh.ay...@gmail.com wrote:
import time
import numpy as np
dtype = np.format_parser(['i4',
The graphical explanation of the different containers is masterly, and
I believe, supersedes the table that we had talked about for the
documentation.
I think the schematics deserve a prominent place in the web page.
They are a very good symbolic explanation of the basics of PyTables.
As for
In-memory assignments can shadow access to the object in the file.
IMHO this should not be allowed (in fact, why not make the first
assignment behave like the second?).
-á.
On Mon, Apr 30, 2012 at 20:24, Alvaro Tejero Cantero alv...@minin.es wrote:
I am now on another computer (no access
wrote:
On 4/30/12 12:08 PM, Alvaro Tejero Cantero wrote:
Hi all,
I created a table:
joins.createTable('/', 'spikes',
                  {'t20k': pt.Int32Col(), 'tetrode': pt.UInt8Col(),
                   'unit': pt.UInt8Col()},
                  'Spike times')
I populated it
joins.root.spikes.append(zip(np.arange(100), np.zeros(100),
                             3*np.ones(100)))
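A sketch of an alternative to zipping Python scalars: build the 100 rows as a structured array whose dtype mirrors the table description above (Int32Col / UInt8Col / UInt8Col), and append that in one call. The column names come from the message; the `append` call is only shown as a comment.

```python
import numpy as np

# Structured array matching the hypothetical 'spikes' table description.
dtype = np.dtype([('t20k', 'i4'), ('tetrode', 'u1'), ('unit', 'u1')])
rows = np.empty(100, dtype=dtype)
rows['t20k'] = np.arange(100)
rows['tetrode'] = 0
rows['unit'] = 3

# joins.root.spikes.append(rows)  # same append call as in the message
```

Appending a structured array avoids per-row conversion of zipped tuples and guarantees the dtypes match the table's columns.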
Hi,
There are two things about the design of the PyTables API that I don't
understand:
a) what is the reason to bind methods such as createTable and so on to
the File object instead of putting the respective functions on the
tables module?
rationale: tables.createTable(where*, ...) could do the
On Thu, Apr 26, 2012 at 04:07, Francesc Alted fal...@pytables.org wrote:
On 4/25/12 7:05 AM, Alvaro Tejero Cantero wrote:
Alted fal...@pytables.org wrote:
On 4/25/12 6:13 AM, Alvaro Tejero Cantero wrote:
Hi,
Thanks for the clarification.
I retried today both with a normal and a completely sorted index on a
blosc-compressed table (complevel 5) and could not reproduce the
putative bug either.
So
* play nicely together, but rather you have to
understand how they do. Thanks again.
Be Well
Anthony
On Wed, Apr 25, 2012 at 4:41 PM, Alvaro Tejero Cantero alv...@minin.es wrote:
Hi, a minor update on this thread
* a bool array of 10**8 elements with True in two separate slices of
length 10**6 each compresses by ~350. Using .wheretrue to obtain
indices is faster by a factor of 2 to 3 than np.nonzero(normal numpy
array). The resulting filesize is 248kb, still far from
* Hello list,
The relational model has a strong foundation and I have spent a few hours
thinking about what in PyTables is structurally different from it. Here are
my thoughts. I would be delighted if you could add/comment/correct on these
ideas. This could eventually help people with a
where will give me an iterator over the /values/; in this case I
wanted the indexes. Plus, it will give me an iterator, so it will be
trivially fast.
Are you interested in the timings of where + building a list? or where
+ building an array?
-á.
On Wed, Apr 18, 2012 at 19:02, Anthony Scopatz
)) 'test'
description := {
val: Int16Col(shape=(), dflt=0, pos=0)}
byteorder := 'little'
chunkshape := (32768,)
autoIndex := True
colindexes := {
val: Index(9, full, shuffle, zlib(1)).is_CSI=True}
On Thu, Apr 19, 2012 at 12:46, Alvaro Tejero Cantero alv...@minin.es wrote:
where
:= 'little'
chunkshape := None
On Thu, Apr 19, 2012 at 15:33, Anthony Scopatz scop...@gmail.com wrote:
I was interested in how long it takes to iterate, since this is arguably
where the
majority of the time is spent.
On Thu, Apr 19, 2012 at 8:43 AM, Alvaro Tejero Cantero alv...@minin.es
wrote
A single array with 312 000 000 int16 values.
Two (uncompressed) ways to store it:
* Array
wa02[:10]
array([306, 345, 353, 335, 345, 345, 356, 341, 338, 357], dtype=int16)
* Table wtab02 (single column, named 'val')
wtab02[:10]
array([(306,), (345,), (353,), (335,), (345,), (345,), (356,),
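The difference shown above can be sketched in NumPy alone: a plain int16 array (what an Array maps to) returns scalars, while a single-column Table maps to a structured dtype and returns one-element records.

```python
import numpy as np

# Plain array vs. single-column structured array, as in the comparison above.
plain = np.array([306, 345, 353], dtype=np.int16)
table_like = np.array([(306,), (345,), (353,)], dtype=[('val', 'i2')])

first_scalar = plain[0]        # scalar: 306
first_record = table_like[0]   # record: (306,)
column = table_like['val']     # extracting the column restores scalars
```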
I'm continuing this thread on the dev list.
-á.
On Fri, Apr 13, 2012 at 21:17, Anthony Scopatz scop...@gmail.com wrote:
On Fri, Apr 13, 2012 at 12:30 PM, Alvaro Tejero Cantero alv...@minin.es
wrote:
Hi Anthony,
How does hierarchical help here? do you create a 'singer_name'/song
%20Papers%20and%20Presentations/folk_HDF5_databases_pres.pdf
[5]
https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py#L826
Be Well
Anthony
On Thu, Apr 12, 2012 at 11:03 AM, Alvaro Tejero Cantero
alv...@minin.es
wrote:
Hi,
The topic of introducing some kind of relational management in
PyTables comes up with certain frequency.
Would it be possible to combine the virtues of RDBMS and hdf5's speed
via a mechanism such as SQLite Virtual Tables?
http://www.sqlite.org/vtab.html
I wonder if the required x*
Hi,
should PyTables flush on __exit__ ?
https://github.com/PyTables/PyTables/blob/master/tables/file.py#L2164
it is not clear to me whether a File.close() call results in automatic
flushing of all the nodes, since Node()._f_close() promises only "On
nodes with data, it may be flushed to disk."
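The semantics being asked about can be sketched with a dummy context manager whose __exit__ flushes before closing. This is not PyTables' actual File class, just the pattern the question is about.

```python
# Dummy file-like wrapper: __exit__ flushes, then closes.
class FlushingFile:
    def __init__(self):
        self.events = []

    def flush(self):
        self.events.append('flush')

    def close(self):
        self.events.append('close')

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.flush()    # pending buffers hit disk first...
        self.close()    # ...then the handle is released
        return False    # never swallow exceptions

with FlushingFile() as f:
    pass
```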
PM, Alvaro Tejero Cantero wrote:
Hi,
Trying to evaluate compression filters, I was looking for a call in
PyTables to get the size of a dataset (in bytes). As I didn't find it
I remembered the many benchmarks and found instead [1] that the way to
do it is to create single-dataset files
Francesc
On 3/26/12 12:43 PM, Alvaro Tejero Cantero wrote:
Would it be an option to have
* raw data on one table
* all imaginable columns used for query conditions in another table
(but how to grow it in columns without deleting and recreating it?)
and fetch indexes for the first based on .whereList
It seems that refs were proposed in the past, even with an implementation.
Maybe this could be a starting point:
http://www.mail-archive.com/pytables-users@lists.sourceforge.net/msg01374.html
-á.
On Thu, Mar 15, 2012 at 12:56, Alvaro Tejero Cantero alv...@minin.es wrote:
Does PyTables
Thanks Francesc, we're getting there :).
Some more precise questions below.
Here is how you can do that in PyTables:
my_condition = '(col1 > 0.5) & (col2 < 24) & (col3 == "novel")'
mycol4_values = [ r['col4'] for r in tbl.where(my_condition) ]
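The in-kernel query above has a direct in-memory analogue with NumPy boolean masks. The comparison operators here are my reading of the condition, which arrived garbled in the archive; the column names come from the thread.

```python
import numpy as np

# Toy columns standing in for the table's col1..col4.
col1 = np.array([0.7, 0.2, 0.9])
col2 = np.array([10, 30, 20])
col3 = np.array(['novel', 'known', 'novel'])
col4 = np.array([1.0, 2.0, 3.0])

# NumPy analogue of the in-kernel condition string.
mask = (col1 > 0.5) & (col2 < 24) & (col3 == 'novel')
mycol4_values = col4[mask].tolist()   # rows 0 and 2 match
```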
ok, but having data upon which I want to operate also
Thank you for these e-mails with so many useful tips! This is
definitely a start. I will report what I find!
Cheers,
-á.
On Fri, Mar 16, 2012 at 15:00, Francesc Alted fal...@gmail.com wrote:
On Mar 16, 2012, at 1:55 AM, Alvaro Tejero Cantero wrote:
Thanks Francesc, we're getting
Hi everybody!
I plan to start using PyTables for an application at the University of
Oxford where data is collected in sessions of 2 GB of int16 data organized
as 64 parallel time series (64 detectors), each holding 15 million
points (15M).
I could handle these sessions separately, but ideally I
Does PyTables support object region references[1]?
When using soft links to other files, is a performance penalty
incurred? I like the idea of having the raw data, that never changes,
referenced from another file that is read-only. How do you guys
normally deal with this scenario?
Álvaro.
[1] I
Hi,
Here's my last question for today (I sent them separately because they
are quite unrelated).
I am thinking of writing a python decorator that for any processing
function (e.g. band-pass filter of median of data[0:3,:]) logs to the
attributes of the target HDF5 column
* the name of the
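The decorator idea described above can be sketched with the stdlib. A plain dict stands in for the target HDF5 node's attributes, and all names (log_provenance, band_pass) are hypothetical.

```python
import functools

# Provenance-logging decorator: records the wrapped function's name and
# arguments into an attribute store after each call.
def log_provenance(attrs):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            attrs['func_name'] = func.__name__          # name of the function
            attrs['func_args'] = repr((args, kwargs))   # its arguments
            return result
        return wrapper
    return decorator

attrs = {}  # stand-in for node._v_attrs on the target HDF5 column

@log_provenance(attrs)
def band_pass(samples, lo, hi):
    return [x for x in samples if lo <= x <= hi]

filtered = band_pass([1, 5, 9], 2, 8)
```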
...@gmail.com wrote:
Hello Alvaro,
Thanks for your excitement!
On Thu, Mar 15, 2012 at 7:52 AM, Alvaro Tejero Cantero alv...@minin.es
wrote:
Hi everybody!
I plan to start using PyTables for an application at the University of
Oxford where data is collected in sessions of 2 GB of int16 data organized
AM, Alvaro Tejero Cantero alv...@minin.es
wrote:
Does PyTables support object region references[1]?
When using soft links to other files, is a performance penalty
incurred? I like the idea of having the raw data, that never changes,
referenced from another file that is read-only. How do you
:20 PM, Alvaro Tejero Cantero alv...@minin.es
wrote:
Hi!
Thanks for the prompt answer. Actually I am not clear about switching
from NxM array to N columns (64 in my case). How do I make a
rectangular selection with columns? With an NxM array I just have to
do arr[1:2,1:4] to select
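The trade-off raised above can be sketched in NumPy: with an NxM array a rectangular selection is one slice, while with per-column storage (one array per detector) the same selection must be re-stacked. Shapes here are toy values.

```python
import numpy as np

# NxM layout: rectangular selection is a single 2-D slice.
arr = np.arange(64 * 10).reshape(64, 10)
rect = arr[1:3, 1:4]                          # rows 1-2, columns 1-3

# Column-wise layout: one 1-D array per column, re-stacked for the same view.
columns = [arr[:, j] for j in range(10)]
rect2 = np.column_stack([c[1:3] for c in columns[1:4]])

assert np.array_equal(rect, rect2)
```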