Dear Luke,

On Saturday 04 March 2006 08:46, you wrote:
> I'm trying to adapt my research project on Biclustering to use PyTables
> as our data sets are quickly filling memory.

Good! I think PyTables is a good candidate for this kind of
application.

> I have a few questions
> about how best to squeeze variable size data into the tables.  Our
> biclusters are currently represented by the condition and gene indices
> into the data array from the microarrays used to measure gene expression
> levels.   The conditions are best presented as an ordered set, while the
> genes can be just a set.  Currently, both are stored as NumPy arrays.
> Due to the way our algorithm creates larger biclusters, we need to have
> fast access to the ends of the condition sets.  The ideal layout in the
> tables would be the ends of the conditions set as separate columns and
> then the rest of the conditions in an array.  (So, conditions[0],
> conditions[1:-1], conditions[-1]) The genes only need an intersection
> operation performed on them, so they can be stored as a single
> column.   My question is, from the HowToUse section I can't see how to
> store the variable length conditions and gene sets in the table without
> pickling first to a string.  Pickling and then unpickling to do the
> operations seems like a speed killer.  Am I missing some Col class that
> can handle this use case?

A similar question appeared recently on the pytables-users list
(which I warmly recommend you subscribe to). The best approach to
solve this is to use a VLArray object to keep your variable length
records, and a Table for the fixed length ones. In your application
you will have to set up code to establish the correspondence between
the row numbers in both datasets (but, if the correspondence is
one-to-one, then this is trivial).
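
As a rough sketch of that layout (not a tested recipe for your data),
the code below keeps the two ends of each condition set in a Table and
the inner conditions and the genes in two VLArrays, with row i of every
dataset describing bicluster i. The node names (ends, inner_conditions,
genes), the column names and the Int32 atom are assumptions made for
illustration, and the calls use the open_file/create_vlarray spelling
of current PyTables releases rather than the names in 1.3.

```python
import os
import tempfile

import numpy as np
import tables


class BiclusterEnds(tables.IsDescription):
    # Fixed-length part: the two ends of the ordered condition set.
    first_condition = tables.Int32Col(pos=0)
    last_condition = tables.Int32Col(pos=1)


def write_biclusters(path, biclusters):
    """Store (conditions, genes) pairs; row i of each dataset is bicluster i."""
    with tables.open_file(path, mode="w") as h5:
        ends = h5.create_table("/", "ends", BiclusterEnds)
        # Variable-length parts: one VLArray row per bicluster.
        inner = h5.create_vlarray("/", "inner_conditions", tables.Int32Atom())
        genes = h5.create_vlarray("/", "genes", tables.Int32Atom())
        row = ends.row
        for conditions, gene_set in biclusters:
            row["first_condition"] = conditions[0]
            row["last_condition"] = conditions[-1]
            row.append()
            inner.append(conditions[1:-1])
            genes.append(gene_set)
        ends.flush()


def read_bicluster(path, i):
    """Rebuild bicluster i from the same row number in all three datasets."""
    with tables.open_file(path, mode="r") as h5:
        ends = h5.root.ends[i]
        inner = h5.root.inner_conditions[i]
        genes = h5.root.genes[i]
    conditions = np.concatenate(
        ([ends["first_condition"]], inner, [ends["last_condition"]]))
    return conditions, genes


def demo():
    # Hypothetical data: two biclusters with different lengths.
    path = os.path.join(tempfile.mkdtemp(), "biclusters.h5")
    data = [
        (np.array([2, 5, 7, 9], dtype=np.int32),
         np.array([10, 3, 4], dtype=np.int32)),
        (np.array([1, 4, 6], dtype=np.int32),
         np.array([8, 3], dtype=np.int32)),
    ]
    write_biclusters(path, data)
    return read_bicluster(path, 1)
```

Since the correspondence here is one-to-one, the row number is the only
key needed. The gene rows come back as plain NumPy arrays, so an
intersection can be done with, e.g., np.intersect1d without any
pickling.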

Normally, you can find the archive of the list at:

http://sourceforge.net/mailarchive/forum.php?forum=pytables-users

although it seems that the SourceForge site is having some problems
keeping it up-to-date lately.

>
>     -Luke Imhoff
>     University of Minnesota
>
>
> P.S.   With the new NumPy support, will there be an option to not need
> numarray?  Currently, we've switched completely to NumPy, so there is no
> reason to have numarray installed except to compile.

For the time being (and for quite some time to come), PyTables *will
need* numarray installed in order to work. We plan to replace
numarray in the core of PyTables once NumPy stabilizes enough.
Meanwhile, you should know that PyTables (1.3-beta2) uses the numarray
protocol to convert numarray objects to/from NumPy and, as
there is no data copy involved in the process, this additional
conversion step should not worry the vast majority of PyTables
users (i.e. the bottlenecks are generally elsewhere).

Good Luck!

-- 
Francesc Altet



_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
