Dear Luke,

On Saturday 04 March 2006 08:46, you wrote:
> I'm trying to adapt my research project on Biclustering to use PyTables
> as our data sets are quickly filling memory.

Good! I think PyTables is a good candidate for this kind of application.

> I have a few questions
> about how best to squeeze variable size data into the tables. Our
> biclusters are currently represented by the condition and gene indices
> into the data array from the microarrays used to measure gene expression
> levels. The conditions are best presented as an ordered set, while the
> genes can be just a set. Currently, both are stored as NumPy arrays.
> Due to the way our algorithm creates larger biclusters, we need to have
> fast access to the ends of the condition sets. The ideal layout in the
> tables would be the ends of the conditions set as separate columns and
> then the rest of the conditions in an array. (So, conditions[0],
> conditions[1:-1], conditions[-1]) The genes only need an intersection
> operation performed on them, so they can be stored as a single
> column. My question is, from the HowToUse section I can't see how to
> store the variable length conditions and gene sets in the table without
> pickling first to a string. Pickling and then unpickling to do the
> operations seems like a speed killer. Am I missing some Col class that
> can handle this use case?

A similar question has appeared recently on the pytables-users list (which
I warmly recommend you subscribe to). The best approach is to use a VLArray
object to keep your variable-length records, and a Table for the
fixed-length ones. In your application you will have to set up code to
establish the correspondence between the row numbers in both datasets (but,
if the correspondence is one-to-one, then this is trivial).

Normally, you can find the archive of the list at:

http://sourceforge.net/mailarchive/forum.php?forum=pytables-users

although it seems that the SourceForge site is having some problems keeping
it up to date lately.

>
> -Luke Imhoff
> University of Minnesota
>
>
> P.S. With the new NumPy support, will there be an option to not need
> numarray?
> Currently, we've switched completely to NumPy, so there is no
> reason to have numarray installed except to compile.

For the time being (and for a rather long time to come), PyTables *will
need* numarray installed in order to work. We plan to replace numarray in
the core of PyTables once NumPy stabilizes enough. Meanwhile, you should
know that PyTables (1.3-beta2) uses the numarray protocol to convert
numarray objects to/from NumPy and, as there is no data copy involved in
the process, this additional conversion step should not be a concern for
the vast majority of PyTables users (i.e. the bottlenecks are generally
elsewhere).

Good luck!

-- 
Francesc Altet

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
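
For readers who want to try the Table plus VLArray layout recommended in the reply, here is a minimal sketch. It rests on several assumptions: it uses the present-day PyTables 3.x names (`open_file`, `create_table`, `create_vlarray`), not the API of the 1.3 release discussed in this message, and the names `BiclusterEnds`, `write_biclusters`, `read_bicluster`, the dataset names, and the sample data are all hypothetical. It also assumes the one-to-one case, so row i of every dataset describes bicluster i.

```python
import os
import tempfile

import numpy as np
import tables


class BiclusterEnds(tables.IsDescription):
    """Fixed-length part of a bicluster: both ends of its condition set."""
    first_condition = tables.Int32Col()
    last_condition = tables.Int32Col()


def write_biclusters(path, biclusters):
    """Store (conditions, genes) pairs; row i everywhere is bicluster i."""
    with tables.open_file(path, mode="w") as h5:
        ends = h5.create_table("/", "ends", BiclusterEnds)
        middles = h5.create_vlarray("/", "middle_conditions",
                                    atom=tables.Int32Atom())
        genes = h5.create_vlarray("/", "genes", atom=tables.Int32Atom())
        row = ends.row
        for conditions, gene_set in biclusters:
            conditions = np.asarray(conditions, dtype=np.int32)
            # Ends go into fixed-width columns for fast access.
            row["first_condition"] = conditions[0]
            row["last_condition"] = conditions[-1]
            row.append()
            # The variable-length remainder goes into the VLArrays.
            middles.append(conditions[1:-1])
            genes.append(np.sort(np.asarray(gene_set, dtype=np.int32)))
        ends.flush()


def read_bicluster(path, i):
    """Return (first, middle, last, genes) for bicluster number i."""
    with tables.open_file(path, mode="r") as h5:
        record = h5.root.ends[i]
        first = int(record["first_condition"])
        last = int(record["last_condition"])
        middle = h5.root.middle_conditions[i]
        genes = h5.root.genes[i]
    return first, middle, last, genes


# Hypothetical sample data: two biclusters as (conditions, genes) pairs.
biclusters = [
    (np.array([2, 5, 7, 9]), np.array([10, 3, 42])),
    (np.array([1, 4, 6]), np.array([42, 8, 3, 17])),
]
path = os.path.join(tempfile.mkdtemp(), "biclusters.h5")
write_biclusters(path, biclusters)

first0, middle0, last0, genes0 = read_bicluster(path, 0)
_, _, _, genes1 = read_bicluster(path, 1)
# Gene sets come back as NumPy arrays, so no unpickling is needed.
shared = np.intersect1d(genes0, genes1)
```

Because each VLArray row comes back as a NumPy array, the gene intersection runs directly on the stored data, and the two condition ends can be read from the table without touching the variable-length part.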