Re: [Numpy-discussion] sparse array data

Travis Oliphant Wed, 02 May 2012 18:25:13 -0700

On May 2, 2012, at 5:28 PM, Stéfan van der Walt wrote:

> On Wed, May 2, 2012 at 3:20 PM, Francesc Alted <franc...@continuum.io> wrote:
>> On 5/2/12 4:07 PM, Stéfan van der Walt wrote:
>> Well, as the OP said, coo_matrix does not support dimensions larger than
>> 2, right?
> 
> That's just an implementation detail, I would imagine--I'm trying to
> figure out if there is a new principle behind "synthetic dimensions"?
> By the way, David Cournapeau mentioned using b-trees for sparse ops a
> while ago; did you ever talk to him about those ideas?


The only new principle (not strictly new, but new to NumPy's world-view) is
using one or more fields of a structured array as "synthetic dimensions" that
replace one or more of the raw table dimensions. Thus, you could create a
"view" of a NumPy array (or a group of NumPy arrays) in which one or more
dimensions are replaced with these "sparse dimensions". This is a fully
general way to handle a mixture of sparse and dense structures in one array
interface.
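
As a rough illustration of the idea (a sketch only, not an existing NumPy
feature), the fields of a structured array can be treated as coordinates. The
field names "row", "col" and "value" and the helper to_dense are made up for
this example:

```python
import numpy as np

# Hypothetical record layout: two coordinate fields plus one value field.
# The "row" and "col" fields play the role of synthetic dimensions.
records = np.array(
    [(0, 2, 1.5), (1, 0, -2.0), (3, 1, 4.0)],
    dtype=[("row", np.int64), ("col", np.int64), ("value", np.float64)],
)

def to_dense(table, shape):
    """Materialize the 2-D array implied by the 'row' and 'col' fields."""
    dense = np.zeros(shape, dtype=table["value"].dtype)
    # Fancy indexing scatters each stored value to its (row, col) position.
    dense[table["row"], table["col"]] = table["value"]
    return dense

dense = to_dense(records, (4, 3))
```

Only the three stored records take up space in the table; the 4x3 dense array
is built just to show which logical array the records describe.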

However, you lose the O(1) lookup, because you must now search for the
non-zero items in order to implement algorithms. Indexes are therefore
critical, and Francesc has some nice indexes in PyTables.
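
To make the cost concrete, here is a minimal sketch of one possible index: a
sorted array of linearized coordinates searched with np.searchsorted. This is
an assumption chosen for illustration, not how PyTables implements its
indexes, and the lookup helper is hypothetical:

```python
import numpy as np

# Coordinates and values of the stored (non-zero) items, in arbitrary order.
rows = np.array([3, 0, 1])
cols = np.array([1, 2, 0])
vals = np.array([4.0, 1.5, -2.0])
shape = (4, 3)

# Build a sorted index over the linearized (row, col) coordinates.
keys = np.ravel_multi_index((rows, cols), shape)
order = np.argsort(keys)
sorted_keys = keys[order]
sorted_vals = vals[order]

def lookup(i, j):
    """Return the stored value at (i, j), or 0.0 if nothing is stored there."""
    key = np.ravel_multi_index((i, j), shape)
    # Binary search: O(log n) in the number of stored items, not O(1).
    pos = np.searchsorted(sorted_keys, key)
    if pos < sorted_keys.size and sorted_keys[pos] == key:
        return float(sorted_vals[pos])
    return 0.0
```

With the index in place a single lookup is a binary search; without it, every
lookup would be a linear scan over the stored items.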

A group-by operation can be replaced by an operation on "a sparse dimension" 
where you have mapped attributes to 1 or more dimensions in the underlying 
array. 
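
A small sketch of that correspondence, assuming a table with hypothetical
"key" and "value" fields: the distinct keys are mapped to positions along a
new axis, and the group-by sum becomes a reduction onto that axis.

```python
import numpy as np

# Hypothetical table: one attribute field ("key") and one value field.
table = np.array(
    [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0), ("b", 5.0)],
    dtype=[("key", "U1"), ("value", np.float64)],
)

# Map each distinct attribute value to a position along a synthetic axis.
labels, codes = np.unique(table["key"], return_inverse=True)

# "GROUP BY key, SUM(value)" expressed as a reduction onto that axis.
sums = np.bincount(codes, weights=table["value"], minlength=labels.size)
```

Here labels holds the coordinate labels of the synthetic dimension and sums
holds one aggregated value per label.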

coo_matrix is just a special case of this more general idea. If you add the
ability to compress attributes, then you get csr, csc, and various other
sparse matrix formats as well.
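
For instance, compressing the row attribute of COO-style data gives the CSR
row-pointer array. The sketch below uses plain NumPy rather than
scipy.sparse, and the variable names are illustrative only:

```python
import numpy as np

# COO-style data: one (row, col, value) triple per stored item.
rows = np.array([3, 0, 1, 0, 3, 3])
cols = np.array([0, 0, 1, 2, 1, 2])
vals = np.array([4.0, 1.0, 3.0, 2.0, 5.0, 6.0])
n_rows = 4

# Sort by row (stable, so the original order within each row is preserved).
order = np.argsort(rows, kind="stable")
rows, cols, vals = rows[order], cols[order], vals[order]

# Compress the row attribute: store where each row starts instead of
# repeating the row number for every stored item.
counts = np.bincount(rows, minlength=n_rows)
indptr = np.concatenate(([0], np.cumsum(counts)))

# The items of row r now live in the slice indptr[r]:indptr[r + 1].
row3_cols = cols[indptr[3]:indptr[4]]
row3_vals = vals[indptr[3]:indptr[4]]
```

Compressing the column attribute in the same way would give the CSC layout.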

More to come. If you are interested in this sort of thing, please let me
know.


-Travis


_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
