Hi,

This sounds fine in principle.  I'll let others comment on the details.

Regards

Antoine.


On 19/08/2019 at 11:29, Kenta Murata wrote:
> Hi,
> 
> I’d like to propose the following improvements to the sparse tensor
> format and implementation.
> 
> (1) Making variable bit-width indices available
> 
> The main purpose of the first part of the proposal is to make 32-bit
> indices available.  This lets us serialize scipy.sparse.csr_matrix
> objects and the like with 32-bit indices without converting the index
> arrays to 64-bit values.  As Jed said in the previous discussion [1] on
> this ML, 32-bit indices have the advantage of a smaller memory
> footprint, so I strongly believe this change is necessary for sparse
> tensor support in Apache Arrow.  Doing this requires adding both a type
> field to each sparse index format and a stride field to the
> SparseCOOIndex format.
> 
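To make the footprint argument in (1) concrete, here is a minimal sketch using plain NumPy arrays as stand-ins for CSR index buffers.  The sizes and variable names are hypothetical, and nothing here uses the Arrow API; it only shows that widening 32-bit indices to 64 bits allocates new arrays and doubles the index storage.

```python
import numpy as np

# Hypothetical CSR-style index arrays for a matrix with 1,000,000 non-zeros
# and 10,000 rows; the sizes are illustrative only.
nnz, n_rows = 1_000_000, 10_000

indices32 = np.zeros(nnz, dtype=np.int32)        # column indices
indptr32 = np.zeros(n_rows + 1, dtype=np.int32)  # row pointers

# Widening to 64 bits allocates new arrays and doubles the index storage.
indices64 = indices32.astype(np.int64)
indptr64 = indptr32.astype(np.int64)

bytes32 = indices32.nbytes + indptr32.nbytes
bytes64 = indices64.nbytes + indptr64.nbytes
print(bytes32, bytes64)
```
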
> (2) Adding a new COO format with separate row and column indices
> 
> scipy.sparse.coo_matrix manages the row and column indices in
> separate numpy arrays, which is enough to represent a sparse
> matrix.  On the other hand, to support sparse tensors of
> arbitrary rank, Arrow's SparseCOOIndex manages COO indices as one
> matrix.  Hence we need to copy the indices to convert a
> scipy.sparse.coo_matrix to Arrow’s SparseTensor.  Introducing a new
> COO format with separate row and column indices can resolve this
> issue.
> 
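The copy described in (2) can be sketched with NumPy alone.  The snippet assumes the combined index matrix has shape (nnz, ndim); the array names are made up for illustration and this is not the Arrow conversion code.

```python
import numpy as np

# Separate row and column index arrays, as scipy.sparse.coo_matrix stores them.
row = np.array([0, 1, 2, 2], dtype=np.int64)
col = np.array([1, 0, 0, 2], dtype=np.int64)

# A single (nnz, ndim) index matrix (assumed layout for the combined form).
coords = np.stack([row, col], axis=1)

# Building the combined matrix copies the data out of both arrays.
copied = not (np.shares_memory(coords, row) or np.shares_memory(coords, col))
print(coords.shape, copied)
```

A format that keeps one buffer per dimension could reference `row` and `col` directly and skip this copy.
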
> (3) Adding SparseCSCIndex
> 
> The CSC format of sparse matrices has the advantage of faster scanning
> in the column direction, while the CSR format is faster for row-wise
> scans.  Because CSC and CSR suit different access patterns, I
> want to support CSC before releasing Arrow 1.0.
> 
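As a rough illustration of the access-pattern difference in (3), the sketch below builds CSR-style and CSC-style arrays for a tiny dense matrix with a hand-written helper.  The helper names `compress` and `major_slice` are hypothetical and not part of Arrow or SciPy.

```python
import numpy as np

dense = np.array([[1, 2, 0],
                  [0, 0, 3],
                  [0, 4, 5]])

def compress(matrix):
    """Compress a dense 2-D array along its rows (CSR-style)."""
    data, indices, indptr = [], [], [0]
    for line in matrix:
        nz = np.flatnonzero(line)
        indices.extend(nz.tolist())
        data.extend(line[nz].tolist())
        indptr.append(len(indices))
    return data, indices, indptr

csr = compress(dense)    # rows are stored contiguously
csc = compress(dense.T)  # columns are stored contiguously

def major_slice(fmt, k):
    """Return (minor indices, values) of row k (CSR) or column k (CSC)."""
    data, indices, indptr = fmt
    start, stop = indptr[k], indptr[k + 1]
    return indices[start:stop], data[start:stop]

col1 = major_slice(csc, 1)  # column 1 is one contiguous slice in CSC
row2 = major_slice(csr, 2)  # row 2 is one contiguous slice in CSR
print(col1, row2)
```

Reading a column from the CSR arrays instead would require searching every row's slice, which is why the two layouts favour different scan directions.
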
> There is a work-in-progress branch [2] for (1) above.  I’d appreciate
> any comments or suggestions.
> 
> [1] 
> http://mail-archives.apache.org/mod_mbox/arrow-dev/201903.mbox/%[email protected]%3e
> 
> [2] https://github.com/mrkn/arrow/tree/sparse_tensor_index_value_type
> 
> Regards,
> Kenta Murata
> 
