Matthias Boehm created SYSTEMML-1587:
----------------------------------------

             Summary: Performance ultra-sparse matrix reads
                 Key: SYSTEMML-1587
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1587
             Project: SystemML
          Issue Type: Task
            Reporter: Matthias Boehm


We use the MCSR (modified compressed sparse row) format by default for sparse 
and ultra-sparse matrices because it allows for efficient incremental 
construction, including multi-threaded operations. However, even with 
SYSTEMML-1548, MCSR is still too inefficient in its memory consumption, 
which leads to unnecessary garbage collection overhead. 

This task aims to read ultra-sparse matrices (e.g., permutation matrices) into 
CSR format. Since CSR does not allow for efficient incremental construction 
(with multiple unordered input streams), the approach is to use thread-local 
COO representations and finally merge them into a CSR representation. The 
temporary memory requirements are not problematic because size(CSR) + size(COO) 
< size(MCSR) for ultra-sparse matrices, and the COO representation can be 
partitioned across threads.
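
The approach above (thread-local COO buffers, then one merge into CSR) can be 
sketched roughly as follows. This is only an illustration under assumptions: 
the class and method names (CooToCsrSketch, Coo, Csr, merge) are hypothetical 
and are not SystemML classes, and the merge shown here is a plain counting 
pass plus scatter, which may differ from what an actual implementation does.

```java
import java.util.Arrays;
import java.util.List;

public class CooToCsrSketch {

    /** Hypothetical thread-local COO buffer: parallel arrays of (row, col, value). */
    static final class Coo {
        int[] rows = new int[16];
        int[] cols = new int[16];
        double[] vals = new double[16];
        int size = 0;

        /** Appends one non-zero in arbitrary order; grows the arrays by doubling. */
        void append(int r, int c, double v) {
            if (size == rows.length) {
                int cap = size * 2;
                rows = Arrays.copyOf(rows, cap);
                cols = Arrays.copyOf(cols, cap);
                vals = Arrays.copyOf(vals, cap);
            }
            rows[size] = r;
            cols[size] = c;
            vals[size] = v;
            size++;
        }
    }

    /** Hypothetical CSR result: row pointers, column indexes, values. */
    static final class Csr {
        final int[] rowPtr;
        final int[] colIdx;
        final double[] values;

        Csr(int[] rowPtr, int[] colIdx, double[] values) {
            this.rowPtr = rowPtr;
            this.colIdx = colIdx;
            this.values = values;
        }
    }

    /** Merges the per-thread COO buffers into a single CSR matrix with numRows rows. */
    static Csr merge(List<Coo> parts, int numRows) {
        // Pass 1: count non-zeros per row across all buffers.
        int[] rowPtr = new int[numRows + 1];
        for (Coo p : parts)
            for (int i = 0; i < p.size; i++)
                rowPtr[p.rows[i] + 1]++;

        // Prefix sum turns the counts into row start offsets.
        for (int r = 0; r < numRows; r++)
            rowPtr[r + 1] += rowPtr[r];

        int nnz = rowPtr[numRows];
        int[] colIdx = new int[nnz];
        double[] values = new double[nnz];

        // Pass 2: scatter each entry into the next free slot of its row.
        int[] next = Arrays.copyOf(rowPtr, numRows);
        for (Coo p : parts)
            for (int i = 0; i < p.size; i++) {
                int pos = next[p.rows[i]]++;
                colIdx[pos] = p.cols[i];
                values[pos] = p.vals[i];
            }

        // Pass 3: sort each row by column index (insertion sort, since
        // rows of an ultra-sparse matrix hold very few entries).
        for (int r = 0; r < numRows; r++) {
            for (int i = rowPtr[r] + 1; i < rowPtr[r + 1]; i++) {
                int c = colIdx[i];
                double v = values[i];
                int j = i - 1;
                while (j >= rowPtr[r] && colIdx[j] > c) {
                    colIdx[j + 1] = colIdx[j];
                    values[j + 1] = values[j];
                    j--;
                }
                colIdx[j + 1] = c;
                values[j + 1] = v;
            }
        }
        return new Csr(rowPtr, colIdx, values);
    }
}
```

In this sketch each reader thread would own one Coo instance and append 
without synchronization; merge would run once after all threads have 
finished. The COO buffers can be released right after the scatter pass, so 
the peak temporary footprint is roughly size(CSR) + size(COO).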

Note that this change should be done in a consistent manner for all matrix 
readers (single-threaded/multi-threaded, all formats).




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
