Matthias Boehm created SYSTEMML-1587:
----------------------------------------

             Summary: Performance ultra-sparse matrix reads
                 Key: SYSTEMML-1587
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1587
             Project: SystemML
          Issue Type: Task
            Reporter: Matthias Boehm


We use the MCSR (modified compressed sparse row) format by default for sparse and ultra-sparse matrices because it allows for efficient incremental construction, including multi-threaded operations. However, even with SYSTEMML-1548, MCSR is still too inefficient in its memory consumption, which leads to unnecessary garbage collection overhead.

This task aims to read ultra-sparse matrices (e.g., permutation matrices) into CSR format. Since CSR does not allow for efficient incremental construction (with multiple unordered input streams), the approach is to use thread-local COO representations and finally merge them into a CSR representation. The temporary memory requirements are not problematic because size(CSR) + size(COO) < size(MCSR) for ultra-sparse matrices and the COO representation can be partitioned across threads.

Note that this change should be done in a consistent manner for all matrix readers (single-threaded/multi-threaded, all formats).
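
For illustration, the merge of thread-local COO buffers into a single CSR representation could look roughly like the sketch below. This is not SystemML code: the class and method names (CooToCsrSketch, CooBuffer, Csr, merge) are hypothetical, and the sketch assumes 0-based indices, no duplicate cells, and a known row count. It uses a counting pass over row indices, a prefix sum, and a scatter pass, so no input buffer has to be sorted in advance.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names are taken from SystemML. */
final class CooToCsrSketch {

    /** Append-only COO buffer, meant to be owned by exactly one reader thread. */
    static final class CooBuffer {
        int[] rows = new int[16];
        int[] cols = new int[16];
        double[] vals = new double[16];
        int size = 0;

        void append(int r, int c, double v) {
            if (size == rows.length) { // grow all three arrays together
                rows = Arrays.copyOf(rows, 2 * size);
                cols = Arrays.copyOf(cols, 2 * size);
                vals = Arrays.copyOf(vals, 2 * size);
            }
            rows[size] = r;
            cols[size] = c;
            vals[size] = v;
            size++;
        }
    }

    /** CSR result: row r occupies positions rowPtr[r] .. rowPtr[r+1]-1. */
    static final class Csr {
        final int[] rowPtr;
        final int[] colIdx;
        final double[] values;

        Csr(int[] rowPtr, int[] colIdx, double[] values) {
            this.rowPtr = rowPtr;
            this.colIdx = colIdx;
            this.values = values;
        }
    }

    /** Merges unordered COO buffers into CSR (assumes no duplicate cells). */
    static Csr merge(List<CooBuffer> parts, int numRows) {
        // Pass 1: count non-zeros per row across all buffers.
        int[] rowPtr = new int[numRows + 1];
        int nnz = 0;
        for (CooBuffer p : parts) {
            for (int i = 0; i < p.size; i++) {
                rowPtr[p.rows[i] + 1]++;
            }
            nnz += p.size;
        }
        // Prefix sum turns per-row counts into row start offsets.
        for (int r = 0; r < numRows; r++) {
            rowPtr[r + 1] += rowPtr[r];
        }
        // Pass 2: scatter every entry into the segment of its row.
        int[] next = Arrays.copyOf(rowPtr, numRows); // next free slot per row
        int[] colIdx = new int[nnz];
        double[] values = new double[nnz];
        for (CooBuffer p : parts) {
            for (int i = 0; i < p.size; i++) {
                int pos = next[p.rows[i]]++;
                colIdx[pos] = p.cols[i];
                values[pos] = p.vals[i];
            }
        }
        // Pass 3: order each row by column index. Insertion sort is used
        // because rows of an ultra-sparse matrix hold very few entries.
        for (int r = 0; r < numRows; r++) {
            for (int i = rowPtr[r] + 1; i < rowPtr[r + 1]; i++) {
                int c = colIdx[i];
                double v = values[i];
                int j = i - 1;
                while (j >= rowPtr[r] && colIdx[j] > c) {
                    colIdx[j + 1] = colIdx[j];
                    values[j + 1] = values[j];
                    j--;
                }
                colIdx[j + 1] = c;
                values[j + 1] = v;
            }
        }
        return new Csr(rowPtr, colIdx, values);
    }
}
```

In this sketch each reader thread would append only to its own CooBuffer, so the read phase needs no synchronization, and merge runs once after all readers have finished. Once the CSR arrays are built the COO buffers can be released, which is the temporary memory overhead the description refers to.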