[ https://issues.apache.org/jira/browse/SYSTEMML-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Boehm closed SYSTEMML-1587.
------------------------------------
       Resolution: Fixed
         Assignee: Matthias Boehm
    Fix Version/s: SystemML 1.0

> Performance ultra-sparse matrix reads
> -------------------------------------
>
>                 Key: SYSTEMML-1587
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1587
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 1.0
>
>
> We use the MCSR (modified compressed sparse row) format by default for sparse
> and ultra-sparse matrices because it allows for efficient incremental
> construction, including multi-threaded operations. However, even with
> SYSTEMML-1548, the MCSR is still too inefficient in its memory consumption
> leading to unnecessary garbage collection overhead.
> This task aims to read ultra-sparse matrices (e.g., permutation matrices)
> into CSR format. Since CSR does not allow for efficient incremental
> construction (with multiple unordered input streams), the approach is to use
> thread-local COO representations and finally merge them into a CSR
> representation. The temporary memory requirements are not problematic because
> size(CSR) + size(COO) < size(MCSR) for ultra sparse matrices and the COO
> representation can be partitioned across threads.
> Note that this change should be done in a consistent manner for all matrix
> readers (single-threaded/multi-threaded, all formats).
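
For illustration, the COO-to-CSR merge described in the issue might look roughly like the sketch below. This is not SystemML's actual reader code: the class and method names (CooToCsr, Coo, Csr, merge) are hypothetical, and the merge is done here with a simple counting sort over row indices, which is one plausible way to do it. Each reader thread is assumed to append unordered (row, column, value) triples to its own Coo buffer, and a single merge step runs after all threads have finished.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not SystemML's implementation.
final class CooToCsr {

    /** Thread-local COO buffer: parallel arrays of (row, col, value) triples. */
    static final class Coo {
        int[] rows = new int[16];
        int[] cols = new int[16];
        double[] vals = new double[16];
        int size = 0;

        void append(int r, int c, double v) {
            if (size == rows.length) { // grow all three arrays together
                rows = Arrays.copyOf(rows, 2 * size);
                cols = Arrays.copyOf(cols, 2 * size);
                vals = Arrays.copyOf(vals, 2 * size);
            }
            rows[size] = r;
            cols[size] = c;
            vals[size] = v;
            size++;
        }
    }

    /** CSR result: row r occupies positions rowPtr[r] .. rowPtr[r+1]-1. */
    static final class Csr {
        final int[] rowPtr;
        final int[] colIdx;
        final double[] values;

        Csr(int[] rowPtr, int[] colIdx, double[] values) {
            this.rowPtr = rowPtr;
            this.colIdx = colIdx;
            this.values = values;
        }
    }

    /** Merges the per-thread COO buffers into one CSR matrix with numRows rows. */
    static Csr merge(List<Coo> parts, int numRows) {
        // Pass 1: count the non-zeros per row across all buffers.
        int[] rowPtr = new int[numRows + 1];
        int nnz = 0;
        for (Coo p : parts) {
            for (int i = 0; i < p.size; i++) {
                rowPtr[p.rows[i] + 1]++;
            }
            nnz += p.size;
        }
        // Prefix sum turns the counts into row start offsets.
        for (int r = 0; r < numRows; r++) {
            rowPtr[r + 1] += rowPtr[r];
        }
        // Pass 2: scatter every triple into the slot range of its row.
        int[] colIdx = new int[nnz];
        double[] values = new double[nnz];
        int[] next = Arrays.copyOf(rowPtr, numRows); // next free slot per row
        for (Coo p : parts) {
            for (int i = 0; i < p.size; i++) {
                int pos = next[p.rows[i]]++;
                colIdx[pos] = p.cols[i];
                values[pos] = p.vals[i];
            }
        }
        // Pass 3: order columns within each row. Insertion sort is cheap
        // here because ultra-sparse rows hold very few entries.
        for (int r = 0; r < numRows; r++) {
            for (int i = rowPtr[r] + 1; i < rowPtr[r + 1]; i++) {
                int c = colIdx[i];
                double v = values[i];
                int j = i - 1;
                while (j >= rowPtr[r] && colIdx[j] > c) {
                    colIdx[j + 1] = colIdx[j];
                    values[j + 1] = values[j];
                    j--;
                }
                colIdx[j + 1] = c;
                values[j + 1] = v;
            }
        }
        return new Csr(rowPtr, colIdx, values);
    }
}
```

In this sketch the input streams may arrive in any order and the buffers need no synchronization, since each thread owns its own Coo instance and only the final merge touches all of them. Duplicate (row, column) entries are not combined, on the assumption that each cell appears at most once in the input.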