Matthias Boehm created SYSTEMML-1274:
----------------------------------------

             Summary: Unnecessary rdd computation for nnz maintenance on write
                 Key: SYSTEMML-1274
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1274
             Project: SystemML
          Issue Type: Bug
          Components: Runtime
            Reporter: Matthias Boehm

Our primitive for writing binary block RDDs to HDFS (as used in guarded collect) first computes the number of non-zeros (nnz) and subsequently writes out the data. This leads to redundant RDD computation, which can be expensive for large DAGs of RDD operations. Explicitly computing the nnz is unnecessary, as we could simply piggyback this computation onto the write via an accumulator, as is done in multiple other places in SystemML.
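
To illustrate the difference between the two write patterns described in this issue, here is a minimal sketch in plain Python. It is not SystemML or Spark code: `LazyBlocks`, `Accumulator`, `write_two_pass`, and `write_one_pass` are hypothetical stand-ins invented for this example. `LazyBlocks` models an uncached RDD whose lineage is recomputed on every traversal, and `Accumulator` models an add-only counter in the spirit of a Spark accumulator.

```python
from typing import Callable, Iterable, Iterator, List

Block = List[float]  # hypothetical stand-in for one matrix block


class Accumulator:
    """Minimal stand-in for a Spark-style accumulator (add-only counter)."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, n: int) -> None:
        self.value += n


class LazyBlocks:
    """Stand-in for an uncached RDD: every traversal recomputes all blocks."""

    def __init__(self, compute: Callable[[], Iterable[Block]]) -> None:
        self._compute = compute
        self.evaluations = 0  # number of times the lineage was recomputed

    def __iter__(self) -> Iterator[Block]:
        self.evaluations += 1
        return iter(self._compute())


def count_nnz(block: Block) -> int:
    """Count the non-zero values in a single block."""
    return sum(1 for v in block if v != 0.0)


def write_two_pass(blocks: LazyBlocks, sink: List[Block]) -> int:
    """Pattern described in the issue: compute nnz first, then write."""
    nnz = sum(count_nnz(b) for b in blocks)  # first traversal
    for b in blocks:                         # second traversal
        sink.append(b)
    return nnz


def write_one_pass(blocks: LazyBlocks, sink: List[Block]) -> int:
    """Proposed pattern: count nnz as a side effect of the write."""
    acc = Accumulator()
    for b in blocks:  # single traversal
        acc.add(count_nnz(b))
        sink.append(b)
    return acc.value


def make_blocks() -> LazyBlocks:
    return LazyBlocks(lambda: [[1.0, 0.0, 2.0], [0.0, 0.0, 3.0]])


two_pass, one_pass = make_blocks(), make_blocks()
out_a: List[Block] = []
out_b: List[Block] = []
nnz_a = write_two_pass(two_pass, out_a)
nnz_b = write_one_pass(one_pass, out_b)
print(nnz_a, two_pass.evaluations)  # prints "3 2"
print(nnz_b, one_pass.evaluations)  # prints "3 1"
```

Both variants write the same blocks and report the same nnz, but the one-pass variant traverses the data only once. In actual Spark code, an accumulator's value should only be read after the write action has completed, and updates made inside transformations may be applied more than once if tasks are re-executed.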