Matthias Boehm created SYSTEMML-1274:
----------------------------------------

             Summary: Unnecessary rdd computation for nnz maintenance on write
                 Key: SYSTEMML-1274
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1274
             Project: SystemML
          Issue Type: Bug
          Components: Runtime
            Reporter: Matthias Boehm


Our primitive for writing binary block RDDs to HDFS (as used in guarded
collect) first computes the number of non-zeros (nnz) and then writes out the
data. Because each of these two steps triggers its own evaluation of the RDD,
this leads to redundant RDD computation, which can be expensive for large DAGs
of RDD operations. Explicitly computing the nnz is unnecessary, as we could
simply piggyback this computation onto the write via an accumulator, as done in
multiple other places in SystemML.
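
To illustrate the difference, here is a minimal, self-contained sketch in plain
Java. It uses neither Spark nor SystemML code: computeBlocks() is a
hypothetical stand-in for a lazily evaluated RDD of blocks, a LongAdder plays
the role of the accumulator, and an in-memory list plays the role of the HDFS
writer. All class and method names are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Stream;

class NnzPiggyback {
    // Counts how often the (hypothetical) lazy pipeline is evaluated.
    static final AtomicInteger evaluations = new AtomicInteger();

    // Stand-in for a lazily evaluated RDD: every call recomputes all blocks.
    static Stream<double[]> computeBlocks() {
        evaluations.incrementAndGet();
        return Stream.of(new double[] {1, 0, 2}, new double[] {0, 0, 3});
    }

    // Number of non-zero cells in a single block.
    static long countNnz(double[] block) {
        long nnz = 0;
        for (double v : block) {
            if (v != 0) {
                nnz++;
            }
        }
        return nnz;
    }

    // Two passes: one evaluation to count the nnz, a second one to write.
    static long writeTwoPass(List<double[]> sink) {
        long nnz = computeBlocks().mapToLong(NnzPiggyback::countNnz).sum();
        computeBlocks().forEach(sink::add);
        return nnz;
    }

    // One pass: the accumulator is updated as a side effect of the write.
    static long writeOnePass(List<double[]> sink) {
        LongAdder nnzAccumulator = new LongAdder();
        computeBlocks()
            .peek(block -> nnzAccumulator.add(countNnz(block)))
            .forEach(sink::add);
        return nnzAccumulator.sum();
    }

    public static void main(String[] args) {
        evaluations.set(0);
        long nnzTwoPass = writeTwoPass(new ArrayList<>());
        int evalsTwoPass = evaluations.get();

        evaluations.set(0);
        long nnzOnePass = writeOnePass(new ArrayList<>());
        int evalsOnePass = evaluations.get();

        System.out.println("two-pass: nnz=" + nnzTwoPass + ", evaluations=" + evalsTwoPass);
        System.out.println("one-pass: nnz=" + nnzOnePass + ", evaluations=" + evalsOnePass);
    }
}
```

Both variants return the same nnz, but the one-pass variant evaluates the
pipeline only once. In Spark, the accumulator value would likewise only be
reliable after the write action has completed.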


