Matthias Boehm created SYSTEMML-946:

             Summary: OOM on spark dataframe-matrix / csv-matrix conversion
                 Key: SYSTEMML-946
             Project: SystemML
          Issue Type: Bug
          Components: Runtime
            Reporter: Matthias Boehm

The decision on dense/sparse block allocation in our dataframeToBinaryBlock and 
csvToBinaryBlock data converters is based purely on sparsity. This works 
very well for the common case of tall & skinny matrices. However, for scenarios 
with dense data but a huge number of columns, a single partition will rarely 
contain the 1000 rows needed to fill an entire row of blocks. This leads to 
unnecessary allocation and dense-sparse conversion, as well as potential 
out-of-memory errors, because the temporary memory requirement can be up to 
1000x larger than the input.
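To illustrate the 1000x figure, here is a minimal sketch of the blow-up arithmetic, assuming the default block size of 1000 rows and dense allocation of a full row of blocks per partition (the helper name and the example dimensions are hypothetical, chosen only for illustration):

```java
public class BlockAllocBlowup {

    // Hypothetical helper: ratio of temporarily allocated cells to actual
    // input cells for one partition, assuming the converter densely
    // allocates an entire row of blocks (blen rows x ncols columns)
    // regardless of how many rows the partition actually contributes.
    static long blowupFactor(long partRows, long ncols, int blen) {
        long allocatedCells = (long) blen * ncols; // full dense block row
        long inputCells = partRows * ncols;        // partition's real input
        return allocatedCells / inputCells;
    }

    public static void main(String[] args) {
        // A partition contributing a single row of a wide dense matrix:
        // the full 1000-row block row is allocated for just one input row.
        System.out.println(blowupFactor(1, 100_000, 1000));
    }
}
```

In the worst case (one input row per row of blocks), the temporary allocation is blen times the input, which matches the up-to-1000x estimate above.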

This message was sent by Atlassian JIRA