[
https://issues.apache.org/jira/browse/SYSTEMML-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias Boehm reassigned SYSTEMML-946:
---------------------------------------
Assignee: Matthias Boehm
> OOM on spark dataframe-matrix / csv-matrix conversion
> -----------------------------------------------------
>
> Key: SYSTEMML-946
> URL: https://issues.apache.org/jira/browse/SYSTEMML-946
> Project: SystemML
> Issue Type: Bug
> Components: Runtime
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Fix For: SystemML 0.11
>
> Attachments: mnist_lenet.dml
>
>
> The decision on dense/sparse block allocation in our dataframeToBinaryBlock
> and csvToBinaryBlock data converters is based purely on the sparsity. This
> works very well for the common case of tall & skinny matrices. However, in
> scenarios with dense data but a huge number of columns, a single partition
> will rarely contain the 1000 rows needed to fill an entire row of blocks.
> This leads to unnecessary allocation and dense-sparse conversion as well as
> potential out-of-memory errors, because the temporary memory requirement can
> be up to 1000x larger than the input partition. For example, a partition
> holding a single dense row of 100,000 columns (~800 KB) spans 100 column
> blocks; allocating 100 dense 1000x1000 blocks temporarily requires ~800 MB.
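>
> A minimal sketch of a per-partition allocation decision that avoids this
> blow-up (illustrative only, not the actual converter code; the BLEN
> constant, method name, and the ~12 bytes per non-zero sparse estimate
> are assumptions):
> {code:java}
> // Decide dense vs. sparse per output block using the rows this partition
> // actually contributes (partRows), not just the global sparsity.
> // Assumption: BLEN=1000 is the binary-block size; names are illustrative.
> private static final int BLEN = 1000;
>
> static boolean allocateDense(double sparsity, int partRows, int blockCols) {
>   // expected non-zeros this partition writes into one BLEN x blockCols block
>   long expNnz = (long) Math.ceil(sparsity * partRows * blockCols);
>   double denseBytes = (double) BLEN * blockCols * 8; // full dense block
>   double sparseBytes = expNnz * 12.0;                // rough CSR-like estimate
>   // allocate dense only if it is no larger than the sparse alternative,
>   // which bounds temporary memory by the partition's actual contribution
>   return denseBytes <= sparseBytes;
> }
> {code}
> With such a check, temporary memory stays proportional to the data a
> partition actually holds instead of a full row of 1000x1000 blocks.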
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)