[ 
https://issues.apache.org/jira/browse/SYSTEMML-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15511091#comment-15511091
 ] 

Matthias Boehm commented on SYSTEMML-946:
-----------------------------------------

what is the script you're executing here?

> OOM on spark dataframe-matrix / csv-matrix conversion
> -----------------------------------------------------
>
>                 Key: SYSTEMML-946
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-946
>             Project: SystemML
>          Issue Type: Bug
>          Components: Runtime
>            Reporter: Matthias Boehm
>
> The decision on dense/sparse block allocation in our dataframeToBinaryBlock 
> and csvToBinaryBlock data converters is based purely on the sparsity. This 
> works very well for the common case of tall & skinny matrices. However, in 
> scenarios with dense data but a huge number of columns, a single partition 
> will rarely contain 1000 rows to fill an entire row of blocks. This leads to 
> unnecessary allocations and dense-sparse conversions as well as potential 
> out-of-memory errors, because the temporary memory requirement can be up to 
> 1000x larger than the input partition.
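
For intuition, here is a back-of-the-envelope sketch (plain Java, not SystemML
code; the block size, column count, and rows-per-partition values are assumed
for illustration) of why the temporary allocation can dwarf the partition size
when the input is wide and dense:

    // Hypothetical illustration only: with a 1000-row block size, converting a
    // partition that holds just a few rows of a wide dense matrix still
    // allocates a full row of dense blocks up front.
    public class BlockAllocSketch {
        public static void main(String[] args) {
            final int blocksize = 1000;        // rows per block (assumed default)
            final long ncol = 100_000;         // wide, dense input (assumed)
            final long rowsInPartition = 10;   // rows present in one partition (assumed)

            long inputBytes = rowsInPartition * ncol * 8;  // dense doubles in the partition
            long tempBytes  = (long) blocksize * ncol * 8; // dense row of blocks allocated up front

            System.out.printf("input partition:  %,d bytes%n", inputBytes);
            System.out.printf("temporary blocks: %,d bytes (%.0fx larger)%n",
                tempBytes, (double) tempBytes / inputBytes);
            // ratio = blocksize / rowsInPartition, i.e., up to 1000x for a
            // partition that contributes only a single row
        }
    }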


