[ https://issues.apache.org/jira/browse/SYSTEMML-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm updated SYSTEMML-1423:
-------------------------------------
    Description: 
In order to ensure consistency across backends, we first determine the number
of non-zeros per block and subsequently generate random data accordingly.
However, in case of ultra-sparse data sets, this temporary array of per-block
nnz counts can be almost as large as the dataset itself. Since this memory
consumption is unaccounted for and is required even for distributed operations,
there are various scenarios where it can cause OOMs.

This task aims to solve this issue for all backends, by determining the nnz per 
block in a streaming manner without materialization.
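The streaming approach could look roughly like the following minimal Java sketch. It assumes, hypothetically, that each block's nnz is derived deterministically from a global seed and the block index, so every backend can recompute it independently instead of reading a materialized per-block array. It also swaps in simple stochastic rounding of the expected per-block count, which matches the target sparsity only in expectation. The class and method names (StreamingBlockNnz, nnz, totalNnz) are illustrative and not SystemML APIs.

```java
import java.util.SplittableRandom;

/**
 * Hypothetical sketch: computes the number of non-zeros (nnz) of each block
 * on demand from a per-block seed, instead of materializing a long[] with
 * one entry per block. All names are illustrative, not SystemML APIs.
 */
public final class StreamingBlockNnz {
    private final long rows, cols;
    private final int blen;         // block size (blen x blen)
    private final double sparsity;  // expected fraction of non-zeros
    private final long seed;        // global seed shared by all backends

    public StreamingBlockNnz(long rows, long cols, int blen, double sparsity, long seed) {
        this.rows = rows;
        this.cols = cols;
        this.blen = blen;
        this.sparsity = sparsity;
        this.seed = seed;
    }

    public long numRowBlocks() { return (rows + blen - 1) / blen; }
    public long numColBlocks() { return (cols + blen - 1) / blen; }

    /** nnz of block (bi, bj); depends only on (seed, bi, bj), so every backend computes the same value. */
    public long nnz(long bi, long bj) {
        long brows = Math.min(blen, rows - bi * blen); // boundary blocks may be smaller
        long bcols = Math.min(blen, cols - bj * blen);
        double expected = sparsity * brows * bcols;
        long base = (long) Math.floor(expected);
        // Per-block RNG: offset the global seed by the linear block index
        // (scaled by the 64-bit golden-ratio constant; overflow wraps, which is fine).
        long blockIndex = bi * numColBlocks() + bj;
        SplittableRandom rng = new SplittableRandom(seed + blockIndex * 0x9E3779B97F4A7C15L);
        // Stochastic rounding: round up with probability equal to the fractional part,
        // so ultra-sparse blocks (expected < 1) get either 0 or 1 non-zeros.
        return base + (rng.nextDouble() < expected - base ? 1 : 0);
    }

    /** Streams over all blocks without allocating a per-block array. */
    public long totalNnz() {
        long total = 0;
        for (long bi = 0; bi < numRowBlocks(); bi++)
            for (long bj = 0; bj < numColBlocks(); bj++)
                total += nnz(bi, bj);
        return total;
    }

    public static void main(String[] args) {
        // Ultra-sparse example: 10^4 blocks, on average 0.1 non-zeros per block.
        StreamingBlockNnz gen = new StreamingBlockNnz(100_000, 100_000, 1000, 1e-7, 7L);
        System.out.println("blocks: " + gen.numRowBlocks() * gen.numColBlocks());
        System.out.println("total nnz: " + gen.totalNnz());
    }
}
```

Because nnz(bi, bj) is a pure function of (seed, bi, bj), a distributed backend could call it per block inside its tasks, and the driver never needs an array with one entry per block. The trade-off of this particular sketch is that the exact total nnz is no longer fixed up front.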
        Summary: OOM on generating ultra-sparse rand data (was: Unnecessary memory consumption on generating ultra-sparse rand data)

> OOM on generating ultra-sparse rand data
> ----------------------------------------
>
>                 Key: SYSTEMML-1423
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1423
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Matthias Boehm



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)