[
https://issues.apache.org/jira/browse/SYSTEMML-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Glenn Weidner updated SYSTEMML-1837:
------------------------------------
Fix Version/s: SystemML 0.15 (was: SystemML 1.0)
> Unary aggregate w/ corrections output to large physical blocks
> --------------------------------------------------------------
>
> Key: SYSTEMML-1837
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1837
> Project: SystemML
> Issue Type: Bug
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Fix For: SystemML 0.15
>
>
> Many unary aggregate operations store corrections in additional columns or
> rows. For example, {{rowSums(X)}} uses a two-column output to store sums and
> corrections. In CP, we drop these corrections immediately after the
> operations, while in MR and Spark these corrections are dropped after final
> aggregation. The issue is that {{MatrixBlock::dropLastRowsOrColums}} does
> not actually drop the corrections but simply shifts all values into the right
> starting positions. Hence, the physical output is larger than what
> the memory estimates represent. This leads to unnecessarily large memory
> consumption during subsequent operations and in the buffer pool, which can
> lead to OOMs. This task aims to fix {{MatrixBlock::dropLastRowsOrColums}}.
> In a subsequent task, we could also modify all unary aggregates to never
> allocate the multi-column/row output when executed in CP. However, this
> requires custom code paths for the different backends.
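
The difference between shifting values and actually dropping the correction can be sketched as follows. This is only an illustration under stated assumptions: it models a dense block as a row-major double[] with the correction in the last column, and the class and method names (CorrectionDropSketch, shiftOnly, dropAndCompact) are hypothetical. It is not SystemML's MatrixBlock code.

```java
// Hypothetical stand-in for a dense row-major block; not SystemML's MatrixBlock.
final class CorrectionDropSketch {

    // Shift-only variant: moves the first (cols - 1) values of each row to the
    // front of the same array. The backing array keeps its original length,
    // so the physical size still includes the correction column.
    static double[] shiftOnly(double[] data, int rows, int cols) {
        int keep = cols - 1;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < keep; c++) {
                data[r * keep + c] = data[r * cols + c];
            }
        }
        return data;
    }

    // Compacting variant: copies the kept values into a new array of size
    // rows * (cols - 1), so the physical size matches the logical size.
    static double[] dropAndCompact(double[] data, int rows, int cols) {
        int keep = cols - 1;
        double[] out = new double[rows * keep];
        for (int r = 0; r < rows; r++) {
            System.arraycopy(data, r * cols, out, r * keep, keep);
        }
        return out;
    }

    public static void main(String[] args) {
        // Three rows of (sum, correction), as in a two-column rowSums output.
        double[] block = {1.0, 0.1, 2.0, 0.2, 3.0, 0.3};

        double[] shifted = shiftOnly(block.clone(), 3, 2);
        double[] compact = dropAndCompact(block, 3, 2);

        System.out.println("shift-only physical length: " + shifted.length); // 6
        System.out.println("compacted physical length:  " + compact.length); // 3
    }
}
```

In both variants the first three cells hold the sums 1.0, 2.0, 3.0. Only the compacting variant releases the space of the correction column, which is the behavior a memory estimate based on the logical dimensions would expect.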