Matthias Boehm created SYSTEMML-1837:
----------------------------------------
Summary: Unary aggregate w/ corrections output to large physical
blocks
Key: SYSTEMML-1837
URL: https://issues.apache.org/jira/browse/SYSTEMML-1837
Project: SystemML
Issue Type: Bug
Reporter: Matthias Boehm
Many unary aggregate operations store corrections in additional columns or
rows. For example, {{rowSums(X)}} uses a two-column output to store sums and
corrections. In CP, we drop these corrections immediately after the operation,
whereas in MR and Spark they are dropped only after the final aggregation.
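To make the two-column layout concrete, here is a minimal sketch of a row-wise sum with compensated (Kahan) summation that writes a sum and a correction per row. It is not SystemML's code: the class and method names are hypothetical, the input is a plain {{double[][]}}, and a row-major {{double[]}} stands in for the dense output block.

```java
// Hypothetical sketch, not SystemML code: row sums with a Kahan correction term.
final class KahanRowSumsSketch {

    // Returns a rows x 2 output in row-major layout: [sum, correction] per row.
    static double[] rowSumsWithCorrection(double[][] x) {
        double[] out = new double[x.length * 2];
        for (int i = 0; i < x.length; i++) {
            double sum = 0.0;
            double corr = 0.0;
            for (double v : x[i]) {
                double y = v + corr;   // re-apply the low-order bits lost so far
                double t = sum + y;    // new running sum (may round)
                corr = y - (t - sum);  // part of y that did not make it into t
                sum = t;
            }
            out[2 * i] = sum;          // column 0: sum
            out[2 * i + 1] = corr;     // column 1: correction
        }
        return out;
    }

    public static void main(String[] args) {
        double[] out = rowSumsWithCorrection(new double[][] {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}});
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

The correction column is only needed while partial results are still being combined, which is why it can be dropped once the aggregation is final.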
The issue is that {{MatrixBlock::dropLastRowsOrColums}} does not actually
drop the corrections; it simply shifts the remaining values into their correct
starting positions without shrinking the underlying block. Hence, the physical
output is larger than the memory estimates assume. This leads to unnecessarily
large memory consumption during subsequent operations and in the buffer pool,
which can cause OOMs. This task aims to fix
{{MatrixBlock::dropLastRowsOrColums}}.
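The difference between shifting values and actually dropping the correction can be sketched as follows. This is an illustration under simplifying assumptions, not the actual {{MatrixBlock}} implementation or the eventual fix: the class and method names are hypothetical, and a row-major {{double[]}} with two columns (sum, correction) stands in for the dense block.

```java
// Hypothetical sketch, not SystemML code: two ways to "drop" a correction column
// from a rows x 2 row-major array holding [sum, correction] per row.
final class DropCorrectionSketch {

    // Shifts the sums to the front of the existing array. The array keeps its
    // original length, so the physical footprint stays at rows * 2 values.
    static double[] dropByShifting(double[] data, int rows) {
        for (int i = 0; i < rows; i++) {
            data[i] = data[2 * i]; // read index 2*i is never behind write index i
        }
        return data;
    }

    // Copies the sums into a new array of exactly `rows` values, so the
    // physical size matches the logical rows x 1 output.
    static double[] dropByCompacting(double[] data, int rows) {
        double[] out = new double[rows];
        for (int i = 0; i < rows; i++) {
            out[i] = data[2 * i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] block = {6.0, 0.0, 15.0, 0.0};
        System.out.println(dropByShifting(block.clone(), 2).length);
        System.out.println(dropByCompacting(block.clone(), 2).length);
    }
}
```

Both variants leave the sums in the leading positions, but only the compacting variant makes the allocated size agree with a memory estimate based on the logical dimensions, at the cost of one extra copy.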
In a subsequent task, we could also modify all unary aggregates to never
allocate the multi-column/row output when executed in CP. However, this
requires custom code paths for the different backends.