[ https://issues.apache.org/jira/browse/SYSTEMML-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Boehm updated SYSTEMML-1623:
-------------------------------------
Description:
The current JMLC conversion functions cause a very inefficient and
memory-intensive code path, which leads to unnecessary OOMs that can be easily
avoided. This task aims to add and improve these primitives to allow convenient
data conversions with much better memory efficiency.
For example, consider a scenario of a 500k x 90 input model available as a csv
file in the classpath, whose string representation requires 1GB. The typical
code path currently used looks as follows:
{code}
ResourceStream(model_file)
-> prep
---> StringBuilder -> String [3GB tmp, 1GB]
-> convertToDoubleMatrix
---> byte[] -> ByteInputStream [2GB]
---> MatrixBlock [360MB]
---> double[][] [400MB]
-> setMatrix
---> MatrixBlock [360MB]
{code}
which requires at least 4GB of memory due to strong references to all
intermediates. The goal of this task is to reduce this to the following, which
only requires 360MB of memory:
{code}
ResourceStream(model_file)
-> convertToMatrix
---> MatrixBlock [360MB]
-> setMatrix
---> by references
{code}
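As an illustration of the target code path, the following is a minimal sketch, assuming a dense csv file without header and with known dimensions. The names StreamingCsvSketch, readCsvToDense, and denseSizeBytes are hypothetical and not part of the JMLC API, and a flat row-major double[] merely stands in for the dense matrix representation. The point is that parsing directly from the stream avoids the full-file String and byte[] intermediates, so the only large allocation is the 500k x 90 x 8B = 360MB of values:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class StreamingCsvSketch {

    // Bytes needed for the dense values alone: rows * cols * 8.
    static long denseSizeBytes(long rows, long cols) {
        return rows * cols * Double.BYTES;
    }

    // Hypothetical helper (not a JMLC method): parses a header-less, dense csv
    // stream line by line into one row-major array of known dimensions.
    // Columns beyond 'cols' in a row are ignored; short rows are rejected.
    static double[] readCsvToDense(InputStream in, int rows, int cols) {
        // The only large allocation; per-line strings are short-lived garbage.
        double[] dense = new double[Math.multiplyExact(rows, cols)];
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            int r = 0;
            while ((line = br.readLine()) != null) {
                if (line.isEmpty())
                    continue; // tolerate blank lines
                if (r >= rows)
                    throw new IllegalArgumentException("more than " + rows + " rows");
                int start = 0;
                for (int c = 0; c < cols; c++) {
                    if (start > line.length())
                        throw new IllegalArgumentException(
                            "row " + r + " has fewer than " + cols + " columns");
                    int end = line.indexOf(',', start);
                    if (end < 0)
                        end = line.length();
                    dense[r * cols + c] =
                        Double.parseDouble(line.substring(start, end).trim());
                    start = end + 1;
                }
                r++;
            }
            if (r != rows)
                throw new IllegalArgumentException(
                    "expected " + rows + " rows, found " + r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dense;
    }

    public static void main(String[] args) {
        // Tiny in-memory stand-in for the classpath resource stream.
        byte[] csv = "1,2,3\n4,5,6\n".getBytes(StandardCharsets.UTF_8);
        double[] m = readCsvToDense(new ByteArrayInputStream(csv), 2, 3);
        System.out.println(m.length + " values, last = " + m[5]);
        System.out.println(denseSizeBytes(500_000, 90) + " bytes for 500k x 90");
    }
}
```

In a real setting the stream would come from the classpath resource rather than a byte array, and the resulting block would be handed over by reference instead of being copied again.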
was:
The current JMLC conversion functions cause a very inefficient and
memory-intensive code path, which leads to unnecessary OOMs that can be easily
avoided. This task aims to add and improve these primitives to allow convenient
data conversions with much better memory efficiency.
For example, consider a scenario of a 500k x 90 input model available as a csv
file in the classpath. The typical code path currently used looks as follows:
{code}
ResourceStream(model_file)
-> prep
---> StringBuilder -> String [3GB tmp, 1GB]
-> convertToDoubleMatrix
---> byte[] -> ByteInputStream [2GB]
---> MatrixBlock [360MB]
---> double[][] [400MB]
-> setMatrix
---> MatrixBlock [360MB]
{code}
which requires at least 4GB of memory due to strong references to all
intermediates. The goal of this task is to reduce this to the following:
{code}
ResourceStream(model_file)
-> convertToMatrix
---> MatrixBlock [360MB]
-> setMatrix
---> by references
{code}
> Memory efficiency JMLC matrix and frame conversions
> ---------------------------------------------------
>
> Key: SYSTEMML-1623
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1623
> Project: SystemML
> Issue Type: Bug
> Reporter: Matthias Boehm