[
https://issues.apache.org/jira/browse/SYSTEMML-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Glenn Weidner updated SYSTEMML-1727:
------------------------------------
Fix Version/s: SystemML 0.15 (was: SystemML 1.0)
> Wrong mvvar instruction compilation for persistent writes
> ---------------------------------------------------------
>
> Key: SYSTEMML-1727
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1727
> Project: SystemML
> Issue Type: Bug
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Fix For: SystemML 0.15
>
>
> Currently, we compile persistent writes in binary format that read from
> transient reads into mvvar instructions, which are supposed to be metadata
> operations on HDFS. However, this comes with two fundamental problems:
> * If the scratch space and the persistent write location use different file
> URI schemes, we cannot use a rename at all and must instead read and write
> the matrix explicitly. For large data, this ultimately leads to OOMs.
> * For scripts where intermediates are fed into such persistent writes but
> subsequently used by other operations, this can lead to missing inputs
> because the intermediate no longer exists under the given temporary
> filename.
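The first problem can be illustrated with a minimal sketch of the check that decides whether a rename is possible at all. This is not SystemML code: the function name `can_rename` and the example URIs are hypothetical, and the sketch assumes that a rename only works when source and target share both scheme and authority.

```python
from urllib.parse import urlparse


def can_rename(src_uri: str, dst_uri: str) -> bool:
    """Return True only if both URIs point into the same file system.

    Hypothetical helper for illustration; assumes a rename is a metadata
    operation inside one file system, so scheme and authority must match.
    """
    src, dst = urlparse(src_uri), urlparse(dst_uri)
    return (src.scheme, src.netloc) == (dst.scheme, dst.netloc)


# Scratch space and output on the same HDFS instance: rename is possible.
same_fs = can_rename("hdfs://nn:8020/scratch/tmp1", "hdfs://nn:8020/out/X")

# Local scratch space, HDFS output: no rename, data must be copied explicitly.
cross_fs = can_rename("file:///tmp/scratch/tmp1", "hdfs://nn:8020/out/X")
```

In the cross-scheme case the only option left is to move the bytes, which is where the memory pressure described above comes from if the matrix is read in full first.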
> An example where scripts fail for the second reason is given below:
> {code}
> PROGRAM
> --MAIN PROGRAM
> ----GENERIC (lines 1-1) [recompile=false]
> ------(8) dg(rand) [1000000,1000,1000,1000,1000000000] [0,0,7629 -> 7629MB], CP
> ------(9) TWrite X (8) [1000000,1000,1000,1000,1000000000] [7629,0,0 -> 7629MB], CP
> ----GENERIC (lines 5-5) [recompile=false]
> ----GENERIC (lines 9-9) [recompile=false]
> ------(17) TRead X [1000000,1000,1000,1000,1000000000] [0,0,7629 -> 7629MB], CP
> ------(20) PWrite X (17) [1000000,1000,1000,1000,1000000000] [7629,0,0 -> 7629MB], CP
> ----GENERIC (lines 13-13) [recompile=false]
> ------(24) TRead X [1000000,1000,1000,1000,1000000000] [0,0,7629 -> 7629MB], CP
> ------(26) b(+) (24) [1000000,1000,1000,1000,-1] [7629,0,7629 -> 15259MB], CP
> ------(27) ua(+RC) (26) [0,0,-1,-1,-1] [7629,0,0 -> 7629MB], CP
> ------(28) u(print) (27) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
> {code}
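The failure mode in the plan above (the PWrite of X moves the scratch file, then the later TRead X in lines 13-13 cannot find it) can be reproduced on a local file system. The following is only an analogy under stated assumptions: it uses local temporary files instead of HDFS, and the names `persist_by_rename`, `persist_by_write`, and `demo` are hypothetical.

```python
import os
import shutil
import tempfile


def persist_by_rename(tmp_path: str, out_path: str) -> None:
    # Metadata-only move: the temporary file disappears from its old name.
    os.replace(tmp_path, out_path)


def persist_by_write(tmp_path: str, out_path: str) -> None:
    # Explicit write: the output is a copy, the temporary file stays in place.
    shutil.copyfile(tmp_path, out_path)


def demo() -> dict:
    """Report, per strategy, whether the intermediate is still readable."""
    results = {}
    with tempfile.TemporaryDirectory() as d:
        for name, persist in (("rename", persist_by_rename),
                              ("write", persist_by_write)):
            tmp = os.path.join(d, f"scratch_X_{name}")
            out = os.path.join(d, f"out_X_{name}")
            with open(tmp, "w") as f:
                f.write("1 2 3")  # stand-in for the intermediate matrix X
            persist(tmp, out)
            # A later consumer of X would look under the temporary name.
            results[name] = os.path.exists(tmp)
    return results


results = demo()
```

With the rename strategy the intermediate is gone for any subsequent reader, while the explicit write leaves it available at the cost of copying the data.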
> This task aims to fix both related issues by reworking the generation of
> mvvar instructions in favor of explicit write instructions.