[ https://issues.apache.org/jira/browse/SYSTEMML-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glenn Weidner updated SYSTEMML-1727:
------------------------------------
    Fix Version/s:     (was: SystemML 1.0)
                   SystemML 0.15

> Wrong mvvar instruction compilation for persistent writes
> ---------------------------------------------------------
>
>                 Key: SYSTEMML-1727
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1727
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 0.15
>
>
> Currently, persistent writes in binary format whose input is a transient read
> are compiled to mvvar instructions, which are meant to be metadata-only
> operations (renames) on HDFS. However, this approach has two fundamental
> problems:
> * If the scratch space and the persistent write location use different file
> URI schemes, a rename is not possible at all, so the matrix has to be read
> and written explicitly. For large data, this ultimately leads to OOMs.
> * For scripts where intermediates are fed into such persistent writes but are
> subsequently used by other operations, this can lead to missing inputs,
> because the intermediate no longer exists under its temporary filename.
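To make the first problem concrete: a rename can only stay a metadata operation when source and target live in the same file system. The sketch below approximates that condition by comparing URI scheme and authority with `java.net.URI`. The class `RenameCheck`, the method `canRename`, and the example paths (including the `s3a://` target) are hypothetical illustrations, not SystemML code; real file systems apply their own, stricter checks.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

// Hypothetical helper, not SystemML code: approximates "same file system"
// by comparing URI scheme and authority (host:port), ignoring case.
class RenameCheck {

    private static String lower(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    // True if a rename from src to dst could stay a metadata-only operation.
    static boolean canRename(URI src, URI dst) {
        return Objects.equals(lower(src.getScheme()), lower(dst.getScheme()))
            && Objects.equals(lower(src.getAuthority()), lower(dst.getAuthority()));
    }

    public static void main(String[] args) {
        URI scratch = URI.create("hdfs://namenode:8020/scratch_space/temp_X");
        URI sameFs = URI.create("hdfs://namenode:8020/user/out/X");
        URI otherFs = URI.create("s3a://bucket/out/X");
        System.out.println(canRename(scratch, sameFs));  // true
        System.out.println(canRename(scratch, otherFs)); // false
    }
}
```

Whenever such a check fails, the only remaining option is to read and write the data, which is the expensive path described in the first bullet.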
> An example where scripts fail for the second reason is given below:
> {code}
> PROGRAM
> --MAIN PROGRAM
> ----GENERIC (lines 1-1) [recompile=false]
> ------(8) dg(rand) [1000000,1000,1000,1000,1000000000] [0,0,7629 -> 7629MB], CP
> ------(9) TWrite X (8) [1000000,1000,1000,1000,1000000000] [7629,0,0 -> 7629MB], CP
> ----GENERIC (lines 5-5) [recompile=false]
> ----GENERIC (lines 9-9) [recompile=false]
> ------(17) TRead X [1000000,1000,1000,1000,1000000000] [0,0,7629 -> 7629MB], CP
> ------(20) PWrite X (17) [1000000,1000,1000,1000,1000000000] [7629,0,0 -> 7629MB], CP
> ----GENERIC (lines 13-13) [recompile=false]
> ------(24) TRead X [1000000,1000,1000,1000,1000000000] [0,0,7629 -> 7629MB], CP
> ------(26) b(+) (24) [1000000,1000,1000,1000,-1] [7629,0,7629 -> 15259MB], CP
> ------(27) ua(+RC) (26) [0,0,-1,-1,-1] [7629,0,0 -> 7629MB], CP
> ------(28) u(print) (27) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
> {code}
> This task aims to fix both related issues by reworking the generation of
> mvvar instructions in favor of explicit write instructions.
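As a rough local-file analogy for the second problem and the proposed fix, the sketch below uses `java.nio.file` to mimic the plan shown in the issue: X is written to a scratch file (TWrite), persisted (PWrite), and read again later (the TRead feeding b(+)). Implementing the persistent write as a move, as an mvvar does, leaves the later read without input, while an explicit write (modeled here as a copy) keeps the intermediate in place. The class and method names are hypothetical, and local temp files stand in for HDFS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical local-file model of the plan, not SystemML code.
class PersistentWriteDemo {

    // Old behavior: persistent write compiled to a move (rename) of the temp file.
    static void persistViaMove(Path temp, Path target) throws IOException {
        Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // New behavior: explicit write that leaves the temp file in place.
    static void persistViaWrite(Path temp, Path target) throws IOException {
        Files.copy(temp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Returns true if the later TRead of X still finds its input.
    static boolean laterReadSucceeds(boolean useMove) {
        try {
            Path dir = Files.createTempDirectory("scratch");
            Path temp = dir.resolve("temp_X");          // scratch-space file of X
            Path target = dir.resolve("persistent_X");  // persistent write location
            Files.write(temp, new byte[] {1, 2, 3});     // TWrite X
            if (useMove)
                persistViaMove(temp, target);            // PWrite X as mvvar
            else
                persistViaWrite(temp, target);           // PWrite X as write
            boolean ok;
            try {
                Files.readAllBytes(temp);                // TRead X for b(+)
                ok = true;
            } catch (NoSuchFileException e) {
                ok = false;                              // missing input
            }
            Files.deleteIfExists(temp);                  // clean up scratch files
            Files.deleteIfExists(target);
            Files.deleteIfExists(dir);
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("move:  " + laterReadSucceeds(true));
        System.out.println("write: " + laterReadSucceeds(false));
    }
}
```

Running `main` prints `move:  false` followed by `write: true`. The explicit write costs an extra copy of the data, but it decouples the persisted file from the lifetime of the intermediate and does not depend on both locations sharing a URI scheme.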



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
