[ 
https://issues.apache.org/jira/browse/SYSTEMML-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-2170.
------------------------------------
       Resolution: Fixed
         Assignee: Matthias Boehm
    Fix Version/s: SystemML 1.1

> Remote parfor fails on reading ultra-sparse matrix with dims > 2G
> -----------------------------------------------------------------
>
>                 Key: SYSTEMML-2170
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2170
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>            Priority: Major
>             Fix For: SystemML 1.1
>
>
> The parfor optimizer has a rewrite that selects the remote Spark execution 
> type even if the original program contains Spark operations, provided these 
> operations fit into the memory budget of the executors. However, this rewrite 
> does not check that the matrix dimensions are valid integers (dims > 2G do 
> not fit the CP runtime) and hence fails with 
> {code}
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Matrix dimensions too large for CP runtime: 3 x 5129281161
>         at org.apache.sysml.runtime.io.MatrixReader.createOutputMatrixBlock(MatrixReader.java:80)
>         at org.apache.sysml.runtime.io.ReaderBinaryBlockParallel.readMatrixFromHDFS(ReaderBinaryBlockParallel.java:59)
>         at org.apache.sysml.runtime.util.DataConverter.readMatrixFromHDFS(DataConverter.java:207)
> {code}
> Here is the related optimizer output:
> {code}
> ----------------------------
>  EXPLAIN OPT TREE (type=ABSTRACT_PLAN, size=22)
> ----------------------------
> --PARFOR, exec=CP, k=16, dp=NONE, tp=FIXED, rm=LOCAL_AUTOMATIC
> ----GENERIC (lines 122-126), exec=CP, k=1
> ------lix, exec=CP, k=1
> ------b(-), exec=CP, k=1
> ------b(*), exec=CP, k=1
> ------r(t), exec=CP, k=16
> ------ba(+*), exec=CP, k=16
> ------rix, exec=CP, k=1
> ------r(rshape), exec=CP, k=16
> ------ba(+*), exec=CP, k=16
> ------r(rshape), exec=CP, k=16
> ------rix, exec=CP, k=1
> ------r(rshape), exec=SPARK, k=1
> ------rix, exec=SPARK, k=1
> ------b(/), exec=CP, k=1
> ------u(exp), exec=CP, k=16
> ------b(-), exec=CP, k=1
> ------rix, exec=CP, k=1
> ------ua(maxRC), exec=CP, k=16
> ------ua(+RC), exec=CP, k=16
> ------b(*), exec=CP, k=1
> ------ua(+RC), exec=CP, k=16
> ----------------------------
> 18/03/06 23:17:33 DEBUG Optimizer: --- RULEBASED OPTIMIZER -------
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: Optimize w/ max_mem=24271MB/4638MB/4638MB, max_k=16/144/144).
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: Optimize w/ SparkClusterConfig:
> -- legacyVersion    = false (2.2.0)
> -- confOnly         = true
> -- numExecutors     = 6
> -- defaultPar       = 144
> -- memExecutor      = 69478645760
> -- memDataMinFrac   = 0.5
> -- memDataMaxFrac   = 0.6
> -- memBroadcastFrac = 0.21
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated mem (serial exec) M=109MB
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set data partitioner' - result=NONE ()
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'remove unnecessary compare matrix' - result=false ()
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set result partitioning' - result=false
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated new mem (serial exec) M=109MB
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated new mem (serial exec, all CP) M=109MB
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated new mem (cond partitioning) M=109MB
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set execution strategy' - result=REMOTE_SPARK (recompile=true)
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set operation exec type CP' - result=2
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'enable data colocation' - result=false
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set partition replication factor' - result=false
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set export replication factor' - result=true (3)
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set degree of parallelism' - result=(see EXPLAIN)
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set task partitioner' - result=STATIC
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set fused data partitioning and execution' - result=false
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set transpose sparse vector operations' - result=false
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set in-place result indexing' - result=true ([delta_b_softmax], M=160MB)
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'disable CP caching' - result=false (M=160MB)
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set result merge' - result=LOCAL_MEM
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set recompile memory budget' - result=24271MB
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'remove recursive parfor' - result=0/0
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'remove unnecessary parfor' - result=0
> 18/03/06 23:17:33 DEBUG OptimizationWrapper: ParFOR Opt: Optimized plan (after optimization):
> ----------------------------
>  EXPLAIN OPT TREE (type=ABSTRACT_PLAN, size=22)
> ----------------------------
> --PARFOR, exec=SPARK, k=3, dp=NONE, tp=STATIC, rm=LOCAL_MEM
> ----GENERIC (lines 122-126), exec=CP, k=1
> ------lix, exec=CP, k=1
> ------b(-), exec=CP, k=1
> ------b(*), exec=CP, k=1
> ------r(t), exec=CP, k=1
> ------ba(+*), exec=CP, k=1
> ------rix, exec=CP, k=1
> ------r(rshape), exec=CP, k=1
> ------ba(+*), exec=CP, k=1
> ------r(rshape), exec=CP, k=1
> ------rix, exec=CP, k=1
> ------r(rshape), exec=CP, k=1
> ------rix, exec=CP, k=1
> ------b(/), exec=CP, k=1
> ------u(exp), exec=CP, k=1
> ------b(-), exec=CP, k=1
> ------rix, exec=CP, k=1
> ------ua(maxRC), exec=CP, k=1
> ------ua(+RC), exec=CP, k=1
> ------b(*), exec=CP, k=1
> ------ua(+RC), exec=CP, k=1
> ----------------------------
> {code}
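
The missing check described above can be illustrated with a minimal sketch. This is not the actual SystemML patch: the class name CpDimensionCheck and both method names are hypothetical. The sketch assumes only what the error message and the issue title indicate, namely that a dimension above Integer.MAX_VALUE (2G) cannot be handled by the CP runtime. The 109 and 4638 values are taken from the optimizer log; reading 4638MB as the remote budget is an assumption.

```java
// Hypothetical sketch; names are illustrative and not part of the SystemML API.
final class CpDimensionCheck {
    private CpDimensionCheck() {}

    /** True if both dimensions are non-negative and fit into a Java int. */
    static boolean fitsCpDimensions(long rows, long cols) {
        return rows >= 0 && cols >= 0
            && rows <= Integer.MAX_VALUE && cols <= Integer.MAX_VALUE;
    }

    /**
     * Assumed guard for forcing Spark operations to CP inside a remote parfor:
     * the memory estimate must fit the budget AND every matrix involved must
     * have valid integer dimensions. Each entry of dims is {rows, cols}.
     */
    static boolean canForceCp(double memEstimateMB, double memBudgetMB, long[][] dims) {
        if (memEstimateMB > memBudgetMB)
            return false;
        for (long[] d : dims)
            if (!fitsCpDimensions(d[0], d[1]))
                return false;
        return true;
    }

    public static void main(String[] args) {
        // Dimensions from the reported exception: 3 x 5129281161.
        long[][] dims = { { 3L, 5129281161L } };
        // The memory estimate (109MB) fits the budget, but the column count
        // exceeds Integer.MAX_VALUE, so the operation must not be forced to CP.
        System.out.println(fitsCpDimensions(3L, 5129281161L)); // prints false
        System.out.println(canForceCp(109, 4638, dims));       // prints false
    }
}
```

An ultra-sparse matrix makes this case easy to miss: its memory estimate is small enough to pass a budget-only test even though its dimensions are too large for an in-memory block.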



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
