org.apache.sysds.hops.rewrite.RewriteInjectSparkPReadCheckpointing (which has <100 lines of code)
Modifying it to skip the checkpoint after a pread when the consuming operation is an action and there is no other consumer (such as a twrite, which passes the variable to other statement blocks) would be safe to do.
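The condition described above can be sketched roughly as follows. This is only a toy model in Python, not SystemDS code: the `Op` class, the `kind` labels, and the `needs_checkpoint` helper are hypothetical names made up for illustration and do not correspond to the actual Hop API used inside the rewrite.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for an operator node (not the SystemDS Hop class)."""
    name: str
    kind: str  # assumed labels: "pread", "action", "transient_write", "other"
    inputs: list = field(default_factory=list)


def consumers(node, dag):
    """All operators in the DAG that read `node` directly (identity comparison)."""
    return [op for op in dag if any(inp is node for inp in op.inputs)]


def needs_checkpoint(pread, dag):
    """Skip the checkpoint only if the pread has exactly one consumer
    and that consumer is an action; otherwise keep it (conservative)."""
    users = consumers(pread, dag)
    sole_action = len(users) == 1 and users[0].kind == "action"
    return not sole_action


# Shape of the script in this thread: X = read($1); print(sum(X))
X = Op("pREADX", "pread")
total = Op("uak+", "action", [X])
print(needs_checkpoint(X, [X, total]))       # False: single action consumer

# Same read, but X is also passed on to another statement block
tw = Op("twrite X", "transient_write", [X])
print(needs_checkpoint(X, [X, total, tw]))   # True: a second consumer exists
```

A pread with no consumers at all also keeps its checkpoint in this sketch, which simply mirrors the current behavior for every case other than the single-action one.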
Regards,
Matthias

On 3/15/2023 6:04 PM, s...@bgaard.dk wrote:
If I execute a simple script of

```
X = read($1)
print(sum(X))
```

on an 80GB file, we inject a Spark chkpoint even though there are no other Spark instructions. Ideally this checkpoint should not happen, but where do I go to fix this?

explain:

# EXPLAIN (RUNTIME):
# Memory Budget local/remote = 143360MB/?MB/?MB/?MB
# Degree of Parallelism (vcores) local/remote = 48/?
PROGRAM ( size CP/SP = 6/1 )
--MAIN PROGRAM
----GENERIC (lines 1-2) [recompile=false]
------CP createvar pREADX testFile.dat false MATRIX binary 1000000 10000 1000 10000000000 copy
------CP createvar _mVar0 scratch_space//_p70322_192.168.0.11//_t0/temp0 true MATRIX binary 1000000 10000 1000 10000000000 copy
------SPARK chkpoint pREADX.MATRIX.FP64 _mVar0.MATRIX.FP64 MEMORY_AND_DISK
------CP uak+ _mVar0.MATRIX.FP64 _Var1.SCALAR.STRING 48
------CP rmvar _mVar0
------CP print _Var1.SCALAR.STRING.false _Var2.SCALAR.STRING 48
------CP rmvar _Var1 _Var2

4.966327461583818E9

SystemDS Statistics:
Total elapsed time: 46.230 sec.
Total compilation time: 0.801 sec.
Total execution time: 45.429 sec.
Number of compiled Spark inst: 1.
Number of executed Spark inst: 1.
Cache hits (Mem/Li/WB/FS/HDFS): 0/0/0/0/1.
Cache writes (Li/WB/FS/HDFS): 0/0/0/0.
Cache times (ACQr/m, RLS, EXP): 24.751/0.000/0.000/0.000 sec.
HOP DAGs recompiled (PRED, SB): 0/0.
HOP DAGs recompile time: 0.000 sec.
Spark ctx create time (lazy): 18.464 sec.
Spark trans counts (par,bc,col):0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
Spark async. count (pf,bc,op): 0/0/0.
Total JIT compile time: 14.217 sec.
Total JVM GC count: 4.
Total JVM GC time: 0.1 sec.
Heavy hitter instructions:
  #  Instruction  Time(s)  Count
  1  uak+          26.248      1
  2  sp_chkpoint   19.165      1
  3  createvar      0.014      2
  4  print          0.000      1
  5  rmvar          0.000      2