Baunsgaard opened a new pull request, #1760:
URL: https://github.com/apache/systemds/pull/1760

   This PR optimizes the pattern matching for type detection by precompiling 
the patterns, along with other minor optimizations.
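
As a rough illustration of why precompiling helps (a minimal sketch, not the code in this PR): `String.matches` compiles its regex on every call, while a `static final Pattern` is compiled once and reused. The class name, method names, and the integer regex below are hypothetical and are not taken from SystemDS.

```java
import java.util.regex.Pattern;

public class PrecompiledPatternSketch {
    // Hypothetical regex for illustration only; not the pattern SystemDS uses.
    private static final String INT_REGEX = "[-+]?\\d+";

    // Compiled once when the class is loaded and reused for every cell.
    private static final Pattern INT_PATTERN = Pattern.compile(INT_REGEX);

    // Recompiles the regex on each call, because String.matches
    // delegates to Pattern.matches(regex, input).
    public static boolean isIntRecompiling(String s) {
        return s.matches(INT_REGEX);
    }

    // Reuses the precompiled pattern; only a Matcher is allocated per call.
    public static boolean isIntPrecompiled(String s) {
        return INT_PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] cells = {"42", "-7", "3.14", "abc"};
        for (String c : cells) {
            // Both variants must agree; only their cost differs.
            System.out.println(c + " -> " + isIntPrecompiled(c)
                + " (agrees with recompiling variant: "
                + (isIntPrecompiled(c) == isIntRecompiling(c)) + ")");
        }
    }
}
```

Both methods return the same result for any input; the precompiled variant avoids the repeated compilation and the garbage it creates when called once per cell.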
   
   Before:
   
   ```txt
   SystemDS Statistics:
   Total elapsed time:          12.991 sec.
   Total compilation time:              0.233 sec.
   Total execution time:                12.758 sec.
   Cache hits (Mem/Li/WB/FS/HDFS):      4/0/0/0/1.
   Cache writes (Li/WB/FS/HDFS):        0/2/0/0.
   Cache times (ACQr/m, RLS, EXP):      1.885/0.189/1.075/0.889 sec.
   HOP DAGs recompiled (PRED, SB):      0/1.
   HOP DAGs recompile time:     0.003 sec.
   Total JIT compile time:              5.697 sec.
   Total JVM GC count:          49.
   Total JVM GC time:           0.884 sec.
   Heavy hitter instructions:
    #  Instruction   Time(s)  Count
    1  detectSchema   10.570      1
    2  applySchema     1.281      1
    3  write           0.889      1
    4  createvar       0.008      4
    5  toString        0.003      1
   ```
   
   After:
   
   ```txt
   SystemDS Statistics:
   Total elapsed time:          6.387 sec.
   Total compilation time:              0.233 sec.
   Total execution time:                6.154 sec.
   Cache hits (Mem/Li/WB/FS/HDFS):      4/0/0/0/1.
   Cache writes (Li/WB/FS/HDFS):        0/2/0/0.
   Cache times (ACQr/m, RLS, EXP):      1.897/0.179/1.075/0.901 sec.
   HOP DAGs recompiled (PRED, SB):      0/1.
   HOP DAGs recompile time:     0.004 sec.
   Total JIT compile time:              4.692 sec.
   Total JVM GC count:          6.
   Total JVM GC time:           1.076 sec.
   Heavy hitter instructions:
    #  Instruction   Time(s)  Count
    1  detectSchema    3.971      1
    2  applySchema     1.260      1
    3  write           0.901      1
    4  createvar       0.011      4
   ```
   
   
   Currently, much of the time is spent estimating the in-memory size of 
Frames backed by String arrays; applySchema actually uses 1 sec. on just that. 
   Does it make sense to parallelize the memory estimation, or should it be 
simplified so that it is not exact for frames?
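
To make the two options in the question concrete, here is a minimal sketch comparing an exact sequential estimate, an exact parallel estimate, and an inexact sampled estimate for a `String[]`. Everything in it is an assumption for illustration: the class and method names are hypothetical, and the per-string cost model (a fixed 40 bytes plus 2 bytes per character) is a stand-in, not the estimator SystemDS uses.

```java
import java.util.Arrays;

public class StringArraySizeSketch {
    // Assumed cost model: fixed per-String overhead plus 2 bytes per char.
    static long sizeOf(String s) {
        return s == null ? 0 : 40 + 2L * s.length();
    }

    // Exact estimate: visits every element sequentially.
    public static long exact(String[] a) {
        long sum = 0;
        for (String s : a)
            sum += sizeOf(s);
        return sum;
    }

    // Exact estimate, computed with a parallel stream over the array.
    public static long exactParallel(String[] a) {
        return Arrays.stream(a).parallel()
            .mapToLong(StringArraySizeSketch::sizeOf).sum();
    }

    // Inexact estimate: visits every step-th element and scales the sum up.
    public static long sampled(String[] a, int sampleSize) {
        if (sampleSize <= 0 || a.length <= sampleSize)
            return exact(a);
        int step = a.length / sampleSize;
        long sum = 0;
        int count = 0;
        for (int i = 0; i < a.length; i += step) {
            sum += sizeOf(a[i]);
            count++;
        }
        return (long) ((double) sum * a.length / count);
    }

    public static void main(String[] args) {
        String[] col = new String[1000];
        Arrays.fill(col, "abcd");
        System.out.println("exact:    " + exact(col));
        System.out.println("parallel: " + exactParallel(col));
        System.out.println("sampled:  " + sampled(col, 100));
    }
}
```

The parallel variant keeps the result exact but still touches every string; the sampled variant touches only a fraction of them, at the cost of an error that grows with the variance of the string lengths.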


