[GitHub] [systemds] Baunsgaard opened a new pull request, #1801: Many Optimizations of CLA Compressed transfer

via GitHub Sat, 01 Apr 2023 14:59:27 -0700


Baunsgaard opened a new pull request, #1801:
URL: https://github.com/apache/systemds/pull/1801


   
   
   Transpose dense -> sparse:
   
   Before, the single-threaded transpose dominated:
   ```
   census_enc_16k-kmeans+-claWorkloadb16 -- dams-so001
   Total elapsed time:          40.237 sec.
     2  r'                15.085    112
     3  compress           6.896      1
           407,390.46 msec task-clock                #    8.943 CPUs utilized
      789,013,767,869      cycles                    #    1.937 GHz                      (33.31%)
      938,096,650,617      instructions              #    1.19  insn per cycle
   census_enc_16k-kmeans+-claWorkloadb16 -- dams-so001
   ```
   
   Then I removed the allocation indirection of appending via MCSR and managed the sparse vectors directly:
   
   ```
   dams-so001 sysds: 81e554108686a1db2ff48ecd59e81d533d216b07
   20:58:10
   .------------------------------------
   census_enc_16k-kmeans+-claWorkloadb16 -- dams-so001
   Total elapsed time:          32.991 sec.
     2  r'                 9.334    112
     3  compress           6.669      1
           399,243.07 msec task-clock                #   10.375 CPUs utilized
      762,584,081,959      cycles                    #    1.910 GHz                      (33.28%)
      928,305,632,331      instructions              #    1.22  insn per cycle
   census_enc_16k-kmeans+-claWorkloadb16 -- dams-so001
   ------------------------------------
   
   ```
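
To illustrate what managing the sparse vectors directly can look like, here is a minimal sketch. It is not the SystemDS code: the class `DenseToSparseTranspose`, the `SparseRow` type, and the two-pass strategy are assumptions made for illustration. The idea shown is to count the non-zeros per output row first, allocate each index/value array once at its exact size, and then write into those arrays directly instead of going through a growing append call.

```java
import java.util.Arrays;

/** Hypothetical sketch, not the SystemDS implementation. */
final class DenseToSparseTranspose {

    /** One sparse output row: sorted indexes plus the matching values. */
    static final class SparseRow {
        final int[] idx;
        final double[] val;

        SparseRow(int[] idx, double[] val) {
            this.idx = idx;
            this.val = val;
        }
    }

    /** Transposes a row-major dense matrix (rows x cols) into cols sparse rows. */
    static SparseRow[] transpose(double[] dense, int rows, int cols) {
        // Pass 1: count the non-zeros of every input column (= output row).
        int[] nnz = new int[cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                if (dense[r * cols + c] != 0)
                    nnz[c]++;

        // Allocate each output row exactly once, at its final size.
        SparseRow[] out = new SparseRow[cols];
        for (int c = 0; c < cols; c++)
            out[c] = new SparseRow(new int[nnz[c]], new double[nnz[c]]);

        // Pass 2: write straight into the arrays, no append and no resizing.
        int[] pos = new int[cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double v = dense[r * cols + c];
                if (v != 0) {
                    SparseRow s = out[c];
                    s.idx[pos[c]] = r;
                    s.val[pos[c]++] = v;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] dense = {1, 0, 2, 0, 0, 3}; // 2 x 3 matrix
        SparseRow[] t = transpose(dense, 2, 3);
        for (SparseRow s : t)
            System.out.println(Arrays.toString(s.idx) + " " + Arrays.toString(s.val));
    }
}
```

The extra counting pass costs one more scan of the dense input, which in this sketch is traded for avoiding repeated array growth and copying.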
   
   And finally I parallelized it:
   
   ```
   dams-so001 sysds: 3a43b30b2b8dec983aa5cd7ea3c67c79a28b7f30
   21:49:14
   .------------------------------------
   census_enc_16k-kmeans+-claWorkloadb16 -- dams-so001
   Total elapsed time:          27.812 sec.
     2  compress           6.801      1
     4  r'                 4.454    112
           405,710.80 msec task-clock                #   12.203 CPUs utilized
      777,778,027,253      cycles                    #    1.917 GHz                      (33.34%)
      967,872,124,119      instructions              #    1.24  insn per cycle
   census_enc_16k-kmeans+-claWorkloadb16 -- dams-so001
   ------------------------------------
   baunsgaard@dams-so001:~/github/reprodu
   ```
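
One way to parallelize such a transpose is sketched below. This is an assumption about the general approach, not the code in this PR: the class `ParallelTranspose`, the `Sparse` holder, and the block partitioning are hypothetical. Each task owns a disjoint range of output rows, so the tasks never write to the same array slot and need no locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch, not the SystemDS implementation. */
final class ParallelTranspose {

    /** For output row c, idx[c] holds the input row indexes and val[c] the values. */
    static final class Sparse {
        final int[][] idx;
        final double[][] val;

        Sparse(int n) {
            idx = new int[n][];
            val = new double[n][];
        }
    }

    /** Transposes a row-major dense matrix using the given number of threads. */
    static Sparse transpose(double[] dense, int rows, int cols, int threads) {
        Sparse out = new Sparse(cols);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Split the output rows into contiguous blocks, one task per block.
            int blk = Math.max(1, (cols + threads - 1) / threads);
            List<Future<?>> tasks = new ArrayList<>();
            for (int start = 0; start < cols; start += blk) {
                final int lo = start;
                final int hi = Math.min(cols, start + blk);
                tasks.add(pool.submit(() -> transposeBlock(dense, rows, cols, lo, hi, out)));
            }
            // Wait for all blocks; get() also surfaces any task failure.
            for (Future<?> f : tasks)
                f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return out;
    }

    /** Builds output rows lo (inclusive) to hi (exclusive); touches no other rows. */
    private static void transposeBlock(double[] dense, int rows, int cols,
                                       int lo, int hi, Sparse out) {
        for (int c = lo; c < hi; c++) {
            int n = 0;
            for (int r = 0; r < rows; r++)
                if (dense[r * cols + c] != 0)
                    n++;
            int[] idx = new int[n];
            double[] val = new double[n];
            int p = 0;
            for (int r = 0; r < rows; r++) {
                double v = dense[r * cols + c];
                if (v != 0) {
                    idx[p] = r;
                    val[p++] = v;
                }
            }
            out.idx[c] = idx;
            out.val[c] = val;
        }
    }

    public static void main(String[] args) {
        double[] dense = {1, 0, 2, 0, 0, 3}; // 2 x 3 matrix
        Sparse t = transpose(dense, 2, 3, 2);
        System.out.println(t.idx.length + " output rows");
    }
}
```

The sketch reads the dense input column by column for brevity; a tuned version would likely block the reads for cache locality.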
   
   
   In LMCG 16x I found some optimizations to make as well. Here I added a skip list to offsetList that is hidden behind a SoftReference.
   
   Before:
   ```
   SystemDS Statistics:
   Total elapsed time:          108.736 sec.
   CLA Compression Phases :     2.065/8.329/18.484/15.647/0.001/0.000
   Decompression with allocation (Single, Multi, Spark, Cache) : 0/101/0/0
   Decompression with allocation Time (Single , Multi)         : 0.000/27.285 sec.
   Decompression to block (Single, Multi)                      : 0/0
   Decompression to block Time (Single, Multi)                 : 0.000/0.000 sec.
   ```
   
   With SkipList:
   
   ```
   SystemDS Statistics:
   Total elapsed time:          91.367 sec.
   Total compilation time:              1.250 sec.
   
   CLA Compression Phases :     2.154/7.847/20.251/15.372/0.002/0.000
   Decompression with allocation (Single, Multi, Spark, Cache) : 0/101/0/0
   Decompression with allocation Time (Single , Multi)         : 0.000/8.685 sec.
   Decompression to block (Single, Multi)                      : 0/0
   Decompression to block Time (Single, Multi)                 : 0.000/0.000 sec.
   ```
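
The general idea of a skip list hidden behind a soft reference can be sketched as follows. The layout here is an assumption for illustration, not the actual SystemDS offset encoding: the class `OffsetList`, the delta array, and the `STRIDE` constant are hypothetical. Offsets are stored as deltas, so finding a position normally means scanning from the start; the skip list caches the absolute offset of every `STRIDE`-th element so a seek can jump close to its target. Because it sits behind a `java.lang.ref.SoftReference`, the garbage collector may drop it under memory pressure, and it is rebuilt lazily on the next seek.

```java
import java.lang.ref.SoftReference;

/** Hypothetical sketch, not the SystemDS implementation. Not thread-safe. */
final class OffsetList {
    private static final int STRIDE = 4; // kept small for illustration

    private final int first;    // first absolute offset
    private final int[] deltas; // deltas[i] = offset[i + 1] - offset[i]

    // Cache of absolute offsets at positions 0, STRIDE, 2 * STRIDE, ...
    private SoftReference<int[]> skipRef = new SoftReference<>(null);

    /** Expects a non-empty, strictly increasing array of offsets. */
    OffsetList(int[] sortedOffsets) {
        first = sortedOffsets[0];
        deltas = new int[sortedOffsets.length - 1];
        for (int i = 0; i < deltas.length; i++)
            deltas[i] = sortedOffsets[i + 1] - sortedOffsets[i];
    }

    /** Returns the cached skip list, rebuilding it if the GC cleared it. */
    private int[] skipList() {
        int[] skip = skipRef.get();
        if (skip == null) {
            int n = deltas.length + 1;
            skip = new int[(n + STRIDE - 1) / STRIDE];
            int cur = first;
            for (int i = 0; i < n; i++) {
                if (i % STRIDE == 0)
                    skip[i / STRIDE] = cur;
                if (i < deltas.length)
                    cur += deltas[i];
            }
            skipRef = new SoftReference<>(skip);
        }
        return skip;
    }

    /** Returns the smallest stored offset >= target, or -1 if there is none. */
    int seek(int target) {
        int[] skip = skipList();
        // Jump to the last checkpoint that is not past the target.
        int k = 0;
        while (k + 1 < skip.length && skip[k + 1] <= target)
            k++;
        // Scan the deltas from that checkpoint only.
        int i = k * STRIDE;
        int cur = skip[k];
        while (cur < target) {
            if (i >= deltas.length)
                return -1;
            cur += deltas[i++];
        }
        return cur;
    }

    public static void main(String[] args) {
        OffsetList l = new OffsetList(new int[] {1, 3, 8, 10, 15, 22, 30, 41, 50});
        System.out.println(l.seek(23));
    }
}
```

A soft reference fits a cache like this because the skip list is purely derived data: losing it costs a rebuild, never correctness.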
   
   
   
   The biggest win is in the transfer from Spark: I had misplaced a recompute of zeros, which dominated the transfer by recomputing all zeros in the entire matrix when pulling back a distributed compressed matrix. Fixing this speeds up, for instance, the PCA 32x run :+1:
   
   Before:
   ```
   Total elapsed time:          148.328 sec.
   Spark trans times (par,bc,col):      0.000/0.021/85.999 secs.
   ```
   
   After:
   ```
   Total elapsed time:          65.700 sec.
   Spark trans times (par,bc,col):      0.000/0.021/6.980 secs.
   ```
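
The PR text does not say where the recompute call sat, so the following is only one plausible shape of such a bug, written as a hypothetical sketch. The class `CollectBlocks` and both method names are invented for illustration and do not exist in SystemDS. It contrasts recounting the non-zeros of the whole result after every collected block with counting them once after all blocks are in place. Both return the same count, but the first does work proportional to (number of blocks) x (total cells).

```java
/** Hypothetical sketch, not the SystemDS implementation. */
final class CollectBlocks {

    /** Full scan of the matrix to count non-zero cells. */
    static long countNonZeros(double[] m) {
        long nnz = 0;
        for (double v : m)
            if (v != 0)
                nnz++;
        return nnz;
    }

    /** Misplaced variant: rescans the entire result after each block. */
    static long collectRecountPerBlock(double[][] blocks, double[] out) {
        long nnz = 0;
        int pos = 0;
        for (double[] b : blocks) {
            System.arraycopy(b, 0, out, pos, b.length);
            pos += b.length;
            nnz = countNonZeros(out); // full-matrix scan inside the loop
        }
        return nnz;
    }

    /** Fixed variant: copies all blocks, then scans the result once. */
    static long collectRecountOnce(double[][] blocks, double[] out) {
        int pos = 0;
        for (double[] b : blocks) {
            System.arraycopy(b, 0, out, pos, b.length);
            pos += b.length;
        }
        return countNonZeros(out); // single scan after the loop
    }

    public static void main(String[] args) {
        double[][] blocks = {{1, 0}, {0, 2}, {3, 0}};
        System.out.println(collectRecountPerBlock(blocks, new double[6]));
        System.out.println(collectRecountOnce(blocks, new double[6]));
    }
}
```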
   
   
   
   
   


