kev-inn opened a new pull request, #1643:
URL: https://github.com/apache/systemds/pull/1643

   # `DoubleVector` replacement for matrix multiply
   JDK 17 ships the incubating Vector API (`jdk.incubator.vector`), whose 
   `Vector` classes let Java code use SIMD instructions. This PR replaces the 
   basic dense-dense matrix multiply with an equivalent `DoubleVector` 
   implementation. Because this requires JDK 17, we should not merge it yet, 
   but keep it in staging for future reference.
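
For readers unfamiliar with the API, the kind of kernel involved can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class name `VectorMatMulSketch`, the method `matMult`, and the row-major `double[]` layout are assumptions made for the example. It needs `--add-modules jdk.incubator.vector` at compile time and at run time.

```java
import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical sketch, not the PR's implementation.
final class VectorMatMulSketch {
    // Widest double vector shape the current CPU supports.
    private static final VectorSpecies<Double> SPECIES = DoubleVector.SPECIES_PREFERRED;

    // Computes C += A * B for row-major dense arrays:
    // A is n x k, B is k x m, C is n x m (C must be zeroed for a plain product).
    static void matMult(double[] a, double[] b, double[] c, int n, int k, int m) {
        // Largest multiple of the lane count that is <= m.
        int upper = SPECIES.loopBound(m);
        for (int i = 0; i < n; i++) {
            for (int l = 0; l < k; l++) {
                double aval = a[i * k + l];
                // Replicate the scalar A[i,l] into every lane.
                DoubleVector av = DoubleVector.broadcast(SPECIES, aval);
                int j = 0;
                // Vectorized part: C[i, j..j+lanes) += A[i,l] * B[l, j..j+lanes)
                for (; j < upper; j += SPECIES.length()) {
                    DoubleVector bv = DoubleVector.fromArray(SPECIES, b, l * m + j);
                    DoubleVector cv = DoubleVector.fromArray(SPECIES, c, i * m + j);
                    av.fma(bv, cv).intoArray(c, i * m + j);
                }
                // Scalar tail for the columns that do not fill a whole vector.
                for (; j < m; j++) {
                    c[i * m + j] += aval * b[l * m + j];
                }
            }
        }
    }
}
```

The loop order keeps the innermost accesses to `b` and `c` contiguous, which is what allows whole vectors to be loaded and stored per step. Other loop orders and blocking schemes are possible and may perform differently.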
   
   As an experiment we check a simple matrix multiply:
   `Z = X %*% Y`, $X\in \mathbb{R}^{n\times k}, Y\in \mathbb{R}^{k\times m}$
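
Written out entry by entry, with the same symbols as above, the product is:

$$Z_{ij} = \sum_{l=1}^{k} X_{il}\, Y_{lj}, \qquad 1 \le i \le n,\; 1 \le j \le m$$

Each of the $n \cdot m$ entries takes $k$ multiply-add steps, so one product costs about $2nkm$ floating-point operations. As a rough guide, $n = m = k = 1000$ gives about $2 \times 10^9$ operations, and raising $k$ to $10000$ gives about $2 \times 10^{10}$.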
   
   The experiment script performs 10 matrix multiplications and saves the times 
   of the last 6 (iterations 5 to 10), treating the first 4 as warm-up so the 
   JVM has some time to optimize.
   
   ## Vary rows n, m fixed at 1000
   
   ### Alpha Node
   | $k = 1000$ | $k = 10000$ |
   | --------------- | --------------- |
   | ![plot_alpha_n_1000](https://user-images.githubusercontent.com/41760497/174668244-dec90cb2-4ded-4e8a-9b9a-16408c818809.svg) | ![plot_alpha_n_10000](https://user-images.githubusercontent.com/41760497/174668238-c9f6b988-6b55-4c1f-845c-399ad6bf1ed0.svg) |
   
   
   ### Lima Node
   | $k = 1000$ | $k = 10000$ |
   | --------------- | --------------- |
   | ![plot_lima_n_1000](https://user-images.githubusercontent.com/41760497/174668254-4fb11b22-b6b3-45c4-883c-572f469ec9b9.svg) | ![plot_lima_n_10000](https://user-images.githubusercontent.com/41760497/174668248-2a6d2b1c-d4b7-4296-9fed-ed3fed8d84d1.svg) |
   
   ## Vary cols m, n fixed at 1000
   
   ### Alpha Node
   | $k = 1000$ | $k = 10000$ |
   | --------------- | --------------- |
   | ![plot_alpha_m_1000](https://user-images.githubusercontent.com/41760497/174668241-3a829378-525b-464e-bf4e-9d1eff373125.svg) | ![plot_alpha_m_10000](https://user-images.githubusercontent.com/41760497/174668233-7c5120a1-b243-4fa6-bf3a-2e0c05d75308.svg) |
   
   ### Lima Node
   | $k = 1000$ | $k = 10000$ |
   | --------------- | --------------- |
   | ![plot_lima_m_1000](https://user-images.githubusercontent.com/41760497/174668252-8fefdd71-dd72-4d24-a851-eea6f660b88d.svg) | ![plot_lima_m_10000](https://user-images.githubusercontent.com/41760497/174668246-0824fbae-900f-49a9-9010-0edb8ffbc7fd.svg) |
   
   ## Conclusion
   The `DoubleVector` implementation appears to improve performance in most of 
   the tested configurations. The case where we vary the number of columns $m$ 
   on the Alpha node needs more exploration, but it seems we are never slower 
   than the current implementation.
   
   ## Experiment Script
   
   ```R
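   # Input matrices, passed as script arguments
   X = read($Xfname);
   Y = read($Yfname);
   
   lim = 10;                         # number of timed repetitions
   R = matrix(0, rows=lim, cols=1);  # one runtime per repetition
   for (i in 1:lim) {
     t1 = time();
     Z = X %*% Y;
     t2 = time();
     # elapsed time; ns -> ms, assuming time() reports nanoseconds
     R[i,1] = (t2-t1)/1000000;
   }
   
   # Print a checksum so the last product is actually used
   print(sum(Z));
   # Keep repetitions 5..lim (6 samples); the earlier ones serve as warm-up
   res = R[5:lim,];
   write(res, $fname, format="csv", sep="\t");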
   ```

