janniklinde commented on PR #2376:
URL: https://github.com/apache/systemds/pull/2376#issuecomment-3669261021

   LGTM @mboehm7 - one possible concern I have both with the single-threaded 
and multi-threaded implementation is that `clen == 1` has to guarantee that 
`in.getDenseBlockValues()` does not throw an exception or returns only the 
first row. Because the single-threaded case already had that implementation, I 
am not sure if it is intentional or an issue.
   
   Thank you again for your contribution, bugfixes, and detailed tests and 
benchmarks @Biranavan-Parameswaran. 
   
   It appears that `System.arraycopy` already saturates memory bandwidth (shown 
by your single column benchmarks). An optional improvement I could see for 
`DenseBlockFP64` is to always perform two large arraycopies instead of many 
small ones (currently this is only limited to `clen==1`). Due to the contiguous 
row-major layout of some dense block implementations, performing two large 
arraycopies should be possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to