corepointer opened a new pull request #1567:
URL: https://github.com/apache/systemds/pull/1567


   …fers
   
   The spoof CUDA operators issue several small cudaMemcpy() calls per 
operator execution. Transferring all of the data in a single copy can reduce 
this overhead. In addition, asynchronous copies can improve performance further 
and are a first step towards more asynchronous execution in the GPU operations.
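
For illustration, the batching idea described above could look roughly like the following sketch. It is not the code from this pull request: `HostArg`, `align_up`, `plan_layout`, and `upload_batched` are hypothetical names, and the 16-byte alignment is an assumed value. The sketch packs several small host-side arguments into one staging buffer and issues a single `cudaMemcpyAsync()` instead of one `cudaMemcpy()` per argument. It assumes the staging buffer was allocated as pinned memory (for example with `cudaMallocHost()`), since copies from pageable memory may not overlap with host work.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of one small host-side operator argument.
struct HostArg {
    const void* data;   // source bytes on the host
    size_t      bytes;  // size of this argument in bytes
};

// Round n up to a multiple of align (align must be a power of two).
inline size_t align_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

// Compute the offset of every argument inside one packed buffer.
// Returns the total (aligned) size of the packed buffer in bytes.
inline size_t plan_layout(const std::vector<HostArg>& args,
                          std::vector<size_t>& offsets,
                          size_t align = 16) {  // assumed alignment
    offsets.clear();
    size_t total = 0;
    for (const HostArg& a : args) {
        offsets.push_back(total);
        total = align_up(total + a.bytes, align);
    }
    return total;
}

// Pack all arguments into the staging buffer, then enqueue ONE
// host-to-device copy on the given stream. The caller must not reuse
// pinned_staging until the stream has been synchronized.
inline cudaError_t upload_batched(const std::vector<HostArg>& args,
                                  const std::vector<size_t>& offsets,
                                  size_t total,
                                  unsigned char* pinned_staging,
                                  unsigned char* device_buf,
                                  cudaStream_t stream) {
    for (size_t i = 0; i < args.size(); ++i)
        std::memcpy(pinned_staging + offsets[i], args[i].data, args[i].bytes);
    return cudaMemcpyAsync(device_buf, pinned_staging, total,
                           cudaMemcpyHostToDevice, stream);
}
```

A kernel would then read each argument at `device_buf + offsets[i]`. The caller is expected to allocate `pinned_staging` and `device_buf` with at least `total` bytes and to call `cudaStreamSynchronize()` (or wait on an event) before overwriting the staging buffer.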


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


