corepointer opened a new pull request #1567: URL: https://github.com/apache/systemds/pull/1567
…fers The spoof CUDA operators perform several small cudaMemcpy() invocations per operator execution. Transferring all of the data in a single copy reduces this overhead. In addition, asynchronous copies can improve performance further and are a first step towards more asynchronicity in the GPU operations.
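To make the batching idea concrete, here is a minimal host-only sketch, not the code from this pull request. It packs several small arguments into one contiguous staging buffer and records where each one lands, so that a single transfer could replace the per-argument copies. The names `HostArg`, `PackedArgs`, `align_up` and `pack_args`, as well as the 16-byte alignment, are assumptions made for illustration; the actual device transfer is only indicated in a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of one small host-side argument that would
// otherwise be transferred with its own cudaMemcpy() call.
struct HostArg {
    const void* data;
    std::size_t bytes;
};

// Result of packing: one contiguous staging buffer plus the byte offset
// at which each argument starts inside it.
struct PackedArgs {
    std::vector<unsigned char> staging;
    std::vector<std::size_t> offsets;
};

// Round n up to the next multiple of align (align must be a power of two).
inline std::size_t align_up(std::size_t n, std::size_t align) {
    return (n + align - 1) & ~(align - 1);
}

// Lay out all arguments back to back, each starting at an aligned offset.
// The alignment value is an assumption, not taken from the pull request.
inline PackedArgs pack_args(const std::vector<HostArg>& args,
                            std::size_t align = 16) {
    PackedArgs p;
    std::size_t total = 0;
    for (const HostArg& a : args) {
        p.offsets.push_back(total);
        total = align_up(total + a.bytes, align);
    }
    p.staging.resize(total);  // padding bytes are zero-initialised
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i].bytes > 0) {
            std::memcpy(p.staging.data() + p.offsets[i],
                        args[i].data, args[i].bytes);
        }
    }
    // A single transfer of p.staging (for example one cudaMemcpyAsync()
    // call on a stream) could now replace one copy per argument; the
    // device side would read each argument at base + offsets[i].
    return p;
}
```

For truly asynchronous host-to-device copies the staging buffer would normally live in pinned host memory rather than in a `std::vector`; that detail is left out here to keep the sketch free of CUDA dependencies.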
