paleolimbot opened a new pull request, #509:
URL: https://github.com/apache/arrow-nanoarrow/pull/509

   This PR implements asynchronous buffer copying when copying CUDA buffers. 
Before this, we had basically been issuing `cuMemcpyDtoH()`/`cuMemcpyHtoD()` many 
times in a row, with a synchronize up front and a synchronize at the end. This 
was probably not great for performance. Additionally, when copying 
String/Binary/Large String/Large Binary arrays from CUDA to the CPU, we were 
issuing very tiny copies of the offsets buffer and synchronizing with the CPU 
to get the number of bytes to copy for the data buffer.
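   For context, the pre-existing pattern looked roughly like the sketch below (the helper name, arguments, and per-array structure are illustrative, not the actual nanoarrow code): the blocking offsets copy has to complete before we even know how large the data buffer copy should be.

```c
// Illustrative sketch of the pre-PR pattern (not the actual nanoarrow code):
// each buffer is copied with a blocking cuMemcpyDtoH(), and for string/binary
// arrays the last offset is fetched first so we know how many data bytes to copy.
#include <cuda.h>
#include <stdint.h>

static CUresult CopyStringArrayDtoHSync(CUdeviceptr offsets_dev, int64_t length,
                                        CUdeviceptr data_dev, int32_t* offsets_host,
                                        void* data_host) {
  // Blocking copy of the (length + 1) 32-bit offsets
  CUresult err =
      cuMemcpyDtoH(offsets_host, offsets_dev, (length + 1) * sizeof(int32_t));
  if (err != CUDA_SUCCESS) return err;

  // The last offset is the data buffer size in bytes; getting it required the
  // copy above to complete (i.e., a synchronization with the CPU)
  int32_t data_size = offsets_host[length];

  // Another blocking copy for the data buffer
  return cuMemcpyDtoH(data_host, data_dev, (size_t)data_size);
}
```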
   
   After this PR, when copying from CPU to CUDA, we can return before the copy 
has necessarily completed by setting the output `sync_event`.
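   A minimal sketch of that direction, assuming a caller-provided stream (the helper and buffer layout here are hypothetical, not nanoarrow's actual API): the copies are queued asynchronously and an event recorded on the stream is handed back as the `sync_event`.

```c
// Sketch only: queue host-to-device copies and return an event the consumer
// can wait on, instead of synchronizing before returning.
#include <cuda.h>

static CUresult CopyBuffersHtoDAsync(CUdeviceptr* dst, const void** src,
                                     const size_t* sizes, int n_buffers,
                                     CUstream stream, CUevent* sync_event_out) {
  CUresult err;
  for (int i = 0; i < n_buffers; i++) {
    // Queue each copy on the stream without waiting for it to finish
    err = cuMemcpyHtoDAsync(dst[i], src[i], sizes[i], stream);
    if (err != CUDA_SUCCESS) return err;
  }

  // Record an event after the queued copies; the caller (or the consumer of
  // the exported device array) waits on this event rather than on a full
  // stream/context synchronization.
  CUevent event;
  err = cuEventCreate(&event, CU_EVENT_DEFAULT);
  if (err != CUDA_SUCCESS) return err;
  err = cuEventRecord(event, stream);
  if (err != CUDA_SUCCESS) {
    cuEventDestroy(event);
    return err;
  }

  *sync_event_out = event;
  return CUDA_SUCCESS;
}
```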
   
   When copying from CUDA to CPU, the copy is done in one pass if there are no 
string/binary arrays, or in two passes if there are. When copying string/binary 
arrays, the implementation walks the entire tree of arrays and issues 
asynchronous copies for the last offset value of each offsets buffer. Then the 
stream is synchronized with the CPU, and a second set of asynchronous copies is 
issued for the buffers whose sizes are now known.
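   For a single string array, the per-array logic is roughly the sketch below (names are illustrative; the real code walks the whole array tree and synchronizes once after all of the pass-1 copies have been queued):

```c
// Hedged sketch of the two-pass device-to-host copy for one string array.
#include <cuda.h>
#include <stdint.h>

static CUresult CopyStringArrayDtoHTwoPass(CUdeviceptr offsets_dev, int64_t length,
                                           CUdeviceptr data_dev, int32_t* offsets_host,
                                           void* data_host, CUstream stream) {
  // Pass 1: asynchronously fetch just the last offset to learn the data size
  int32_t last_offset = 0;
  CUresult err = cuMemcpyDtoHAsync(&last_offset,
                                   offsets_dev + length * sizeof(int32_t),
                                   sizeof(int32_t), stream);
  if (err != CUDA_SUCCESS) return err;

  // One synchronization (in the real code, after pass 1 over the whole tree)
  err = cuStreamSynchronize(stream);
  if (err != CUDA_SUCCESS) return err;

  // Pass 2: queue the full offsets and data buffer copies with known sizes
  err = cuMemcpyDtoHAsync(offsets_host, offsets_dev,
                          (length + 1) * sizeof(int32_t), stream);
  if (err != CUDA_SUCCESS) return err;
  err = cuMemcpyDtoHAsync(data_host, data_dev, (size_t)last_offset, stream);
  if (err != CUDA_SUCCESS) return err;

  // The destination is CPU memory, so synchronize once at the end
  return cuStreamSynchronize(stream);
}
```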
   
   I don't have enough experience with CUDA async programming to know whether 
this approach could be simplified (e.g., I do this in two streams, but it might 
be that one stream is sufficient, since perhaps all of the device -> host copies 
are getting queued against each other regardless of what stream they're on).
   
   This will be easier to test (e.g., with bigger, non-trivial data) when it is 
wired up to Python.
   
   TODO:
   
   - Implement `sync_event` integration (both for source and destination)
   - Test more than just a few string arrays
   
   Closes #245.

