westonpace commented on PR #41801:
URL: https://github.com/apache/arrow/pull/41801#issuecomment-2129641389

   > We could add a ReadSome equivalent, but we should first ensure it's 
actually useful to us - is it?
   
   My guess would be no; in most situations this will not make a measurable 
difference.  The theory behind the call is that it can prevent an allocation + 
memcpy.  For example, a TCP library might allocate a buffer that it then fills 
from the NIC, say a 1MB buffer.  If you then demand 4MB of data, the library is 
forced to allocate a new 4MB buffer and do a memcpy.  You can use the two-arg 
version of `Read` to avoid the allocation, but you still have the memcpy.  With 
`ReadSome`, the library simply returns the 1MB buffer in a zero-copy fashion.
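   A minimal Python sketch of the three read styles described above follows. It 
is an illustration under stated assumptions, not Arrow's API: `ChunkedSource`, 
`read`, `read_into`, and `read_some` are hypothetical names, and the in-memory 
`bytearray` chunks stand in for receive buffers that a network library has 
already filled.

```python
from typing import List


class ChunkedSource:
    """Hypothetical stream that serves data from buffers it already owns.

    Each entry in `chunks` stands in for a receive buffer that a network
    library filled from the NIC (e.g. a 1MB buffer). Not Arrow's API.
    """

    def __init__(self, chunks: List[bytearray]) -> None:
        self._chunks = chunks
        self._index = 0   # which chunk we are reading from
        self._offset = 0  # position inside the current chunk

    def _remaining_in_chunk(self) -> int:
        return len(self._chunks[self._index]) - self._offset

    def read_some(self, max_bytes: int) -> memoryview:
        """ReadSome-style: up to max_bytes from the current chunk, zero-copy.

        The result may be shorter than requested. It is a view into the
        chunk, so no data buffer is allocated and nothing is copied.
        """
        # Skip over chunks that are already fully consumed.
        while self._index < len(self._chunks) and self._remaining_in_chunk() == 0:
            self._index += 1
            self._offset = 0
        if self._index == len(self._chunks):
            return memoryview(b"")  # end of stream
        chunk = self._chunks[self._index]
        n = min(max_bytes, self._remaining_in_chunk())
        view = memoryview(chunk)[self._offset:self._offset + n]
        self._offset += n
        return view

    def read_into(self, nbytes: int, out: bytearray) -> int:
        """Two-arg Read style: copy into a caller-owned buffer (memcpy only)."""
        filled = 0
        while filled < nbytes:
            view = self.read_some(nbytes - filled)
            if len(view) == 0:
                break  # stream ended early
            out[filled:filled + len(view)] = view  # the memcpy
            filled += len(view)
        return filled

    def read(self, nbytes: int) -> bytearray:
        """One-arg Read style: allocate a new buffer, then copy into it."""
        out = bytearray(nbytes)  # the allocation
        filled = self.read_into(nbytes, out)
        del out[filled:]  # shrink if the stream ended early
        return out
```

   With two 4-byte chunks, `read_some(6)` returns only the 4 bytes left in the 
first chunk, as a view that shares memory with it, whereas `read(6)` allocates 
a new 6-byte buffer and copies across the chunk boundary.  The short return is 
the price of zero-copy: the caller has to loop or accept partial results.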
   
   I believe this is unlikely to be a significant cost, especially if the user 
quickly uses the data returned from `Read`, because then we are only talking 
about an extra pass over data that is already in the CPU cache, not a separate 
transfer across the RAM-CPU bus.
   
   On the other hand, if the user calls `Read`, stashes the buffer somewhere, 
and only uses that data later in a different thread, the cost might be more 
significant.  The data then has to travel from RAM to the CPU once for the 
memcpy in `Read`, and a second time when the user actually accesses the buffer 
in the other thread.  `ReadSome` could be a way to work around this.
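   The deferred-use case can be sketched the same way.  This is a hypothetical 
Python illustration (the function names are made up, and CPython will not 
reproduce the cache effect); it only shows the data flow.  `stash_copies` pays 
for an allocation and a copy up front and the worker thread reads the bytes 
again later, while `stash_views` hands the worker zero-copy views so the bytes 
are only touched once.  It assumes the source never reuses a chunk while a view 
of it is still queued.

```python
import queue
import threading
from typing import List


def stash_copies(chunks: List[bytearray], q: queue.Queue) -> None:
    """Hypothetical producer: copy each buffer now, use it later."""
    for chunk in chunks:
        q.put(bytes(chunk))  # allocation + copy (first pass over the bytes)
    q.put(None)  # sentinel: no more data


def stash_views(chunks: List[bytearray], q: queue.Queue) -> None:
    """Hypothetical producer: hand out zero-copy views, ReadSome-style."""
    for chunk in chunks:
        q.put(memoryview(chunk))  # no data buffer allocated, nothing copied
    q.put(None)


def consume(q: queue.Queue, results: List[int]) -> None:
    """Worker thread: touches the stashed bytes only when it gets to them."""
    total = 0
    while True:
        item = q.get()
        if item is None:
            break
        total += sum(item)  # iterating bytes or a memoryview yields ints
    results.append(total)


def run(stash, chunks: List[bytearray]) -> int:
    """Run one producer strategy against a worker thread; return its checksum."""
    q: queue.Queue = queue.Queue()
    results: List[int] = []
    worker = threading.Thread(target=consume, args=(q, results))
    worker.start()
    stash(chunks, q)
    worker.join()
    return results[0]
```

   Both strategies give the worker the same bytes; the difference is only 
whether an extra copy was made before the worker ran.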

