westonpace commented on PR #41801: URL: https://github.com/apache/arrow/pull/41801#issuecomment-2129641389
> We could add a ReadSome equivalent, but we should first ensure it's actually useful to us - is it?

My guess would be no; in most situations this will not make a measurable difference.

The theory behind the call is that it can prevent an allocation + memcpy. For example, a TCP library might allocate a buffer that it then fills from the NIC, say a 1MB buffer. If you then demand 4MB of data, the library is forced to allocate a new 4MB buffer and do a memcpy. You can use the two-arg version of `Read` to avoid the allocation, but you still have the memcpy. With `ReadSome` it simply returns the 1MB buffer in a zero-copy fashion.

I believe this is unlikely to be a significant cost, especially if the user is quickly using the data returned from `Read`, because then we are just talking about an extra scan through data that's already in the CPU cache and not a separate transfer across the RAM-CPU bus.

On the other hand, if the user is calling `Read`, stashing the buffer somewhere, and then using that data later on a different thread, the cost might be more significant. In that case we need to move the data from RAM to CPU once (to do the memcpy in `Read`) and then a second time (once the user actually accesses the buffer in the other thread), and `ReadSome` could be a way to work around this.
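To make the allocation + memcpy argument concrete, here is a minimal sketch of the two read styles. It is not Arrow's API: `Buffer`, `ChunkedSource`, and `ExactReader` are hypothetical names invented for illustration, and the "transport" just hands out pre-filled chunks instead of reading from a socket.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical types for illustration only; none of this is Arrow's API.
using Buffer = std::vector<std::uint8_t>;

// Stand-in for a transport (e.g. a TCP library) that fills fixed-size
// internal buffers and hands them out one at a time.
class ChunkedSource {
 public:
  ChunkedSource(std::size_t chunk_size, std::size_t num_chunks)
      : chunk_size_(chunk_size), chunks_left_(num_chunks) {
    assert(chunk_size > 0);
  }

  // ReadSome-style call: return whatever buffer the transport already
  // holds, without stitching or copying. Returns nullptr at end of stream.
  std::shared_ptr<Buffer> ReadSome() {
    if (chunks_left_ == 0) return nullptr;
    --chunks_left_;
    // Simulates the transport filling its own buffer from the network.
    return std::make_shared<Buffer>(chunk_size_, std::uint8_t{0xAB});
  }

 private:
  std::size_t chunk_size_;
  std::size_t chunks_left_;
};

// Read-style wrapper: the caller demands an exact byte count, so the
// transport's chunks must be copied into a freshly allocated buffer.
class ExactReader {
 public:
  explicit ExactReader(ChunkedSource* src) : src_(src) {}

  // Returns up to nbytes; fewer only if the stream ends first.
  std::shared_ptr<Buffer> Read(std::size_t nbytes) {
    auto out = std::make_shared<Buffer>(nbytes);  // the extra allocation
    std::size_t filled = 0;
    while (filled < nbytes) {
      if (!chunk_ || offset_ == chunk_->size()) {
        chunk_ = src_->ReadSome();
        offset_ = 0;
        if (!chunk_) break;  // end of stream
      }
      const std::size_t n =
          std::min(nbytes - filled, chunk_->size() - offset_);
      std::memcpy(out->data() + filled, chunk_->data() + offset_, n);  // the extra copy
      filled += n;
      offset_ += n;
      bytes_copied_ += n;
    }
    out->resize(filled);
    return out;
  }

  // Total bytes that went through memcpy so far.
  std::size_t bytes_copied() const { return bytes_copied_; }

 private:
  ChunkedSource* src_;
  std::shared_ptr<Buffer> chunk_;  // partially consumed transport chunk
  std::size_t offset_ = 0;
  std::size_t bytes_copied_ = 0;
};
```

With 1MB chunks, `ExactReader::Read(4 << 20)` allocates one 4MB buffer and copies all 4MB into it, while four `ReadSome()` calls return the same bytes with no copy at all. Whether that extra copy matters in practice depends on how soon the caller touches the data, as discussed above.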
