[GitHub] [arrow] westonpace commented on pull request #11616: ARROW-14577: [C++] Enable fine grained IO for async IPC reader

GitBox Thu, 13 Jan 2022 19:45:04 -0800


westonpace commented on pull request #11616:
URL: https://github.com/apache/arrow/pull/11616#issuecomment-1012723649



   I split the benchmarks into a separate PR (#12150).  I did a bit more 
analysis today.  There is a substantial performance loss in a few situations:
   
   Async full file reads (i.e. when reading all columns): David's suggestion 
will probably work here, but it's going to be a bit tricky to implement.  Right 
now, each time we load a shared record batch we use a dedicated read cache, so 
there is no single read cache on which to mark the whole file as cached.  I 
plan to look at this more tomorrow.
   
   Reading from a buffer (i.e. when doing no I/O at all).  For example:
   
   Old:
   ```
   ReadBufferAsync/num_cols:1/is_partial:0/iterations:50000/real_time_mean      3623 ns         3623 ns           10 bytes_per_second=269.542G/s
   ```
   New:
   ```
   ReadBufferAsync/num_cols:1/is_partial:0/iterations:50000/real_time_mean      8651 ns         8651 ns           10 bytes_per_second=112.89G/s
   ```
   
   In this particular case we are over 2x slower (about 2.4x).  Some of this 
slowdown is because the read range cache is calling `file->WillNeed` on these 
regions, which triggers an madvise (which seems to eat up time mainly by 
virtue of being a system call).  Removing that call gets us to `159G/s`, 
although I'm not really sure that's the right path to take.
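
   A rough back-of-the-envelope check on how much of the slowdown the madvise 
accounts for, using only the numbers quoted above.  This is a sketch, not a 
measurement: it assumes every run reads the same payload per iteration (the 
old and new time x throughput products agree closely, which supports that), so 
time scales inversely with throughput.  The constant and function names are 
made up for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Numbers copied from the benchmark output quoted in this comment.
constexpr double kOldNs = 3623.0;       // old real_time_mean
constexpr double kNewNs = 8651.0;       // new real_time_mean
constexpr double kOldGps = 269.542;     // old bytes_per_second (G/s)
constexpr double kNoAdviseGps = 159.0;  // new, with the WillNeed call removed

// Assumption: the payload per iteration is constant, so
// time = old_time * (old_throughput / throughput).
constexpr double EstimateNs(double throughput_gps) {
  return kOldNs * (kOldGps / throughput_gps);
}

// Fraction of the added time that disappears when WillNeed is removed.
constexpr double WillNeedShare() {
  return (kNewNs - EstimateNs(kNoAdviseGps)) / (kNewNs - kOldNs);
}

void PrintBreakdown() {
  // The breakdown only makes sense if the new path is the slower one.
  assert(kNewNs > kOldNs);
  std::printf("slowdown: %.2fx\n", kNewNs / kOldNs);
  std::printf("estimated time without WillNeed: %.0f ns\n",
              EstimateNs(kNoAdviseGps));
  std::printf("share of added time from WillNeed: %.0f%%\n",
              100.0 * WillNeedShare());
}
```

   Under those assumptions, `PrintBreakdown()` reports a 2.39x slowdown, an 
estimated 6142 ns per iteration without the hint, and about 50% of the added 
time attributed to the `WillNeed` call.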
   
   I'm pretty sure the rest of the time is lost because we are using more 
futures, which means more allocations and more shared_ptr overhead.  There is 
no quick fix for that, but I am thinking I want to tackle Future improvements 
in 8.0.0.
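
   To illustrate why more futures means more allocations, here is a minimal 
sketch.  `MiniFuture` is a hypothetical stand-in, not Arrow's `Future`; it 
only assumes the common design where producer and consumer share one 
heap-allocated state block through a shared_ptr.  The counting allocator is 
likewise made up for the example.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Global counter bumped by every allocation made through CountingAllocator.
inline std::size_t& AllocationCount() {
  static std::size_t count = 0;
  return count;
}

// Minimal allocator that forwards to std::allocator and counts allocations.
template <typename T>
struct CountingAllocator {
  using value_type = T;
  CountingAllocator() = default;
  template <typename U>
  CountingAllocator(const CountingAllocator<U>&) {}
  T* allocate(std::size_t n) {
    ++AllocationCount();
    return std::allocator<T>().allocate(n);
  }
  void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) {
  return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) {
  return false;
}

// Hypothetical future: one shared, heap-allocated state block per instance.
struct MiniFuture {
  struct State {
    bool finished = false;
    int value = 0;
  };
  std::shared_ptr<State> state =
      std::allocate_shared<State>(CountingAllocator<State>());
  void MarkFinished(int v) {
    state->value = v;
    state->finished = true;
  }
};

// Counts the state-block allocations needed to create and complete
// `num_futures` futures (the vector's own storage is not counted).
std::size_t AllocationsFor(std::size_t num_futures) {
  const std::size_t before = AllocationCount();
  std::vector<MiniFuture> futures(num_futures);
  for (auto& f : futures) f.MarkFinished(1);
  return AllocationCount() - before;
}
```

   In this sketch the allocation count grows linearly with the number of 
futures, so moving from one future per batch to one per column multiplies the 
fixed per-future cost, which matters most when the read itself is nearly free.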
   
   There's a Windows build error I will fix.
   
   At the moment I am leaning towards including this, but I'm somewhat torn.  
The slowdowns are on an already lightning fast path (e.g. we are going from 
roughly 4000ns to 8000ns for a zero-copy buffer read) for an operation we 
aren't yet calling in any real critical section (these calls are per-batch).
   
   The speedup is on a very slow path (e.g. going from 7.8 seconds to 1.7 
seconds, about 4.6x, on a 1G file read because we're reading 8 columns instead 
of 64 columns), but that path may not be as common for IPC.


