lidavidm edited a comment on pull request #9656:
URL: https://github.com/apache/arrow/pull/9656#issuecomment-811445456


I've rebased this to use the background generator; however, it doesn't help much, and it makes the reader non-reentrant, so we also lose any advantage with compressed data, since we can no longer parallelize decompression.
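Python is used below only as neutral, runnable illustration. This sketch shows what "non-reentrant" means here. It is not Arrow's C++ implementation, and `background_generator` and `max_queued` are hypothetical names. One background thread drains a blocking source into a bounded queue, and the consumer pulls from the other end. Only that one thread ever advances the source, so any decompression done inside the source runs serially.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel: the source is exhausted


class _Error:
    """Carries an exception raised by the source across the queue."""

    def __init__(self, exc: BaseException) -> None:
        self.exc = exc


def background_generator(source: Iterable[T], max_queued: int = 4) -> Iterator[T]:
    """Drain `source` on one background thread, buffering up to `max_queued` items."""
    q: "queue.Queue[object]" = queue.Queue(maxsize=max_queued)

    def pump() -> None:
        try:
            for item in source:  # the only thread that ever touches `source`
                q.put(item)      # blocks while the queue is full (backpressure)
        except BaseException as exc:
            q.put(_Error(exc))
        finally:
            q.put(_DONE)

    # Daemon thread: if the consumer stops early, it won't block interpreter exit.
    threading.Thread(target=pump, daemon=True).start()
    while True:
        item = q.get()  # the consumer pulls straight from the queue
        if item is _DONE:
            return
        if isinstance(item, _Error):
            raise item.exc
        yield item


print(list(background_generator(range(5))))  # [0, 1, 2, 3, 4]
```

Items arrive in source order. The price is that the source is never advanced by more than one thread at a time.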
   
The async reader gets anywhere from 30% to 90% of the throughput of the synchronous one.
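Judging by the numbers, each sync/async pair processes the same bytes per iteration. The throughput ratio is therefore just the inverse ratio of the `real_time` columns. The sketch below checks a few rows copied from the benchmark output this way. The helper `throughput_ratio` is hypothetical, written only for this illustration.

```python
# Real time per iteration in ns, copied from the benchmark output: (sync, async).
REAL_TIME_NS = {
    "ReadFile/1": (7858, 23883),
    "ReadFile/64": (67470, 93104),
    "ReadTempFile/1": (70972, 88569),
    "ReadCompressedFile/1": (30636840, 57512529),
}


def throughput_ratio(sync_ns: float, async_ns: float) -> float:
    """Async throughput as a fraction of sync throughput for the same workload."""
    # bytes_per_second = bytes / time, and both variants read the same bytes,
    # so the throughput ratio is the inverse of the time ratio.
    return sync_ns / async_ns


for name, (sync_ns, async_ns) in REAL_TIME_NS.items():
    print(f"{name}: async at {throughput_ratio(sync_ns, async_ns):.0%} of sync")
```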
   
Cases here are numbered by the number of columns in the file. The cases with very few columns are the worst case for async, since decoding is essentially free and async is pure overhead. Conversely, the cases with many columns are the best case, since decoding is expensive. However, async still doesn't help, because I/O is relatively cheap in every case benchmarked here, so there is no pipelining to be had.
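The "no pipelining to be had" point can be made concrete with a simplified two-stage model. This model is an assumption for illustration, not something measured here. Each of `n` batches needs `io` time to read and `decode` time to decode, and the async path adds a hypothetical fixed per-batch `overhead`. Reading sequentially costs `n * (io + decode)`. An ideal pipeline costs about `io + d + (n - 1) * max(io, d)`, where `d = decode + overhead`. When `io` is near zero, overlapping the stages buys nothing, and only the overhead remains.

```python
def sequential_time(n: int, io: float, decode: float) -> float:
    """Read then decode each batch, one after another, on one thread."""
    return n * (io + decode)


def pipelined_time(n: int, io: float, decode: float, overhead: float) -> float:
    """Ideal two-stage pipeline: reading batch i+1 overlaps decoding batch i.

    `overhead` is a hypothetical per-batch cost of going async (task
    submission, future callbacks), charged to the decode stage.
    """
    if n == 0:
        return 0.0
    d = decode + overhead
    # The first batch pays for both stages; every later batch is limited
    # by the slower stage.
    return io + d + (n - 1) * max(io, d)


# Cheap I/O, as in these benchmarks: async only adds overhead.
print(sequential_time(10, 0.0, 5.0), pipelined_time(10, 0.0, 5.0, 1.0))  # 50.0 60.0
# Balanced I/O and decode: overlapping the stages wins despite the overhead.
print(sequential_time(10, 5.0, 5.0), pipelined_time(10, 5.0, 5.0, 1.0))  # 100.0 65.0
```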
   
Frankly, the fastest approach I tested was to just wrap the synchronous reader in a Future and block the caller, which isn't encouraging. A flamegraph shows that using the thread pool for decoding work is still rather expensive, so it might be better to use something like the background generator for that as well. In that case, it would be convenient if we could pull directly from the background generator's queue instead of having to get and block on futures. Even so, we still couldn't benefit from parallelizing decompression when needed. For datasets with at least as many files as cores, that's probably not a big deal if you only care about throughput (we'll still decode in parallel). But if you need results in order and/or have few files relative to cores, it won't be optimal.
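The sketch below shows the "wrap the synchronous reader in a Future" approach and the trade-off in the last sentence. It uses Python's `concurrent.futures` as a stand-in for Arrow's executor and futures. `read_file_sync` is a hypothetical placeholder for the real reader, and none of this is Arrow code. `Executor.map` yields results in input order, so ordering is preserved. Parallelism, however, is capped by the number of files.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


def read_file_sync(path: str) -> List[str]:
    """Hypothetical stand-in for a synchronous reader: returns every batch of one file."""
    return [f"{path}:batch{i}" for i in range(3)]


def read_file_via_future(pool: ThreadPoolExecutor, path: str) -> "Future[List[str]]":
    # One task for the whole file instead of one future per batch: no
    # per-batch scheduling cost, but one pool thread is busy for the whole read.
    return pool.submit(read_file_sync, path)


def read_files_in_order(pool: ThreadPoolExecutor, paths: List[str]) -> List[List[str]]:
    # Files are read in parallel, one task per file, but map() yields results
    # in input order, so a slow early file holds back later ones. With fewer
    # files than workers, the extra workers sit idle.
    return list(pool.map(read_file_sync, paths))


with ThreadPoolExecutor(max_workers=4) as pool:
    batches = read_file_via_future(pool, "a.arrow").result()  # the caller blocks here
    print(batches)  # ['a.arrow:batch0', 'a.arrow:batch1', 'a.arrow:batch2']
    print(read_files_in_order(pool, ["x", "y"]))
```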
   
You may wonder why in-memory (ReadFile) has lower throughput than a temp file (ReadTempFile). In the flamegraphs, the culprit appears to be BufferReader's use of MemoryAdviseWillNeed, which spends a significant amount of time in the kernel. Removing it improves performance drastically.
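A WILLNEED hint asks the kernel to prefetch the pages backing a memory range. Suppose `MemoryAdviseWillNeed` boils down to an `madvise`-style `MADV_WILLNEED` call. That is an assumption, since its exact implementation isn't shown here. The Python sketch below issues the equivalent hint on an `mmap` region via `mmap.madvise` (Python 3.8+, Unix only). `read_range` is a hypothetical helper. The hint costs one syscall per read. That is negligible for large reads but can add up over many small ones, which is one plausible explanation for the kernel time seen in the flamegraphs.

```python
import mmap


def read_range(buf: mmap.mmap, offset: int, length: int, advise: bool) -> bytes:
    """Copy `length` bytes at `offset`, optionally hinting the kernel first."""
    if advise and length > 0 and hasattr(buf, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
        page = mmap.PAGESIZE
        start = (offset // page) * page  # madvise needs a page-aligned start
        try:
            # One syscall per read: cheap for big reads, costly for many tiny ones.
            buf.madvise(mmap.MADV_WILLNEED, start, offset + length - start)
        except OSError:
            pass  # the advice is only a hint; ignore it if the kernel rejects it
    return buf[offset:offset + length]


# An anonymous mapping stands in for an in-memory buffer.
size = 4 * mmap.PAGESIZE
with mmap.mmap(-1, size) as buf:
    buf.write(bytes(range(256)) * (size // 256))
    print(read_range(buf, 100, 16, advise=True) == read_range(buf, 100, 16, advise=False))  # True
```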
   
```
-------------------------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------
ReadFile/1/real_time                         7858 ns         7858 ns        85629 bytes_per_second=124.269G/s
ReadFile/4/real_time                        10698 ns        10698 ns        64406 bytes_per_second=91.2852G/s
ReadFile/16/real_time                       21661 ns        21661 ns        32684 bytes_per_second=45.0839G/s
ReadFile/64/real_time                       67470 ns        67470 ns        10406 bytes_per_second=14.4741G/s
ReadFile/256/real_time                     275292 ns       275282 ns         2553 bytes_per_second=3.54738G/s
ReadFile/1024/real_time                   1071125 ns      1071065 ns          652 bytes_per_second=933.598M/s
ReadFile/4096/real_time                   4245107 ns      4245052 ns          165 bytes_per_second=235.565M/s
ReadFile/8192/real_time                   8157924 ns      8157957 ns           85 bytes_per_second=122.58M/s
ReadFileAsync/1/real_time                   23883 ns         7835 ns        29390 bytes_per_second=40.8887G/s
ReadFileAsync/4/real_time                   27242 ns         9040 ns        25836 bytes_per_second=35.8478G/s
ReadFileAsync/16/real_time                  40988 ns        14562 ns        17154 bytes_per_second=23.8253G/s
ReadFileAsync/64/real_time                  93104 ns        33633 ns         7334 bytes_per_second=10.489G/s
ReadFileAsync/256/real_time                303852 ns       116901 ns         2313 bytes_per_second=3.21394G/s
ReadFileAsync/1024/real_time              1430233 ns       531043 ns          546 bytes_per_second=699.187M/s
ReadFileAsync/4096/real_time              4589980 ns      1895584 ns          153 bytes_per_second=217.866M/s
ReadFileAsync/8192/real_time              8793373 ns      3865574 ns           82 bytes_per_second=113.722M/s
ReadTempFile/1/real_time                    70972 ns        70936 ns         9712 bytes_per_second=220.157G/s
ReadTempFile/4/real_time                    74053 ns        74022 ns         9243 bytes_per_second=210.997G/s
ReadTempFile/16/real_time                   85777 ns        85749 ns         8100 bytes_per_second=182.158G/s
ReadTempFile/64/real_time                  132803 ns       132783 ns         5331 bytes_per_second=117.656G/s
ReadTempFile/256/real_time                 333974 ns       333967 ns         2093 bytes_per_second=46.785G/s
ReadTempFile/1024/real_time               1131198 ns      1131179 ns          607 bytes_per_second=13.8128G/s
ReadTempFile/4096/real_time               4330575 ns      4330568 ns          161 bytes_per_second=3.60807G/s
ReadTempFile/8192/real_time               8270275 ns      8270100 ns           85 bytes_per_second=1.8893G/s
ReadTempFileAsync/1/real_time               88569 ns        12731 ns         7814 bytes_per_second=176.417G/s
ReadTempFileAsync/4/real_time               94127 ns        14422 ns         7477 bytes_per_second=165.998G/s
ReadTempFileAsync/16/real_time             104455 ns        20203 ns         6652 bytes_per_second=149.586G/s
ReadTempFileAsync/64/real_time             158604 ns        38862 ns         4443 bytes_per_second=98.516G/s
ReadTempFileAsync/256/real_time            372728 ns       122446 ns         1831 bytes_per_second=41.9207G/s
ReadTempFileAsync/1024/real_time          1347728 ns       485078 ns          520 bytes_per_second=11.5936G/s
ReadTempFileAsync/4096/real_time          4649311 ns      1930484 ns          151 bytes_per_second=3.36071G/s
ReadTempFileAsync/8192/real_time          8773800 ns      3815852 ns           80 bytes_per_second=1.78087G/s
ReadCompressedFile/1/real_time           30636840 ns      1421583 ns           23 bytes_per_second=522.247M/s
ReadCompressedFile/4/real_time            9529811 ns       628655 ns           65 bytes_per_second=1.63959G/s
ReadCompressedFile/16/real_time           5673642 ns      1863531 ns          122 bytes_per_second=2.75396G/s
ReadCompressedFile/64/real_time           8372634 ns      6633169 ns           84 bytes_per_second=1.8662G/s
ReadCompressedFile/256/real_time         22590210 ns     21607133 ns           28 bytes_per_second=708.271M/s
ReadCompressedFile/1024/real_time        84274350 ns     81412117 ns            9 bytes_per_second=189.856M/s
ReadCompressedFile/4096/real_time       330157333 ns    317542733 ns            2 bytes_per_second=48.4617M/s
ReadCompressedFile/8192/real_time       648075491 ns    627804731 ns            1 bytes_per_second=24.6885M/s
ReadCompressedFileAsync/1/real_time      57512529 ns      1849864 ns            9 bytes_per_second=278.2M/s
ReadCompressedFileAsync/4/real_time       9702801 ns       553906 ns           71 bytes_per_second=1.61036G/s
ReadCompressedFileAsync/16/real_time      6001873 ns      1765858 ns          114 bytes_per_second=2.60335G/s
ReadCompressedFileAsync/64/real_time      8414578 ns      6398791 ns           81 bytes_per_second=1.8569G/s
ReadCompressedFileAsync/256/real_time    22844448 ns     20703843 ns           30 bytes_per_second=700.389M/s
ReadCompressedFileAsync/1024/real_time   83260767 ns     75605439 ns            8 bytes_per_second=192.167M/s
ReadCompressedFileAsync/4096/real_time  329809506 ns    298760917 ns            2 bytes_per_second=48.5129M/s
ReadCompressedFileAsync/8192/real_time  643886356 ns    584995701 ns            1 bytes_per_second=24.8491M/s
```
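To compare these rows programmatically, the output can be parsed with a small sketch like the one below. The regular expression assumes exactly the column layout shown: name, real time, CPU time, iterations, `bytes_per_second`. Treating `k`/`M`/`G` as powers of 1024 is another assumption, though it does not affect ratios between rows. `parse_row` and `Row` are hypothetical names. The sketch also reports CPU time as a fraction of real time. Assuming Google Benchmark by default measures CPU time only on the benchmark's own thread, the low CPU/real ratio in the async rows reflects work handed off to pool threads rather than work avoided.

```python
import re
from typing import NamedTuple

# Matches rows such as:
# "ReadFile/1/real_time  7858 ns  7858 ns  85629 bytes_per_second=124.269G/s"
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<real>\d+) ns\s+(?P<cpu>\d+) ns\s+(?P<iters>\d+)"
    r"\s+bytes_per_second=(?P<rate>[\d.]+)(?P<unit>[kMG]?)/s$"
)
# Assumed binary prefixes; ratios between rows are the same either way.
SCALE = {"": 1, "k": 1024, "M": 1024**2, "G": 1024**3}


class Row(NamedTuple):
    name: str
    real_ns: int
    cpu_ns: int
    iterations: int
    bytes_per_second: float


def parse_row(line: str) -> Row:
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError(f"not a benchmark row: {line!r}")
    return Row(
        name=m["name"],
        real_ns=int(m["real"]),
        cpu_ns=int(m["cpu"]),
        iterations=int(m["iters"]),
        bytes_per_second=float(m["rate"]) * SCALE[m["unit"]],
    )


sync = parse_row("ReadFile/1/real_time  7858 ns  7858 ns  85629 bytes_per_second=124.269G/s")
async_ = parse_row("ReadFileAsync/1/real_time  23883 ns  7835 ns  29390 bytes_per_second=40.8887G/s")
print(f"throughput ratio: {async_.bytes_per_second / sync.bytes_per_second:.2f}")  # 0.33
print(f"async CPU/real: {async_.cpu_ns / async_.real_ns:.2f}")  # 0.33
```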

