tustvold commented on issue #1473:
URL: https://github.com/apache/arrow-rs/issues/1473#issuecomment-1102742518

   OK, running on a c2-standard-16 on GCP I get:
   
   ```
   sync_file_test (2048): min: 0.0903s, max: 0.0995s, avg: 0.0912s, p95: 0.0916s
   sync_mem_test (2048): min: 0.0922s, max: 0.0952s, avg: 0.0931s, p95: 0.0937s
   par_sync_file_test (2048): min: 0.0852s, max: 0.0897s, avg: 0.0866s, p95: 0.0878s
   tokio_sync_file_test (2048): min: 0.0858s, max: 0.0897s, avg: 0.0869s, p95: 0.0879s
   tokio_spawn_file_test (2048): min: 0.1114s, max: 0.1144s, avg: 0.1125s, p95: 0.1136s
   tokio_spawn_file_buffer_test (2048): min: 0.1641s, max: 0.1689s, avg: 0.1663s, p95: 0.1683s
   tokio_async_spawn_blocking_test (2048): min: 0.0787s, max: 0.0810s, avg: 0.0794s, p95: 0.0798s
   tokio_async_blocking_test (2048): min: 0.0887s, max: 0.0917s, avg: 0.0899s, p95: 0.0909s
   tokio_par_async_spawn_blocking_test (2048): min: 0.0758s, max: 0.0792s, avg: 0.0769s, p95: 0.0778s
   tokio_par_async_blocking_test (2048): min: 0.0863s, max: 0.0886s, avg: 0.0874s, p95: 0.0881s
   tokio_par_sync_test (2048): min: 0.0859s, max: 0.0885s, avg: 0.0871s, p95: 0.0880s
   ```
   
   We can see that `tokio_async_spawn_blocking_test` is performing slightly 
better than `par_sync_file_test` :tada: This result was repeatable even if I 
moved the parquet file to a `tmpfs` backed filesystem. 
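
   To put a rough number on "slightly better", here is a small sketch that computes the relative change between two of the average timings above. The helper name `relative_change_pct` is made up for illustration and is not part of the benchmark harness.

   ```rust
   /// Relative change of `candidate` versus `baseline`, as a percentage.
   /// Negative values mean the candidate took less time (it is faster).
   /// Hypothetical helper, not part of the benchmark code.
   fn relative_change_pct(baseline: f64, candidate: f64) -> f64 {
       (candidate - baseline) / baseline * 100.0
   }

   fn main() {
       // Average timings in seconds, copied from the GCP run above.
       let par_sync_file = 0.0866;
       let tokio_async_spawn_blocking = 0.0794;

       let change = relative_change_pct(par_sync_file, tokio_async_spawn_blocking);
       println!(
           "tokio_async_spawn_blocking_test vs par_sync_file_test: {:+.1}%",
           change
       );
   }
   ```

   On these averages that works out to roughly 8% less time per iteration for `tokio_async_spawn_blocking_test`.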
   
   To check I haven't just been a muppet, I re-ran on my local machine and still get the same behaviour of `tokio_par_async_spawn_blocking_test` being significantly slower. I'm not really sure what is going on here; perhaps some CPU turbo shenanigans or something...
   
   ```
   sync_file_test (2048): min: 0.0568s, max: 0.1135s, avg: 0.0584s, p95: 0.0592s
   sync_mem_test (2048): min: 0.0582s, max: 0.0653s, avg: 0.0606s, p95: 0.0616s
   par_sync_file_test (2048): min: 0.0558s, max: 0.0603s, avg: 0.0575s, p95: 0.0584s
   tokio_sync_file_test (2048): min: 0.0559s, max: 0.0605s, avg: 0.0576s, p95: 0.0581s
   tokio_spawn_file_test (2048): min: 0.0854s, max: 0.0881s, avg: 0.0864s, p95: 0.0874s
   tokio_spawn_file_buffer_test (2048): min: 0.1163s, max: 0.1460s, avg: 0.1189s, p95: 0.1201s
   tokio_async_spawn_blocking_test (2048): min: 0.0566s, max: 0.0598s, avg: 0.0576s, p95: 0.0592s
   tokio_async_blocking_test (2048): min: 0.0599s, max: 0.0649s, avg: 0.0606s, p95: 0.0615s
   tokio_par_async_spawn_blocking_test (2048): min: 0.0520s, max: 0.0876s, avg: 0.0584s, p95: 0.0780s
   tokio_par_async_blocking_test (2048): min: 0.0543s, max: 0.0651s, avg: 0.0568s, p95: 0.0626s
   tokio_par_sync_test (2048): min: 0.0509s, max: 0.0598s, avg: 0.0526s, p95: 0.0596s
   ```
   
   Unfortunately, comparing the parquet SQL benchmarks on DataFusion master against [parquet-async-wip](https://github.com/tustvold/arrow-datafusion/tree/parquet-async-wip) on the GCP instance still shows a non-trivial performance hit, although it is less severe than on my local machine.
   
   ```
   select dict_10_optional from t
                           time:   [9.4814 ms 9.5137 ms 9.5400 ms]
                           change: [+10.989% +11.387% +11.754%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select dict_100_optional from t
                           time:   [9.5785 ms 9.5907 ms 9.6034 ms]
                           change: [+8.6348% +8.8496% +9.0629%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select dict_1000_optional from t
                           time:   [9.6751 ms 9.6910 ms 9.7066 ms]
                           change: [+8.3536% +8.6366% +8.9063%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select count(*) from t where dict_10_required = 'prefix#0'
                           time:   [8.3035 ms 8.3429 ms 8.3836 ms]
                           change: [+12.266% +14.359% +16.585%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select count(*) from t where dict_100_required = 'prefix#0'
                           time:   [8.4855 ms 8.5323 ms 8.5800 ms]
                           change: [+16.358% +17.383% +18.456%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select count(*) from t where dict_1000_required = 'prefix#0'
                           time:   [8.9166 ms 8.9591 ms 9.0018 ms]
                           change: [+16.238% +17.416% +18.634%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select i64_optional from t where dict_10_required = 'prefix#2' and dict_1000_required = 'prefix#10'
                           time:   [22.442 ms 22.688 ms 22.946 ms]
                           change: [+21.372% +23.369% +25.354%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select i64_required from t where dict_10_required = 'prefix#2' and dict_1000_required = 'prefix#10'
                           time:   [18.864 ms 19.025 ms 19.194 ms]
                           change: [+1.7532% +3.9070% +6.0920%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select string_optional from t where dict_10_required = 'prefix#1' and dict_1000_required = 'prefix#1...
                           time:   [69.645 ms 70.503 ms 71.486 ms]
                           change: [-0.9622% +0.2143% +1.7061%] (p = 0.77 > 0.05)
                           No change in performance detected.

   select string_required from t where dict_10_required = 'prefix#1' and dict_1000_required = 'prefix#1...
                           time:   [103.78 ms 104.87 ms 106.12 ms]
                           change: [+1.0176% +2.2273% +3.5223%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select distinct dict_10_required from t where dict_1000_optional is not NULL and i64_optional > 0
                           time:   [26.913 ms 27.031 ms 27.150 ms]
                           change: [+20.687% +21.646% +22.596%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select distinct dict_10_required from t where dict_1000_optional is not NULL and i64_optional > 0 #2
                           time:   [26.962 ms 27.079 ms 27.197 ms]
                           change: [+21.371% +22.212% +23.070%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select distinct dict_10_required from t where dict_1000_optional is not NULL and i64_required > 0
                           time:   [24.484 ms 24.595 ms 24.707 ms]
                           change: [+17.505% +18.513% +19.520%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select distinct dict_10_required from t where dict_1000_optional is not NULL and i64_required > 0 #2
                           time:   [24.540 ms 24.645 ms 24.752 ms]
                           change: [+18.075% +18.954% +19.857%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select dict_10_optional, count(*) from t group by dict_10_optional
                           time:   [15.281 ms 15.336 ms 15.392 ms]
                           change: [+19.444% +19.861% +20.336%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select dict_10_optional, dict_100_optional, count(*) from t group by dict_10_optional, dict_100_opti...
                           time:   [35.443 ms 35.511 ms 35.581 ms]
                           change: [+19.094% +19.483% +19.885%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select dict_10_optional, dict_100_optional, MIN(f64_required), MAX(f64_required), AVG(f64_required) ...
                           time:   [56.518 ms 56.620 ms 56.723 ms]
                           change: [+9.8955% +10.367% +10.835%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select dict_10_optional, dict_100_optional, MIN(f64_optional), MAX(f64_optional), AVG(f64_optional) ...
                           time:   [62.324 ms 62.449 ms 62.579 ms]
                           change: [+9.8747% +10.485% +11.087%] (p = 0.00 < 0.05)
                           Performance has regressed.

   select dict_10_required, dict_100_required, MIN(f64_optional), MAX(f64_optional), AVG(f64_optional) ...
                           time:   [57.326 ms 57.434 ms 57.542 ms]
                           change: [+6.9184% +7.5079% +8.0877%] (p = 0.00 < 0.05)
                           Performance has regressed.
   ```
   
   I need to think a bit further on this. Being able to separate IO from decode is pretty compelling on various levels if we can do it, but it is unfortunate if it comes with a performance regression...
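
   For anyone following along, a minimal sketch of what "separating IO from decode" means in general. It uses only the standard library, not the parquet or tokio APIs, and is not the code on the branch. `fetch_chunk`, `decode_chunk` and `read_all` are hypothetical stand-ins: the "IO" step just fabricates bytes in memory and the "decode" step just parses little-endian `u32` values.

   ```rust
   use std::sync::mpsc;
   use std::thread;

   /// Hypothetical stand-in for the IO step: returns the encoded bytes of one
   /// chunk. A real reader would fetch these from a file or object store.
   fn fetch_chunk(chunk_idx: u32) -> Vec<u8> {
       (0..4u32)
           .flat_map(|i| (chunk_idx * 4 + i).to_le_bytes())
           .collect()
   }

   /// Hypothetical stand-in for the CPU-bound decode step: interprets the
   /// bytes as little-endian u32 values.
   fn decode_chunk(bytes: &[u8]) -> Vec<u32> {
       bytes
           .chunks_exact(4)
           .map(|b| u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
           .collect()
   }

   /// Fetches chunks on a dedicated thread and decodes them on the calling
   /// thread. The bounded channel lets IO run ahead of decode by roughly
   /// `prefetch` chunks before it has to wait.
   fn read_all(num_chunks: u32, prefetch: usize) -> Vec<u32> {
       let (tx, rx) = mpsc::sync_channel::<Vec<u8>>(prefetch);

       let io = thread::spawn(move || {
           for idx in 0..num_chunks {
               // Stop early if the decoding side has gone away.
               if tx.send(fetch_chunk(idx)).is_err() {
                   break;
               }
           }
           // `tx` is dropped here, which ends the receive loop below.
       });

       let mut out = Vec::new();
       for bytes in rx {
           out.extend(decode_chunk(&bytes));
       }
       io.join().expect("IO thread panicked");
       out
   }

   fn main() {
       let values = read_all(3, 2);
       println!("{:?}", values);
   }
   ```

   The extra hand-off between the two sides is not free, so a split like this can cost something even when the IO itself is fast.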

