pitrou commented on PR #13796:
URL: https://github.com/apache/arrow/pull/13796#issuecomment-1217788792

   I ran this benchmark locally on Ubuntu 20.04 with a 24-thread CPU and an
   ext4 filesystem on a fast SSD.
   Out of curiosity, I added several dataset sizes to the mix.
   
   * With `num_files_ = 10000` (total ~1 million files):
   ```
   
   -------------------------------------------------------------------------------------------------------------------------------------------------------
   Benchmark                                                                                         Time       CPU  Iterations  UserCounters...
   -------------------------------------------------------------------------------------------------------------------------------------------------------
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:10/real_time      2668 ms  0.872 ms           1  items_per_second=416.072k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:10/real_time      2301 ms  0.857 ms           1  items_per_second=482.503k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:10/real_time     2344 ms  0.784 ms           1  items_per_second=473.564k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:100/real_time     2312 ms  0.799 ms           1  items_per_second=480.162k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:100/real_time     1555 ms  0.717 ms           1  items_per_second=713.993k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:100/real_time    1509 ms  0.698 ms           1  items_per_second=735.592k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:1000/real_time    2676 ms  0.880 ms           1  items_per_second=414.832k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:1000/real_time     634 ms  0.792 ms           1  items_per_second=1.75228M/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:1000/real_time    285 ms  0.764 ms           3  items_per_second=3.89016M/s
   ```
   * With `num_files_ = 1000` (total ~100 thousand files):
   ```
   
   -------------------------------------------------------------------------------------------------------------------------------------------------------
   Benchmark                                                                                         Time       CPU  Iterations  UserCounters...
   -------------------------------------------------------------------------------------------------------------------------------------------------------
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:10/real_time       261 ms  0.086 ms           3  items_per_second=425.581k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:10/real_time       186 ms  0.080 ms           4  items_per_second=595.899k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:10/real_time      185 ms  0.088 ms           4  items_per_second=601.398k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:100/real_time      296 ms  0.096 ms           2  items_per_second=375.29k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:100/real_time     66.4 ms  0.081 ms          10  items_per_second=1.6738M/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:100/real_time    29.6 ms  0.067 ms          24  items_per_second=3.74895M/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:1000/real_time     304 ms  0.091 ms           2  items_per_second=365.748k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:1000/real_time    70.2 ms  0.085 ms          10  items_per_second=1.58328M/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:1000/real_time   32.7 ms  0.051 ms          21  items_per_second=3.40023M/s
   ```
   * With `num_files_ = 10` (total ~1000 files):
   ```
   
   -------------------------------------------------------------------------------------------------------------------------------------------------------
   Benchmark                                                                                         Time       CPU  Iterations  UserCounters...
   -------------------------------------------------------------------------------------------------------------------------------------------------------
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:10/real_time      4.24 ms  0.021 ms         187  items_per_second=287.682k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:10/real_time      1.87 ms  0.025 ms         352  items_per_second=650.697k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:10/real_time     1.59 ms  0.024 ms         455  items_per_second=769.645k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:100/real_time     6.21 ms  0.026 ms         106  items_per_second=196.586k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:100/real_time     1.86 ms  0.024 ms         364  items_per_second=657.369k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:100/real_time    1.60 ms  0.024 ms         435  items_per_second=762.434k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:1000/real_time    6.18 ms  0.022 ms         110  items_per_second=197.411k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:1000/real_time    1.86 ms  0.025 ms         378  items_per_second=657.252k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:1000/real_time   1.58 ms  0.024 ms         463  items_per_second=774.176k/s
   ```
   * With `num_files_ = 10, num_dirs_ = 1` (total ~30 files):
   ```
   
   -------------------------------------------------------------------------------------------------------------------------------------------------------
   Benchmark                                                                                         Time       CPU  Iterations  UserCounters...
   -------------------------------------------------------------------------------------------------------------------------------------------------------
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:10/real_time     0.105 ms  0.015 ms        6795  items_per_second=304.24k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:10/real_time     0.133 ms  0.021 ms        5239  items_per_second=240.955k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:10/real_time    0.133 ms  0.021 ms        5241  items_per_second=240.891k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:100/real_time    0.157 ms  0.018 ms        4352  items_per_second=203.893k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:100/real_time    0.131 ms  0.021 ms        5290  items_per_second=244.757k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:100/real_time   0.138 ms  0.022 ms        5471  items_per_second=231.11k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:1000/real_time   0.158 ms  0.018 ms        4294  items_per_second=201.957k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:1000/real_time   0.131 ms  0.022 ms        5323  items_per_second=243.422k/s
   LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:1000/real_time  0.134 ms  0.021 ms        5188  items_per_second=239.693k/s
   ```
   
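   These numbers seem to support a default readahead of 16 and a default batch
   size of 1000: that combination is by far the fastest at ~1 million files
   (285 ms), close to the fastest at ~100 thousand files (32.7 ms vs. 29.6 ms),
   and the fastest at ~1000 files. At ~30 files every configuration finishes
   in well under a millisecond, so the choice hardly matters there.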
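
To make the comparison behind that conclusion easy to re-check, here is a minimal Python sketch that takes the wall-clock times from the four tables above and, for each dataset size, reports the fastest configuration and how the (16, 1000) configuration compares to it. The names `RESULTS_MS`, `CANDIDATE`, `compare_to_best` and `summarize` are made up for this illustration and are not part of the benchmark or of Arrow; the only data used are the `Time` values reported above.

```python
# Wall-clock times in ms, copied from the "Time" column of the tables above.
# Keys are (directory_readahead, file_info_batch_size).
RESULTS_MS = {
    "~1 million files": {
        (1, 10): 2668, (4, 10): 2301, (16, 10): 2344,
        (1, 100): 2312, (4, 100): 1555, (16, 100): 1509,
        (1, 1000): 2676, (4, 1000): 634, (16, 1000): 285,
    },
    "~100 thousand files": {
        (1, 10): 261, (4, 10): 186, (16, 10): 185,
        (1, 100): 296, (4, 100): 66.4, (16, 100): 29.6,
        (1, 1000): 304, (4, 1000): 70.2, (16, 1000): 32.7,
    },
    "~1000 files": {
        (1, 10): 4.24, (4, 10): 1.87, (16, 10): 1.59,
        (1, 100): 6.21, (4, 100): 1.86, (16, 100): 1.60,
        (1, 1000): 6.18, (4, 1000): 1.86, (16, 1000): 1.58,
    },
    "~30 files": {
        (1, 10): 0.105, (4, 10): 0.133, (16, 10): 0.133,
        (1, 100): 0.157, (4, 100): 0.131, (16, 100): 0.138,
        (1, 1000): 0.158, (4, 1000): 0.131, (16, 1000): 0.134,
    },
}

# The defaults suggested in the comment: readahead 16, batch size 1000.
CANDIDATE = (16, 1000)


def compare_to_best(times_ms, candidate=CANDIDATE):
    """Return (fastest config, candidate time divided by fastest time)."""
    best = min(times_ms, key=times_ms.get)
    return best, times_ms[candidate] / times_ms[best]


def summarize(results=RESULTS_MS):
    """Apply compare_to_best to every dataset size."""
    return {label: compare_to_best(times) for label, times in results.items()}


if __name__ == "__main__":
    for label, (best, ratio) in summarize().items():
        print(f"{label}: fastest={best}, "
              f"{CANDIDATE} takes {ratio:.2f}x the fastest time")
```

A ratio of 1.0 means (16, 1000) is itself the fastest configuration for that dataset size. These are single local runs, so small differences between configurations should be read with some caution.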
   

