pitrou commented on PR #13796:
URL: https://github.com/apache/arrow/pull/13796#issuecomment-1217788792
I'm running this benchmark locally on Ubuntu 20.04, 24-thread CPU, ext4
filesystem on a fast SSD.
Out of curiosity I added different dataset sizes into the mix.
* With `num_files_ = 10000` (total ~1 million files):
```
-------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                         Time        CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------------------------------------------------
LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:10/real_time      2668 ms   0.872 ms          1 items_per_second=416.072k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:10/real_time      2301 ms   0.857 ms          1 items_per_second=482.503k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:10/real_time     2344 ms   0.784 ms          1 items_per_second=473.564k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:100/real_time     2312 ms   0.799 ms          1 items_per_second=480.162k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:100/real_time     1555 ms   0.717 ms          1 items_per_second=713.993k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:100/real_time    1509 ms   0.698 ms          1 items_per_second=735.592k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:1000/real_time    2676 ms   0.880 ms          1 items_per_second=414.832k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:1000/real_time     634 ms   0.792 ms          1 items_per_second=1.75228M/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:1000/real_time    285 ms   0.764 ms          3 items_per_second=3.89016M/s
```
* With `num_files_ = 1000` (total ~100 thousand files):
```
-------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                         Time        CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------------------------------------------------
LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:10/real_time       261 ms   0.086 ms          3 items_per_second=425.581k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:10/real_time       186 ms   0.080 ms          4 items_per_second=595.899k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:10/real_time      185 ms   0.088 ms          4 items_per_second=601.398k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:100/real_time      296 ms   0.096 ms          2 items_per_second=375.29k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:100/real_time     66.4 ms   0.081 ms         10 items_per_second=1.6738M/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:100/real_time    29.6 ms   0.067 ms         24 items_per_second=3.74895M/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:1000/real_time     304 ms   0.091 ms          2 items_per_second=365.748k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:1000/real_time    70.2 ms   0.085 ms         10 items_per_second=1.58328M/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:1000/real_time   32.7 ms   0.051 ms         21 items_per_second=3.40023M/s
```
* With `num_files_ = 10` (total ~1000 files):
```
-------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                         Time        CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------------------------------------------------
LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:10/real_time      4.24 ms   0.021 ms        187 items_per_second=287.682k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:10/real_time      1.87 ms   0.025 ms        352 items_per_second=650.697k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:10/real_time     1.59 ms   0.024 ms        455 items_per_second=769.645k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:100/real_time     6.21 ms   0.026 ms        106 items_per_second=196.586k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:100/real_time     1.86 ms   0.024 ms        364 items_per_second=657.369k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:100/real_time    1.60 ms   0.024 ms        435 items_per_second=762.434k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:1000/real_time    6.18 ms   0.022 ms        110 items_per_second=197.411k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:1000/real_time    1.86 ms   0.025 ms        378 items_per_second=657.252k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:1000/real_time   1.58 ms   0.024 ms        463 items_per_second=774.176k/s
```
* With `num_files_ = 10, num_dirs_ = 1` (total ~30 files):
```
-------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                         Time        CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------------------------------------------------
LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:10/real_time     0.105 ms   0.015 ms       6795 items_per_second=304.24k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:10/real_time     0.133 ms   0.021 ms       5239 items_per_second=240.955k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:10/real_time    0.133 ms   0.021 ms       5241 items_per_second=240.891k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:100/real_time    0.157 ms   0.018 ms       4352 items_per_second=203.893k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:100/real_time    0.131 ms   0.021 ms       5290 items_per_second=244.757k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:100/real_time   0.138 ms   0.022 ms       5471 items_per_second=231.11k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:1/file_info_batch_size:1000/real_time   0.158 ms   0.018 ms       4294 items_per_second=201.957k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:4/file_info_batch_size:1000/real_time   0.131 ms   0.022 ms       5323 items_per_second=243.422k/s
LocalFSFixture/AsyncFileDiscovery/directory_readahead:16/file_info_batch_size:1000/real_time  0.134 ms   0.021 ms       5188 items_per_second=239.693k/s
```
These numbers seem to support a default readahead of 16 and a default batch
size of 1000.
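
To make that comparison explicit, here is a minimal sketch (not part of the benchmark or of Arrow) that takes the real-time figures reported above and, for each dataset size, finds the fastest configuration and how the suggested default compares with it. The names `RESULTS_MS`, `PROPOSED` and `summarize` are purely illustrative.

```python
# Real-time measurements in milliseconds, copied from the benchmark output
# above and keyed by (directory_readahead, file_info_batch_size).
# All names in this script are hypothetical helpers, not Arrow APIs.
RESULTS_MS = {
    "~1 million files": {
        (1, 10): 2668, (4, 10): 2301, (16, 10): 2344,
        (1, 100): 2312, (4, 100): 1555, (16, 100): 1509,
        (1, 1000): 2676, (4, 1000): 634, (16, 1000): 285,
    },
    "~100 thousand files": {
        (1, 10): 261, (4, 10): 186, (16, 10): 185,
        (1, 100): 296, (4, 100): 66.4, (16, 100): 29.6,
        (1, 1000): 304, (4, 1000): 70.2, (16, 1000): 32.7,
    },
    "~1000 files": {
        (1, 10): 4.24, (4, 10): 1.87, (16, 10): 1.59,
        (1, 100): 6.21, (4, 100): 1.86, (16, 100): 1.60,
        (1, 1000): 6.18, (4, 1000): 1.86, (16, 1000): 1.58,
    },
    "~30 files": {
        (1, 10): 0.105, (4, 10): 0.133, (16, 10): 0.133,
        (1, 100): 0.157, (4, 100): 0.131, (16, 100): 0.138,
        (1, 1000): 0.158, (4, 1000): 0.131, (16, 1000): 0.134,
    },
}

# The defaults suggested in the comment: readahead 16, batch size 1000.
PROPOSED = (16, 1000)


def summarize(results, proposed=PROPOSED):
    """Map each dataset to (fastest config, proposed time / fastest time)."""
    summary = {}
    for dataset, timings in results.items():
        best_config = min(timings, key=timings.get)
        ratio = timings[proposed] / timings[best_config]
        summary[dataset] = (best_config, ratio)
    return summary


if __name__ == "__main__":
    for dataset, (best, ratio) in summarize(RESULTS_MS).items():
        print(f"{dataset}: fastest={best}, "
              f"proposed default takes {ratio:.2f}x the fastest time")
```

On these figures the suggested default is either the fastest configuration or within about 10% of it for the three larger datasets. Only the ~30-file case favors readahead 1 with batch size 10, and there the absolute difference is about 0.03 ms.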