lidavidm commented on pull request #10060: URL: https://github.com/apache/arrow/pull/10060#issuecomment-821321573
And a quick comparison (in Python) for IPC files:

```
# Without special CountRows method
count_rows
43.33s 43.32s 43.24s 43.41s 43.43s
Mean  : 43.35s
Median: 43.33s

count_rows (parallel)
16.21s 16.22s 16.25s 16.17s 16.19s
Mean  : 16.21s
Median: 16.21s

# With special CountRows method
count_rows
16.96s 6.90s 7.23s 7.28s 7.28s
Mean  : 9.13s
Median: 7.28s

count_rows (parallel)
7.32s 7.20s 7.30s 7.26s 7.32s
Mean  : 7.28s
Median: 7.30s

# Baseline for comparison
count_fragments: sum(fragment.to_table(columns=[]).num_rows for fragment in ds.get_fragments())
47.26s 44.03s 44.00s 42.70s 43.18s
Mean  : 44.23s
Median: 44.00s

count_files: sum(pyarrow.feather.read_table(filename, columns=[]).num_rows for filename in ds.files)
19.05s 19.22s 19.95s 19.82s 19.91s
Mean  : 19.59s
Median: 19.82s
```
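The Mean/Median lines in these results are plain summary statistics over five runs. As a minimal sketch (not part of the PR), they can be recomputed with Python's standard `statistics` module, along with the median speedup of the `CountRows` path. The `TIMINGS` labels and the helpers `summarize` and `speedup` are hypothetical names chosen for illustration; only the numbers are taken from the results.

```python
import statistics

# Per-run wall-clock timings in seconds, copied from the benchmark results.
# The dictionary keys are hypothetical labels, not names used in the PR.
TIMINGS = {
    "count_rows (no CountRows)": [43.33, 43.32, 43.24, 43.41, 43.43],
    "count_rows parallel (no CountRows)": [16.21, 16.22, 16.25, 16.17, 16.19],
    "count_rows (CountRows)": [16.96, 6.90, 7.23, 7.28, 7.28],
    "count_rows parallel (CountRows)": [7.32, 7.20, 7.30, 7.26, 7.32],
    "count_fragments": [47.26, 44.03, 44.00, 42.70, 43.18],
    "count_files": [19.05, 19.22, 19.95, 19.82, 19.91],
}


def summarize(times):
    """Return (mean, median) of a list of run times in seconds."""
    return statistics.mean(times), statistics.median(times)


def speedup(before, after):
    """How many times faster `after` is than `before`, using median times."""
    return statistics.median(before) / statistics.median(after)


if __name__ == "__main__":
    # Print the summaries in the same "Mean / Median" style as the results.
    for name, times in TIMINGS.items():
        mean, median = summarize(times)
        print(f"{name}: Mean : {mean:.2f}s Median: {median:.2f}s")

    serial = speedup(TIMINGS["count_rows (no CountRows)"],
                     TIMINGS["count_rows (CountRows)"])
    parallel = speedup(TIMINGS["count_rows parallel (no CountRows)"],
                       TIMINGS["count_rows parallel (CountRows)"])
    print(f"serial speedup:   {serial:.2f}x")
    print(f"parallel speedup: {parallel:.2f}x")
```

Comparing medians rather than means keeps the single 16.96s first run of the serial `CountRows` case from skewing the comparison: the serial median drops from 43.33s to 7.28s (about 5.95x) and the parallel median from 16.21s to 7.30s (about 2.22x).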
