lidavidm commented on pull request #10060:
URL: https://github.com/apache/arrow/pull/10060#issuecomment-821321573


   And a quick comparison (in Python) of row-counting times for IPC files, five runs per case:
   
   ```
   # Without special CountRows method
   count_rows
   43.33s 43.32s 43.24s 43.41s 43.43s 
   Mean  : 43.35s
   Median: 43.33s
   count_rows (parallel)
   16.21s 16.22s 16.25s 16.17s 16.19s 
   Mean  : 16.21s
   Median: 16.21s
   
   # With special CountRows method
   count_rows
   16.96s 6.90s 7.23s 7.28s 7.28s 
   Mean  : 9.13s
   Median: 7.28s
   count_rows (parallel)
   7.32s 7.20s 7.30s 7.26s 7.32s 
   Mean  : 7.28s
   Median: 7.30s
   
   # Baseline for comparison
   count_fragments: sum(fragment.to_table(columns=[]).num_rows for fragment in ds.get_fragments())
   47.26s 44.03s 44.00s 42.70s 43.18s 
   Mean  : 44.23s
   Median: 44.00s
   count_files: sum(pyarrow.feather.read_table(filename, columns=[]).num_rows for filename in ds.files)
   19.05s 19.22s 19.95s 19.82s 19.91s 
   Mean  : 19.59s
   Median: 19.82s
   ```
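
For anyone wanting to reproduce numbers in this layout, a timing harness along the following lines should work. This is a minimal sketch, not the script that produced the results above: `time_runs` and `format_report` are hypothetical helper names, the run count of five is inferred from the five timings per case, and the two baseline functions simply wrap the expressions quoted in the results.

```python
import statistics
import time


def time_runs(fn, repeats=5):
    """Call fn() `repeats` times; return (last result, list of elapsed seconds)."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        timings.append(time.perf_counter() - start)
    return result, timings


def format_report(name, timings):
    """Format one benchmark case: name, per-run times, mean, and median."""
    return "\n".join([
        name,
        " ".join(f"{t:.2f}s" for t in timings),
        f"Mean  : {statistics.mean(timings):.2f}s",
        f"Median: {statistics.median(timings):.2f}s",
    ])


def count_fragments(dataset):
    # Baseline from the results above: read every fragment with no columns
    # selected and sum the row counts.
    return sum(
        fragment.to_table(columns=[]).num_rows
        for fragment in dataset.get_fragments()
    )


def count_files(dataset):
    # Baseline from the results above: bypass the dataset API and read each
    # file directly. Imported lazily so the harness itself needs only the
    # standard library.
    import pyarrow.feather

    return sum(
        pyarrow.feather.read_table(filename, columns=[]).num_rows
        for filename in dataset.files
    )
```

Usage would look like `print(format_report("count_fragments", time_runs(lambda: count_fragments(ds))[1]))`, where `ds` is assumed to be an already-opened `pyarrow.dataset` dataset over the IPC files. Reporting the median alongside the mean matters here: a single slow first run (such as the 16.96s above) pulls the mean up while the median stays representative.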

