Re: [PR] Update arrow_reader_row_filter benchmark to reflect ClickBench distribution [arrow-rs]

via GitHub Fri, 02 May 2025 13:24:41 -0700


alamb commented on PR #7461:
URL: https://github.com/apache/arrow-rs/pull/7461#issuecomment-2848051135


   > > > Thank you @alamb for this work. It seems we still need to investigate 
more. Would it be possible to take a 10% subset of hits.parquet and run some 
benchmarks from the arrow-rs side?
   > > 
   > > 
   > > That is an interesting idea -- to make a benchmark in arrow-rs that 
runs against hits.parquet (and hits_partitioned) directly 🤔 (and e.g. could 
require downloading those files before running).
   > 
   > I am trying to do the first step. Maybe we can download one partition of 
hits.parquet and use it as the data set for arrow-rs. Since there are 100 
partition files, one file is about 1% of the data, which could serve as a mock 
data set.
   
   I have been thinking about it too -- I probably won't have a chance this 
weekend but can work on it next week 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
