andygrove opened a new pull request #7205: URL: https://github.com/apache/arrow/pull/7205
This PR adds a new crate for benchmarks based on the [New York Taxi and Limousine Commission](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) data set. Currently, only DataFusion benchmarks exist, but the plan is to add benchmarks for the arrow, flight, and parquet crates as well.

Example usage:

```bash
cargo run --release -- --iterations 3 --path /mnt/nyctaxi/csv --format csv --batch-size 4096
```

Example output:

```bash
Running benchmarks with the following options: Opt { debug: false, iterations: 3, batch_size: 4096, path: "/mnt/nyctaxi/csv", file_format: "csv" }
Executing 'fare_amt_by_passenger'
Query 'fare_amt_by_passenger' iteration 0 took 7138 ms
Query 'fare_amt_by_passenger' iteration 1 took 7599 ms
Query 'fare_amt_by_passenger' iteration 2 took 7969 ms
```

Follow-up PRs will add additional functionality, such as:

- Storing results in structured files (json perhaps)
- Adding a Dockerfile so that the benchmarks can be run with CPU and RAM constraints

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
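For readers who want a feel for how the per-iteration timings in the example output could be produced, here is a minimal sketch using only the Rust standard library. It is not the code from this PR: the helper name `time_iterations` and the placeholder workload are assumptions made for illustration, and the real benchmark would execute a DataFusion query where the closure is called.

```rust
use std::time::Instant;

/// Hypothetical helper (not taken from the PR): runs `query` `iterations`
/// times and returns the elapsed wall-clock time of each run in milliseconds.
fn time_iterations<F: FnMut()>(name: &str, iterations: usize, mut query: F) -> Vec<u128> {
    println!("Executing '{}'", name);
    let mut timings = Vec::with_capacity(iterations);
    for i in 0..iterations {
        let start = Instant::now();
        query();
        let elapsed = start.elapsed().as_millis();
        // Same message shape as the example output in the PR description.
        println!("Query '{}' iteration {} took {} ms", name, i, elapsed);
        timings.push(elapsed);
    }
    timings
}

fn main() {
    // Placeholder workload standing in for a real query; the sum is kept
    // and printed so the loop has an observable result.
    let mut total: u64 = 0;
    let timings = time_iterations("fare_amt_by_passenger", 3, || {
        total = (0..1_000_000u64).sum();
    });
    println!("collected {} timings, placeholder result {}", timings.len(), total);
}
```

Returning the timings as a `Vec` rather than only printing them leaves room for the follow-up idea of writing results to a structured file.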
