andygrove opened a new pull request #7205:
URL: https://github.com/apache/arrow/pull/7205


   This PR adds a new crate for benchmarks based on the [New York Taxi and Limousine Commission](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) data set.
   
   Currently, only DataFusion benchmarks exist, but the plan is to add 
benchmarks for the arrow, flight, and parquet crates as well.
   
   Example usage:
   
   ```bash
   cargo run --release -- --iterations 3 --path /mnt/nyctaxi/csv --format csv --batch-size 4096
   ```
   
   Example output:
   
   ```bash
   Running benchmarks with the following options: Opt { debug: false, iterations: 3, batch_size: 4096, path: "/mnt/nyctaxi/csv", file_format: "csv" }
   Executing 'fare_amt_by_passenger'
   Query 'fare_amt_by_passenger' iteration 0 took 7138 ms
   Query 'fare_amt_by_passenger' iteration 1 took 7599 ms
   Query 'fare_amt_by_passenger' iteration 2 took 7969 ms
   ```
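
For readers who want to see how output of this shape could be produced, here is a minimal sketch using only the Rust standard library. It is not the code in this PR: the `Opt` struct only mirrors the fields visible in the example output, `run_benchmark` and `describe_options` are hypothetical helper names, and the query is passed in as an arbitrary closure rather than an actual DataFusion query.

```rust
use std::time::{Duration, Instant};

/// Mirrors the fields shown in the example output. In the real crate these
/// are presumably populated from the command line; that part is omitted here.
#[derive(Debug, Clone)]
pub struct Opt {
    pub debug: bool,
    pub iterations: usize,
    pub batch_size: usize,
    pub path: String,
    pub file_format: String,
}

/// Builds the header line using the derived `Debug` representation of `Opt`.
pub fn describe_options(opt: &Opt) -> String {
    format!("Running benchmarks with the following options: {:?}", opt)
}

/// Hypothetical helper: runs `query` once per iteration, prints the elapsed
/// wall-clock time in milliseconds, and returns all timings to the caller.
pub fn run_benchmark<F: FnMut()>(name: &str, opt: &Opt, mut query: F) -> Vec<Duration> {
    println!("Executing '{}'", name);
    let mut timings = Vec::with_capacity(opt.iterations);
    for i in 0..opt.iterations {
        let start = Instant::now();
        query(); // placeholder for executing the real query
        let elapsed = start.elapsed();
        println!(
            "Query '{}' iteration {} took {} ms",
            name,
            i,
            elapsed.as_millis()
        );
        timings.push(elapsed);
    }
    timings
}
```

Returning the timings instead of only printing them keeps the door open for writing them to a structured file later.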
   
   Follow-up PRs will add additional functionality, such as:
   
   - Storing results in structured files (JSON, perhaps)
   - Adding a Dockerfile so that the benchmarks can be run with CPU and RAM constraints



