gopidesupavan commented on PR #62232:
URL: https://github.com/apache/airflow/pull/62232#issuecomment-3939582305

   Benchmarks:
   We already have DataFusion benchmarks (with TPC-H) here:
https://datafusion.apache.org/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/
   
   I tried it on the yellow_taxi trip data (44.5M records) stored in S3 and ran 
the following aggregation queries against it. The run took 1 minute 47 seconds.
   
   ```
   """SELECT COUNT(*) AS total_trips
   FROM yellow_taxi;""",
   
       """SELECT SUM(total_amount) AS total_revenue
   FROM yellow_taxi;""",
   
       """SELECT
       AVG(trip_distance) AS avg_distance_miles,
       AVG(fare_amount) AS avg_fare
   FROM yellow_taxi;""",
   
       """SELECT
       MIN(tip_amount) AS min_tip,
       MAX(tip_amount) AS max_tip,
       AVG(tip_amount) AS avg_tip
   FROM yellow_taxi
   WHERE payment_type = 1;""",
   
       """SELECT
       DATE_TRUNC('day', tpep_pickup_datetime) AS pickup_day,
       COUNT(*) AS trips,
       AVG(passenger_count) AS avg_passengers
   FROM yellow_taxi
   GROUP BY pickup_day
   ORDER BY pickup_day DESC;""",
   
       """SELECT
       DATE_TRUNC('month', tpep_pickup_datetime) AS pickup_month,
       SUM(fare_amount) AS total_fares,
       SUM(tolls_amount) AS total_tolls
   FROM yellow_taxi
   GROUP BY pickup_month
   ORDER BY pickup_month DESC;"""
   ```
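
A minimal sketch of how a wall-clock figure like the one above could be collected per query. This is not the code used for the benchmark: the `time_queries` helper is hypothetical, and an in-memory SQLite table with three made-up rows stands in for DataFusion reading from S3 so that the snippet runs anywhere. Only queries that SQLite also supports are included (so no `DATE_TRUNC`).

```python
import sqlite3
import time
from typing import Callable, Iterable


def time_queries(
    run_query: Callable[[str], object], queries: Iterable[str]
) -> list[tuple[str, float]]:
    """Run each query once and return (query, elapsed_seconds) pairs."""
    timings = []
    for sql in queries:
        start = time.perf_counter()
        run_query(sql)  # the callable should fully materialize the result
        timings.append((sql, time.perf_counter() - start))
    return timings


# Stand-in engine (assumption): in-memory SQLite with a tiny yellow_taxi table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yellow_taxi "
    "(total_amount REAL, tip_amount REAL, payment_type INTEGER)"
)
conn.executemany(
    "INSERT INTO yellow_taxi VALUES (?, ?, ?)",
    [(12.5, 2.0, 1), (8.0, 0.0, 2), (30.0, 5.5, 1)],  # made-up rows
)

queries = [
    "SELECT COUNT(*) AS total_trips FROM yellow_taxi;",
    "SELECT SUM(total_amount) AS total_revenue FROM yellow_taxi;",
    "SELECT MIN(tip_amount), MAX(tip_amount), AVG(tip_amount) "
    "FROM yellow_taxi WHERE payment_type = 1;",
]

timings = time_queries(lambda sql: conn.execute(sql).fetchall(), queries)
total_seconds = sum(elapsed for _, elapsed in timings)

for sql, elapsed in timings:
    print(f"{elapsed:.6f}s  {sql}")
print(f"total: {total_seconds:.6f}s")
```

To time a real engine, pass a different `run_query` callable that executes the SQL and collects the full result, so that lazy evaluation does not hide the cost.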
   
   <img width="1164" height="827" alt="image" src="https://github.com/user-attachments/assets/ed39ffbf-26a7-4a41-8efa-bc50c59b5dcc" />
   
   
   With the HITL approval operator, the results can be reviewed before the next task runs:
   
   <img width="1055" height="810" alt="image" src="https://github.com/user-attachments/assets/9b260e2d-8219-409b-a631-7abf3eae040f" />
   
   
   

