ajayky-os commented on PR #14333: URL: https://github.com/apache/iceberg/pull/14333#issuecomment-3628508658
> > Ran TPC-DS and TP-CH suite on a spark cluster to validate the functionality.
>
> @prudhvimaharishi could you share the results if possible?

We are working on verifying the scan time improvements; the micro-benchmarks are now documented at https://github.com/GoogleCloudPlatform/gcs-analytics-core?tab=readme-ov-file#micro-benchmarks.

For end-to-end query execution time, we ran two sets of benchmarks:

1. (gcs-analytics-core enabled, vectored read enabled) vs (gcs-analytics-core disabled, vectored read disabled)
2. (gcs-analytics-core enabled, vectored read enabled) vs (gcs-analytics-core disabled, vectored read enabled)

E2E benchmarking setup:

- 3 iterations of the TPC-DS and TPC-H queries were run in a Spark job; sparkMeasure was used to capture stats such as CPU time, GC time, etc.
- E2E is the end-to-end time for running the Spark SQL query.
- A negative percentage change means gcs-analytics-core performed better.

**(gcs-analytics-core enabled, vectored read enabled) vs (gcs-analytics-core disabled, vectored read disabled)**

Median of 3 iterations:

|Schema Size|% Change E2E Time|% Change Executor Cpu Time|% Change Jvm Gc Time|% Change Shuffle Fetch Wait Time|
|---|---|---|---|---|
|tpcds_sf10|-4.81|5.50|28.31|36.51|
|tpcds_sf100|-4.33|38.60|78.18|60.40|
|tpcds_sf1000|-6.59|9.63|95.48|204.39|
|tpch_sf10|-1.76|-4.68|49.11|42.96|
|tpch_sf100|-2.73|-5.27|113.64|73.02|
|tpch_sf1000|-7.87|-4.08|64.41|89.27|

Average of 3 iterations:

|Schema Size|% Change E2E Time|% Change Executor Cpu Time|% Change Jvm Gc Time|% Change Shuffle Fetch Wait Time|
|---|---|---|---|---|
|tpcds_sf10|-4.14|5.49|40.84|150.38|
|tpcds_sf100|-3.45|6.67|85.25|140.36|
|tpcds_sf1000|-4.92|8.46|100.96|113.59|
|tpch_sf10|-2.34|-4.53|41.88|95.81|
|tpch_sf100|-2.68|-5.10|104.57|298.36|
|tpch_sf1000|-6.79|-4.66|84.59|102.29|

**(gcs-analytics-core enabled, vectored read enabled) vs (gcs-analytics-core disabled, vectored read enabled)**

Median of 3 iterations:

|Schema Size|% Change E2E Time|% Change Executor Cpu Time|% Change Jvm Gc Time|% Change Shuffle Fetch Wait Time|
|---|---|---|---|---|
|tpcds_sf10|-7.66|34.07|5.94|-27.07|
|tpcds_sf100|-0.81|4701.41|20.89|15.47|
|tpcds_sf1000|-8.93|38.85|33.71|35.10|
|tpch_sf10|-2.21|-8.04|-17.37|-3.11|
|tpch_sf100|-1.40|-5.30|9.73|-3.21|
|tpch_sf1000|6.73|-6.28|17.29|647.98|

Average of 3 iterations:

|Schema Size|% Change E2E Time|% Change Executor Cpu Time|% Change Jvm Gc Time|% Change Shuffle Fetch Wait Time|
|---|---|---|---|---|
|tpcds_sf10|-8.14|28.59|18.44|56.34|
|tpcds_sf100|-1.28|4051.78|48.83|89.41|
|tpcds_sf1000|-4.10|30301.87|6431.37|-9.43|
|tpch_sf10|-2.15|-7.66|-20.96|-27.98|
|tpch_sf100|-3.14|-5.11|10.50|169.43|
|tpch_sf1000|7.89|-5.54|17.97|318.64|

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
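For readers who want to reproduce the "% Change" columns from their own runs, here is a minimal sketch of the arithmetic. It assumes the change is computed relative to the gcs-analytics-core-disabled baseline, and that the three iterations are first reduced to a median or average and then compared; the comment does not state the exact order, so treat both as assumptions. The function names and the sample timings are hypothetical and are not taken from the benchmark.

```python
from statistics import mean, median


def percent_change(baseline: float, candidate: float) -> float:
    """Percent change of candidate relative to baseline.

    A negative result means the candidate value is lower (for timings: faster).
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


def summarize(baseline_runs, candidate_runs, reducer=median) -> float:
    """Reduce each set of iterations (median or mean), then compare them."""
    return round(percent_change(reducer(baseline_runs), reducer(candidate_runs)), 2)


# Hypothetical E2E times in seconds for 3 iterations (illustrative only).
disabled = [100.0, 104.0, 98.0]  # gcs-analytics-core disabled (baseline)
enabled = [95.0, 97.0, 96.0]     # gcs-analytics-core enabled

print(summarize(disabled, enabled, median))  # -4.0
print(summarize(disabled, enabled, mean))    # -4.64
```

With only three iterations, the median and the average can differ noticeably when one run is an outlier, which is why reporting both is useful.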
