Re: [I] [Improvement] The execution time of spark connector is 4 times that of native spark3.3.2 when running tpcds sql99 [gravitino]

via GitHub Wed, 19 Nov 2025 00:55:55 -0800


jerqi commented on issue #7048:
URL: https://github.com/apache/gravitino/issues/7048#issuecomment-3551545223


   > > At present, our optimizations have brought the read rate to approximately 66% of native Spark for ORC and over 80% for Parquet.
   > > We have made the following optimizations:
   > > 
   > > 1. ORC pushdown filter. [[FEATURE] Spark Hive connector supports ORC hive table pushdown filter kyuubi#7122](https://github.com/apache/kyuubi/issues/7122)
   > > 2. Parquet pushdown filter. [[FEATURE] Spark Hive connector supports Parquet hive table pushdown filter kyuubi#7129](https://github.com/apache/kyuubi/issues/7129)
   > > 3. ORC Dynamic Partition Pruning. [[SPARK-52969][SQL] Support DSv2 OrcScan Dynamic Partition Pruning spark#52009](https://github.com/apache/spark/pull/52009)
   > > 4. Parquet Dynamic Partition Pruning. [[SPARK-53439][SQL] Support DSv2 ParquetScan Dynamic Partition Pruning spark#52180](https://github.com/apache/spark/pull/52180)
   > > 5. Fix the Kyuubi Spark connector FileStatusCache not caching file statuses. [[KYUUBI #7192] Fix filestatus not cached kyuubi#7191](https://github.com/apache/kyuubi/pull/7191)
   > 
   > Once item 5 is merged, it would be better to merge [#9110](https://github.com/apache/gravitino/issues/9110) as well.
   
   OK, I can help review this PR, but it will also need the necessary unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
