[GitHub] [hudi] umehrot2 edited a comment on pull request #2651: [HUDI-1591] [RFC-26] Improve Hoodie Table Query Performance And Ease Of Use Fo…

GitBox Fri, 26 Mar 2021 17:45:14 -0700


umehrot2 edited a comment on pull request #2651:
URL: https://github.com/apache/hudi/pull/2651#issuecomment-808606053



   @pengzhiwei2018 I was testing Hudi without this patch via Spark SQL, and I 
am a little confused. With Spark SQL, partition pruning already seems to work 
seamlessly for Hudi. Just start `spark-sql` with:
   ```
   spark-sql \
     --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
     --conf "spark.hadoop.mapreduce.input.pathFilter.class=org.apache.hudi.hadoop.HoodieROTablePathFilter" \
     --jars /usr/lib/hudi/hudi-spark-bundle.jar
   ```
   Spark is able to get the partition schema from the catalog using 
`CatalogFileIndex` and do the partition pruning. So is the partition pruning 
support we are adding here meant for datasource-based queries? I think pruning 
should already have worked via the Spark datasource for hive-style partitioned 
tables too, because Spark tries to identify partition columns from the path, 
but I am not sure why it does not work. I want to understand clearly what we 
gain when this PR gets merged: is it partition pruning for Spark datasource 
queries?
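
To make the point about path-based partition discovery concrete, here is a minimal sketch in plain Python. It is not Spark's or Hudi's actual code; the function names and sample paths are hypothetical and only illustrate the idea. It reads `key=value` directory segments from hive-style paths and keeps only the files that match a partition filter:

```python
from typing import Dict, List


def parse_partition_values(path: str) -> Dict[str, str]:
    """Collect key=value directory segments from a hive-style path (hypothetical helper)."""
    values: Dict[str, str] = {}
    for segment in path.split("/")[:-1]:  # skip the file name itself
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = value
    return values


def prune_paths(paths: List[str], column: str, wanted: str) -> List[str]:
    """Keep only the paths whose inferred partition value matches the filter."""
    return [p for p in paths if parse_partition_values(p).get(column) == wanted]


# Made-up sample paths: two hive-style, one without column names in the path.
paths = [
    "/tbl/dt=2021-03-25/f1.parquet",
    "/tbl/dt=2021-03-26/f2.parquet",
    "/tbl/2021/03/26/f3.parquet",
]
pruned = prune_paths(paths, "dt", "2021-03-26")
```

In this sketch the third path carries no column name, so nothing can be inferred from it and it never matches a filter on `dt`. Path-based pruning of this kind depends entirely on the directory layout exposing the partition columns.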


