umehrot2 commented on pull request #2651: URL: https://github.com/apache/hudi/pull/2651#issuecomment-808606053
@pengzhiwei2018 I was testing Hudi without this patch via Spark SQL, and I am a little confused. With Spark SQL, partition pruning already works seamlessly for Hudi. Just start Spark SQL with:

```
spark-sql \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.hadoop.mapreduce.input.pathFilter.class=org.apache.hudi.hadoop.HoodieROTablePathFilter" \
  --jars /usr/lib/hudi/hudi-spark-bundle.jar
```

Spark is able to get the partition schema from the catalog using `CatalogFileIndex` and do the partition pruning. So is the partition pruning support we are adding here meant to cover datasource-based queries?

I think pruning should already have worked via the Spark datasource too for hive-style partitioned tables, because Spark tries to identify partition columns from the path, but I am not sure why it does not work.
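To make the "partition columns from the path" point concrete, here is a minimal sketch of hive-style partition discovery and pruning in plain Python. It is not Spark's or Hudi's implementation; the function names, the `/tables/trips` base path, and the `dt` column are all hypothetical and chosen only for illustration.

```python
from urllib.parse import unquote


def parse_partition_values(path, base):
    """Collect key=value directory segments found between `base` and the file name."""
    relative = path[len(base):].strip("/")
    directories = relative.split("/")[:-1]  # drop the trailing file name
    values = {}
    for segment in directories:
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = unquote(value)
    return values


def prune(paths, base, column, wanted):
    """Keep only the files that might contain rows where `column` == `wanted`."""
    kept = []
    for path in paths:
        values = parse_partition_values(path, base)
        # A path that does not expose the column cannot be ruled out,
        # so it has to stay in the scan list.
        if column not in values or values[column] == wanted:
            kept.append(path)
    return kept


# Hypothetical layout: two hive-style partitions and one yyyy/mm/dd-style path.
files = [
    "/tables/trips/dt=2021-03-01/part-0001.parquet",
    "/tables/trips/dt=2021-03-02/part-0002.parquet",
    "/tables/trips/2021/03/03/part-0003.parquet",
]
selected = prune(files, "/tables/trips", "dt", "2021-03-02")
```

In this sketch only the `key=value` directories yield partition values, so the first file is skipped while the non-hive-style path must still be scanned. That is the distinction the question above turns on: path-based inference helps only when the layout encodes column names, whereas a catalog-backed index gets the partition schema from metadata.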
