garyli1019 commented on a change in pull request #2378:
URL: https://github.com/apache/hudi/pull/2378#discussion_r555064333
##########
File path:
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##########
@@ -108,7 +111,7 @@ class MergeOnReadSnapshotRelation(val sqlContext:
SQLContext,
dataSchema = tableStructSchema,
partitionSchema = StructType(Nil),
Review comment:
hi @yui2010 , what does your dataset look like? Does it have a `dt` column
in the dataset? The partitioning I am referring to works like this: when you
call `spark.read.format("hudi").load(basePath)` and your dataset's folder
structure looks like `basePath/dt=20201010`, Spark is able to append a `dt`
column to your dataset. When you then do something like
`df.filter("dt = 20201010")`, Spark will go to that partition and read only
its files. How does your workflow load the data and pass the partition
information to Spark?
To help us understand this implementation better, would you write a test
that demonstrates the partition pruning?
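The partition-discovery behavior described above can be sketched roughly as
follows. This is a minimal illustration, not code from this PR: the base path
`/tmp/hudi_table` and the `dt` values are assumptions, and whether the filter
actually prunes is what a test like the one requested would verify.

```scala
import org.apache.spark.sql.SparkSession

object PartitionPruningDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-pruning-demo")
      .master("local[*]")
      .getOrCreate()

    // Loading the table root (hypothetical path) lets Spark discover the
    // Hive-style directories, e.g. /tmp/hudi_table/dt=20201010/..., and
    // append `dt` as a partition column on the resulting DataFrame.
    val df = spark.read.format("hudi").load("/tmp/hudi_table")

    // A filter on the partition column should be pushed down as a
    // PartitionFilter, so only the matching directory is scanned.
    // Inspect the physical plan to confirm the pruning happened.
    df.filter("dt = '20201010'").explain(true)

    spark.stop()
  }
}
```

A test for this relation could create a small MOR table with two `dt`
partitions, run the filtered query, and assert that the scanned files all
come from the matching partition directory.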
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]