garyli1019 commented on a change in pull request #2378:
URL: https://github.com/apache/hudi/pull/2378#discussion_r555064333



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##########
@@ -108,7 +111,7 @@ class MergeOnReadSnapshotRelation(val sqlContext: SQLContext,
       dataSchema = tableStructSchema,
       partitionSchema = StructType(Nil),

Review comment:
       hi @yui2010, what does your dataset look like? Does it have a `dt` column
in the dataset? The partitioning I am referring to works like this: when you
`spark.read.format('hudi').load(basePath)` and your dataset folder structure
looks like `basePath/dt=20201010`, Spark is able to append a `dt` column
to your dataset. When you then do something like `df.filter(dt=20201010)`, Spark will go
straight to that partition and read only its files. What is your workflow for loading your data and
passing the partition information to Spark?
   To give us more context on this implementation, would you write
a test that demonstrates the partition pruning?
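   To illustrate the mechanism I mean, here is a minimal, hedged sketch (plain Python, not Hudi/Spark code; the function names and sample paths are hypothetical) of how Hive-style partition discovery derives a `dt` column from `key=value` folder names, and how a filter on that column prunes down to the matching files:

```python
# Illustrative sketch of Hive-style partition discovery and pruning.
# This mimics what Spark does conceptually when it sees a layout like
# basePath/dt=20201010/...; it is NOT the actual Spark/Hudi implementation.

def discover_partitions(paths):
    """Extract partition columns from key=value folder segments in each path."""
    result = []
    for path in paths:
        cols = {}
        for segment in path.split("/"):
            if "=" in segment:
                key, value = segment.split("=", 1)
                cols[key] = value
        result.append((path, cols))
    return result

def prune(partitioned_paths, key, value):
    """Keep only files whose partition column matches the predicate,
    analogous to df.filter on a partition column like `dt`."""
    return [p for p, cols in partitioned_paths if cols.get(key) == value]

files = [
    "basePath/dt=20201010/part-0001.parquet",
    "basePath/dt=20201011/part-0002.parquet",
]
parts = discover_partitions(files)
print(prune(parts, "dt", "20201010"))
# → ['basePath/dt=20201010/part-0001.parquet']
```

A test along these lines (but against the actual relation, asserting which files Spark plans to read) would demonstrate the pruning behavior in the PR.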




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

