[GitHub] [hudi] garyli1019 commented on a change in pull request #2378: [HUDI-1491] Support partition pruning for MOR snapshot query

GitBox Tue, 09 Mar 2021 06:58:21 -0800


garyli1019 commented on a change in pull request #2378:
URL: https://github.com/apache/hudi/pull/2378#discussion_r590443628




##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala
##########
@@ -504,6 +506,42 @@ class TestMORDataSource extends HoodieClientTestBase {
     hudiSnapshotDF2.show(1)
   }
 
+  @Test
+  def testPrunePartitions() {
+    // First Operation:
+    // Producing parquet files to three hive style partitions like /partition=20150316/.
+    // SNAPSHOT view on MOR table with parquet files only.
+    dataGen.setPartitionPaths(Array("20150316","20150317","20160315"));
+    val records1 = recordsToStrings(dataGen.generateInserts("001", 100)).toList
+    val inputDF1 = spark.read.json(spark.sparkContext.parallelize(records1, 2))
+    inputDF1.write.format("org.apache.hudi")

Review comment:
   Hi @yui2010, thanks for providing this detailed explanation. I will look into it in the next few days.
   Regarding DataSource V2: yes, the community is planning to rewrite the data source on the V2 API for better Spark 3 support. If you are interested, contributions are always welcome!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
