xiarixiaoyao commented on a change in pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#discussion_r664363268



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##########
@@ -137,11 +136,14 @@ class MergeOnReadSnapshotRelation(val sqlContext: SQLContext,
   }
 
   def buildFileIndex(filters: Array[Filter]): List[HoodieMergeOnReadFileSplit] = {
-
-    val fileStatuses = if (globPaths.isDefined) {
+    // Get all partition paths
+    val partitionPaths = if (globPaths.isDefined) {
       // Load files from the global paths if it has defined to be compatible with the original mode
       val inMemoryFileIndex = HoodieSparkUtils.createInMemoryFileIndex(sqlContext.sparkSession, globPaths.get)
-      inMemoryFileIndex.allFiles()
+      val fsView = new HoodieTableFileSystemView(metaClient,
+        metaClient.getActiveTimeline.getCommitsTimeline
+          .filterCompletedInstants, inMemoryFileIndex.allFiles().toArray)
+      fsView.getLatestBaseFiles.iterator().asScala.toList.map(_.getFileStatus.getPath.getParent)

Review comment:
       If we only have log files, `inMemoryFileIndex.allFiles()` will be empty, since Spark will filter out `.log` files.
   `fsView.getLatestBaseFiles.iterator().asScala.toList.map(_.getFileStatus.getPath.getParent)` will then return an empty `partitionPaths`, so `buildFileIndex` will return an empty `List[HoodieMergeOnReadFileSplit]` and nothing will be read.
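The edge case above can be illustrated with a minimal, self-contained sketch. The `FileSlice` and `LogFile` case classes below are simplified stand-ins for the real Hudi types (not the actual API): a log-only file slice, where compaction has not yet produced a base file, is dropped when partitions are derived from base files alone.

```scala
// Hypothetical minimal model of a MOR file slice: a partition, an optional
// base (parquet) file path, and a list of log files. Names are illustrative,
// not the real org.apache.hudi API.
case class LogFile(path: String)
case class FileSlice(partition: String,
                     baseFile: Option[String],
                     logFiles: List[LogFile])

// A log-only slice: inserts landed in log files, no base file exists yet.
val slices = List(
  FileSlice("2021/07/05", baseFile = None,
            logFiles = List(LogFile("2021/07/05/.f1.log.1"))))

// Deriving partition paths from latest *base files* only, as the patch does:
// the log-only partition is silently dropped, so nothing would be read.
val partitionsFromBaseFiles: List[String] =
  slices.flatMap(_.baseFile).map(_.split('/').init.mkString("/"))

// Deriving partition paths from *file slices* keeps log-only partitions visible.
val partitionsFromSlices: List[String] = slices.map(_.partition).distinct
```

In the real code this would correspond to iterating the file-system view's latest file slices rather than only its latest base files, so that partitions containing nothing but log files still produce splits.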




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
