[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

GitBox Thu, 06 Aug 2020 03:13:05 -0700


umehrot2 commented on a change in pull request #1702:
URL: https://github.com/apache/hudi/pull/1702#discussion_r466310024




##########
File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
##########
@@ -92,36 +102,69 @@ class IncrementalRelation(val sqlContext: SQLContext,
   override def schema: StructType = latestSchema
 
   override def buildScan(): RDD[Row] = {
-    val fileIdToFullPath = mutable.HashMap[String, String]()
+    val regularFileIdToFullPath = mutable.HashMap[String, String]()
+    var metaBootstrapFileIdToFullPath = mutable.HashMap[String, String]()
+
     for (commit <- commitsToReturn) {
       val metadata: HoodieCommitMetadata = HoodieCommitMetadata.fromBytes(commitTimeline.getInstantDetails(commit)
         .get, classOf[HoodieCommitMetadata])
-      fileIdToFullPath ++= metadata.getFileIdAndFullPaths(basePath).toMap
+
+      if (HoodieTimeline.METADATA_BOOTSTRAP_INSTANT_TS == commit.getTimestamp) {

Review comment:
       I discussed this with Balaji, and we decided to invest in this so 
that the user experience is the same whether or not the table is 
bootstrapped. Today, users can read the data written in the first commit 
through an incremental query, and we want bootstrapped tables to support 
the same experience.
   
   One use case I can think of:
   
   A user needs to read all data from the `first commit` up to a provided 
`end commit`. An incremental query already makes this possible on regular 
Hudi tables, and we want it to work the same way on bootstrapped tables.
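
The use case above could be expressed roughly as follows. This is a minimal sketch, not code from this PR: the `IncrementalReadOptions` object and its `untilCommit` helper are hypothetical names, the option keys are the ones I understand Hudi's `DataSourceReadOptions` to expose (please verify them against your Hudi version), and using `"000"` to mean "from the first commit" is an assumption based on it sorting before any real instant time.

```scala
// Hypothetical helper, for illustration only; not part of Hudi or this PR.
object IncrementalReadOptions {
  // Hudi datasource read option keys (verify against your Hudi version).
  val QueryType = "hoodie.datasource.query.type"
  val BeginInstant = "hoodie.datasource.read.begin.instanttime"
  val EndInstant = "hoodie.datasource.read.end.instanttime"

  // Assumption: "000" sorts before every real commit timestamp,
  // so it stands in for "start from the first commit".
  val EarliestInstant = "000"

  // Builds the options for an incremental query covering everything
  // from the first commit up to the given end commit.
  def untilCommit(endCommit: String): Map[String, String] = Map(
    QueryType -> "incremental",
    BeginInstant -> EarliestInstant,
    EndInstant -> endCommit
  )
}
```

With a Spark session in scope, the map would be passed along the lines of `spark.read.format("hudi").options(IncrementalReadOptions.untilCommit(endCommit)).load(basePath)`. With this change, the intent is that the same call returns the bootstrapped data too when the range includes the bootstrap commit.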




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
