nsivabalan commented on code in PR #5352:
URL: https://github.com/apache/hudi/pull/5352#discussion_r852466064
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -206,4 +208,32 @@ class DefaultSource extends RelationProvider
parameters: Map[String, String]): Source = {
new HoodieStreamSource(sqlContext, metadataPath, schema, parameters)
}
+
+  private def resolveBaseFileOnlyRelation(sqlContext: SQLContext,
+                                          globPaths: Seq[Path],
+                                          userSchema: Option[StructType],
+                                          metaClient: HoodieTableMetaClient,
+                                          optParams: Map[String, String]) = {
+    val baseRelation = new BaseFileOnlyRelation(sqlContext, metaClient, optParams, userSchema, globPaths)
+    val enableSchemaOnRead: Boolean = !tryFetchInternalSchema(metaClient).isEmptySchema
+
+    // NOTE: We fallback to [[HadoopFsRelation]] in all of the cases except ones requiring usage of
+    //       [[BaseFileOnlyRelation]] to function correctly. This is necessary to maintain performance parity w/
+    //       vanilla Spark, since some of the Spark optimizations are predicated on the using of [[HadoopFsRelation]].
+    //
+    //       You can check out HUDI-3896 for more details
+    if (enableSchemaOnRead) {
+      baseRelation
+    } else {
+      baseRelation.toHadoopFsRelation
+    }
+  }
+
+  private def tryFetchInternalSchema(metaClient: HoodieTableMetaClient) =
Review Comment:
Is schema evolution flippable for a given table? I mean, can someone enable it for a few commits, disable it, and then re-enable it sometime later? If not, we might need to add it as a table config (enabling schema evolution). And if we already have one, we should rely on it rather than parsing the commit metadata every time. We can take this as a follow-up; just curious.
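
To make the suggestion concrete, here is a minimal, self-contained sketch of the idea: consult a persisted table-config flag first, and only fall back to the (expensive) commit-metadata scan when the flag is absent. All types and the config key below are hypothetical illustrations, not the actual Hudi API.

```scala
// Hypothetical stand-in for a table's persisted properties; the config key
// "hoodie.schema.on.read.enable" is assumed here for illustration only.
case class TableConfig(props: Map[String, String]) {
  def schemaEvolutionEnabled: Option[Boolean] =
    props.get("hoodie.schema.on.read.enable").map(_.toBoolean)
}

object SchemaOnReadResolver {
  // Prefer the cheap table-config lookup; run the commit-metadata scan
  // (passed in as a thunk) only when the table config carries no flag.
  def enableSchemaOnRead(config: TableConfig, parseCommitMetadata: () => Boolean): Boolean =
    config.schemaEvolutionEnabled.getOrElse(parseCommitMetadata())
}
```

With a scheme like this, toggling schema evolution on and off across commits would just update the stored flag, and readers would avoid re-deriving it from commit metadata on every query.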
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]