[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: Bootstrap datasource changes

GitBox Tue, 30 Jun 2020 02:53:59 -0700


umehrot2 commented on a change in pull request #1702:
URL: https://github.com/apache/hudi/pull/1702#discussion_r447560016




##########
File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
##########
@@ -71,13 +78,16 @@ class IncrementalRelation(val sqlContext: SQLContext,
     optParams.getOrElse(DataSourceReadOptions.END_INSTANTTIME_OPT_KEY, lastInstant.getTimestamp))
     .getInstants.iterator().toList
 
-  // use schema from latest metadata, if not present, read schema from the data file
-  private val latestSchema = {
-    val schemaUtil = new TableSchemaResolver(metaClient)
-    val tableSchema = HoodieAvroUtils.createHoodieWriteSchema(schemaUtil.getTableAvroSchemaWithoutMetadataFields);
-    AvroConversionUtils.convertAvroSchemaToStructType(tableSchema)
+  // use schema from a file produced in the latest instant
+  val latestSchema: StructType = {
+    log.info("Inferring schema..")
+    val schemaResolver = new TableSchemaResolver(metaClient)
+    val tableSchema = schemaResolver.getTableAvroSchemaWithoutMetadataFields
+    val dataSchema = AvroConversionUtils.convertAvroSchemaToStructType(tableSchema)
+    StructType(skeletonSchema.fields ++ dataSchema.fields)

Review comment:
       Not really. This is intentional: we read only the **user data schema**
and then append the **metadata/skeleton schema** to it. That lets us avoid
extra checks here, because the full schema read from a file differs by file
type. For regular Hudi files, the schema contains both the **skeleton + user
data schema**, whereas for **bootstrapped files** it contains only the **user
data schema** read from the source file. So to keep things simple, I did it
this way.
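
For illustration, here is a minimal sketch of the composition described above, using Spark's `StructType` API. It assumes the skeleton schema consists of the five Hudi metadata columns typed as nullable strings; the `SchemaMergeSketch` object and the `id`/`name` user fields are hypothetical and are not part of the PR.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object SchemaMergeSketch {
  // Hudi metadata column names. Assumption: the skeleton schema is made of
  // exactly these columns, each a nullable string.
  val metaColumns: Seq[String] = Seq(
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name")

  val skeletonSchema: StructType =
    StructType(metaColumns.map(name => StructField(name, StringType, nullable = true)))

  // Hypothetical user data schema, standing in for the schema resolved
  // without metadata fields.
  val dataSchema: StructType = StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("name", StringType, nullable = true)))

  // Same composition as in the diff: skeleton fields first, then user fields.
  val latestSchema: StructType =
    StructType(skeletonSchema.fields ++ dataSchema.fields)

  def main(args: Array[String]): Unit =
    println(latestSchema.fieldNames.mkString(", "))
}
```

Because the user data schema is resolved without metadata fields in both cases, the combined schema comes out the same whether the underlying file is a regular Hudi file or a bootstrapped one.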




