[GitHub] [hudi] xushiyan commented on a change in pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

GitBox Sun, 23 Jan 2022 15:04:43 -0800


xushiyan commented on a change in pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#discussion_r790342949




##########
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##########
@@ -64,8 +64,9 @@ class MergeOnReadSnapshotRelation(val sqlContext: SQLContext,
 
   private val conf = sqlContext.sparkContext.hadoopConfiguration
   private val jobConf = new JobConf(conf)
+  private val withOperationField: Boolean = new 
TableSchemaResolver(metaClient).hasOperationField
   // use schema from latest metadata, if not present, read schema from the 
data file
-  private val schemaUtil = new TableSchemaResolver(metaClient)
+  private val schemaUtil = new TableSchemaResolver(metaClient, 
withOperationField)

Review comment:
       ditto

##########
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -477,4 +478,13 @@ public static MessageType readSchemaFromLogFile(FileSystem 
fs, Path path) throws
     }
     return null;
   }
+
+  public boolean hasOperationField() {

Review comment:
       this class has an instance variable `withOperationField` too. can we 
consolidate this method and that variable?

##########
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
##########
@@ -74,7 +74,8 @@ class MergeOnReadIncrementalRelation(val sqlContext: 
SQLContext,
     optParams.getOrElse(DataSourceReadOptions.END_INSTANTTIME.key, 
lastInstant.getTimestamp))
   log.debug(s"${commitsTimelineToReturn.getInstants.iterator().toList.map(f => 
f.toString).mkString(",")}")
   private val commitsToReturn = 
commitsTimelineToReturn.getInstants.iterator().toList
-  private val schemaUtil = new TableSchemaResolver(metaClient)
+  private val withOperationField: Boolean = new 
TableSchemaResolver(metaClient).hasOperationField
+  private val schemaUtil = new TableSchemaResolver(metaClient, 
withOperationField)

Review comment:
       +1 to consolidate these. `TableSchemaResolver` should be the source of 
truth for `OperationField`, instead of being given the info from outside. The 
constructor with operation field seems to be an anti-pattern.

##########
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
##########
@@ -74,7 +74,8 @@ class MergeOnReadIncrementalRelation(val sqlContext: 
SQLContext,
     optParams.getOrElse(DataSourceReadOptions.END_INSTANTTIME.key, 
lastInstant.getTimestamp))
   log.debug(s"${commitsTimelineToReturn.getInstants.iterator().toList.map(f => 
f.toString).mkString(",")}")
   private val commitsToReturn = 
commitsTimelineToReturn.getInstants.iterator().toList
-  private val schemaUtil = new TableSchemaResolver(metaClient)
+  private val withOperationField: Boolean = new 
TableSchemaResolver(metaClient).hasOperationField
+  private val schemaUtil = new TableSchemaResolver(metaClient, 
withOperationField)

Review comment:
       Also use a name closer to the class name makes things clearer
   
   ```suggestion
     private val schemaResolver = new TableSchemaResolver(metaClient, 
withOperationField)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hudi] xushiyan commented on a change in pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

Reply via email to