nsivabalan commented on code in PR #12317:
URL: https://github.com/apache/hudi/pull/12317#discussion_r1855671160


##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##########
@@ -216,7 +220,11 @@ public static Option<Schema.Type> findNestedFieldType(Schema schema, String fiel
     for (String part : parts) {
       Schema.Field foundField = resolveNullableSchema(schema).getField(part);
       if (foundField == null) {
-        throw new HoodieAvroSchemaException(fieldName + " not a field in " + schema);
+        if (throwOnNotFound) {

Review Comment:
   Why do we trigger a lookup via findNestedFieldType in the first place if the ordering field is not set?
   Why can't we deduce from the table config that the ordering field is not set, and bypass calling findNestedFieldType altogether?
   
   In other words, why do we need the throwOnNotFound argument?
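   Roughly what I have in mind, as a sketch only (the method and parameter names here are illustrative stand-ins, not Hudi's actual APIs):

```scala
// Sketch: consult the table config first, and only walk the schema when an
// ordering field is actually configured. With this shape there is nothing
// that can fail on "field not set", so no throwOnNotFound flag is needed.
// orderingFieldFromConfig / lookupFieldType are illustrative stand-ins.
def orderingFieldType(orderingFieldFromConfig: Option[String],
                      lookupFieldType: String => Option[String]): Option[String] =
  orderingFieldFromConfig match {
    case Some(field) => lookupFieldType(field) // schema walk only when configured
    case None        => None                   // not set: skip the lookup entirely
  }
```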
   



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieCreateRecordUtils.scala:
##########
@@ -143,9 +143,12 @@ object HoodieCreateRecordUtils {
               avroRecWithoutMeta
             }
 
-            val hoodieRecord = if (shouldCombine && !precombineEmpty) {
-              val orderingVal = HoodieAvroUtils.getNestedFieldVal(avroRec, precombine,
-                false, consistentLogicalTimestampEnabled).asInstanceOf[Comparable[_]]
+            //TODO [HUDI-8574] we can throw exception if field doesn't exist
+            // lazy so that we don't evaluate if we short circuit the boolean expression in the if below
+            lazy val orderingVal = HoodieAvroUtils.getNestedFieldVal(avroRec, precombine,
+              true, consistentLogicalTimestampEnabled).asInstanceOf[Comparable[_]]

Review Comment:
   Do we really need `lazy` here? What's the issue with the previous code? Maybe I am missing something.
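   For context, a minimal standalone illustration of what `lazy` buys when a val is hoisted out of a conditional (toy values, not the actual Hudi code):

```scala
// Toy illustration: a lazy val's right-hand side only runs when it is first
// forced, so a hoisted definition stays unevaluated if the guard is false.
var evaluated = false
lazy val orderingVal: Int = { evaluated = true; 42 }

val shouldCombine = false
val result: Option[Int] = if (shouldCombine) Some(orderingVal) else None
// shouldCombine is false, so orderingVal was never forced and `evaluated`
// stays false; a plain `val` would have run the block (and any exception
// thrown inside it, e.g. from a failed field lookup) unconditionally.
```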



##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReaderSchemaHandler.java:
##########
@@ -160,12 +160,16 @@ private Schema generateRequiredSchema() {
     List<Schema.Field> addedFields = new ArrayList<>();
    for (String field : getMandatoryFieldsForMerging(hoodieTableConfig, properties, dataSchema, recordMerger)) {

Review Comment:
   I checked the implementation of getMandatoryFieldsForMerging. It does not return the precombine field if it's not set. So, can you help me understand why we need the changes below?



##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodiePositionBasedSchemaHandler.java:
##########
@@ -82,9 +82,11 @@ private static InternalSchema addPositionalMergeCol(InternalSchema internalSchem
   @Override
   public Pair<List<Schema.Field>,List<Schema.Field>> getBootstrapRequiredFields() {
     Pair<List<Schema.Field>,List<Schema.Field>> dataAndMetaCols = super.getBootstrapRequiredFields();
-    if (readerContext.supportsParquetRowIndex()) {
-      if (!dataAndMetaCols.getLeft().isEmpty() && !dataAndMetaCols.getRight().isEmpty()) {
+    if (readerContext.supportsParquetRowIndex() && (this.needsBootstrapMerge || this.needsMORMerge)) {

Review Comment:
   Why can't we make this simpler?
   - Poll the table config to find whether the precombine field is set.
   - If it is set, apply the changes to the existing code block here.
   - If it is not set, remove the precombine field from both dataAndMetaCols.getLeft() and dataAndMetaCols.getRight().
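   Something along these lines, as a rough sketch of that flow (the signature and plain String field lists are simplifications, not the actual handler code):

```scala
// Sketch: branch once on whether the table config has a precombine field,
// instead of threading merge flags through the row-index condition.
// precombineSetInConfig / orderingField are illustrative parameters.
def bootstrapRequiredFields(dataCols: List[String],
                            metaCols: List[String],
                            orderingField: String,
                            precombineSetInConfig: Boolean): (List[String], List[String]) =
  if (precombineSetInConfig)
    (dataCols, metaCols) // precombine is set: existing code block applies unchanged
  else
    // not set: drop the ordering field from both sides, mirroring
    // dataAndMetaCols.getLeft() and dataAndMetaCols.getRight()
    (dataCols.filterNot(_ == orderingField), metaCols.filterNot(_ == orderingField))
```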



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -755,7 +755,9 @@ class HoodieSparkSqlWriterInternal {
          .setPayloadClassName(payloadClass)
          .setRecordMergeMode(RecordMergeMode.getValue(hoodieConfig.getString(HoodieWriteConfig.RECORD_MERGE_MODE)))
          .setRecordMergeStrategyId(recordMergerStrategy)
-          .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD, null))
+          // we can't fetch preCombine field from hoodieConfig object, since it falls back to "ts" as default value,

Review Comment:
   Why can't we just remove the default value for the precombine field only?
   
   https://github.com/apache/hudi/blob/7b773fc2d66d32b6a4d13ca8adaf231dea186dcd/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java#L166
   
   and then chase the test failures.
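   To make that concrete, a toy model of the default-value behavior (this is not Hudi's real ConfigProperty API, just an illustration of why the default hides "not set"):

```scala
// Toy model: a property with a default silently yields "ts" when the user
// never set it; dropping the default makes "not set" observable as None.
final case class Prop(key: String, default: Option[String])

def getString(conf: Map[String, String], prop: Prop): Option[String] =
  conf.get(prop.key).orElse(prop.default)

val withDefault    = Prop("hoodie.datasource.write.precombine.field", default = Some("ts"))
val withoutDefault = withDefault.copy(default = None)
```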



##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##########
@@ -112,7 +112,9 @@ public HoodieBaseFileGroupRecordBuffer(HoodieReaderContext<T> readerContext,
       this.payloadClass = Option.empty();
     }
     this.orderingFieldName = Option.ofNullable(ConfigUtils.getOrderingField(props)).orElseGet(() -> hoodieTableMetaClient.getTableConfig().getPreCombineField());
-    this.orderingFieldTypeOpt = recordMergeMode == RecordMergeMode.COMMIT_TIME_ORDERING ? Option.empty() : AvroSchemaUtils.findNestedFieldType(readerSchema, this.orderingFieldName);
+
+    // Don't throw exception due to [HUDI-8574]
+    this.orderingFieldTypeOpt = recordMergeMode == RecordMergeMode.COMMIT_TIME_ORDERING ? Option.empty() : AvroSchemaUtils.findNestedFieldType(readerSchema, this.orderingFieldName, false);

Review Comment:
   Why can't we check for isEmptyOrNull on orderingFieldName and then avoid calling findNestedFieldType?
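   i.e. something like this sketch (the helper signature is illustrative, not the actual AvroSchemaUtils API):

```scala
// Sketch: guard on a null/empty ordering field name before doing the schema
// lookup, so findNestedFieldType never needs to swallow a missing field.
def orderingFieldTypeOpt(orderingFieldName: String,
                         findNestedFieldType: String => Option[String]): Option[String] =
  if (orderingFieldName == null || orderingFieldName.isEmpty) None
  else findNestedFieldType(orderingFieldName)
```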



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to