nsivabalan commented on code in PR #12317:
URL: https://github.com/apache/hudi/pull/12317#discussion_r1855671160
##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##########
@@ -216,7 +220,11 @@ public static Option<Schema.Type>
findNestedFieldType(Schema schema, String fiel
for (String part : parts) {
Schema.Field foundField = resolveNullableSchema(schema).getField(part);
if (foundField == null) {
- throw new HoodieAvroSchemaException(fieldName + " not a field in " +
schema);
+ if (throwOnNotFound) {
Review Comment:
whey do we trigger lookup using findNestedFieldType in the first place if
ordering field is not set.
why can't we deduce it from table config that ordering field is not set
bypass calling findNestedFieldType.
in other words, why do we need throwOnNotFound argument.
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieCreateRecordUtils.scala:
##########
@@ -143,9 +143,12 @@ object HoodieCreateRecordUtils {
avroRecWithoutMeta
}
- val hoodieRecord = if (shouldCombine && !precombineEmpty) {
- val orderingVal = HoodieAvroUtils.getNestedFieldVal(avroRec,
precombine,
- false,
consistentLogicalTimestampEnabled).asInstanceOf[Comparable[_]]
+ //TODO [HUDI-8574] we can throw exception if field doesn't exist
+ // lazy so that we don't evaluate if we short circuit the boolean
expression in the if below
+ lazy val orderingVal = HoodieAvroUtils.getNestedFieldVal(avroRec,
precombine,
+ true,
consistentLogicalTimestampEnabled).asInstanceOf[Comparable[_]]
Review Comment:
do we really need lazy here. whats the issue w/ previous code here? may be I
am missing something.
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReaderSchemaHandler.java:
##########
@@ -160,12 +160,16 @@ private Schema generateRequiredSchema() {
List<Schema.Field> addedFields = new ArrayList<>();
for (String field : getMandatoryFieldsForMerging(hoodieTableConfig,
properties, dataSchema, recordMerger)) {
Review Comment:
I checked the impl of getMandatoryFieldsForMerging. It does not return
precombine if its not set. So, can you help me understand why do we need below
changes ?
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodiePositionBasedSchemaHandler.java:
##########
@@ -82,9 +82,11 @@ private static InternalSchema
addPositionalMergeCol(InternalSchema internalSchem
@Override
public Pair<List<Schema.Field>,List<Schema.Field>>
getBootstrapRequiredFields() {
Pair<List<Schema.Field>,List<Schema.Field>> dataAndMetaCols =
super.getBootstrapRequiredFields();
- if (readerContext.supportsParquetRowIndex()) {
- if (!dataAndMetaCols.getLeft().isEmpty() &&
!dataAndMetaCols.getRight().isEmpty()) {
+ if (readerContext.supportsParquetRowIndex() && (this.needsBootstrapMerge
|| this.needsMORMerge)) {
Review Comment:
why can't we make this simpler.
- poll table config and find if precombine field is set or not.
- if yes, do changes to existing code block here.
- if not set, remove precombine field from both dataAndMetaCols.getLeft()
and dataAndMetaCols.getRight()
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -755,7 +755,9 @@ class HoodieSparkSqlWriterInternal {
.setPayloadClassName(payloadClass)
.setRecordMergeMode(RecordMergeMode.getValue(hoodieConfig.getString(HoodieWriteConfig.RECORD_MERGE_MODE)))
.setRecordMergeStrategyId(recordMergerStrategy)
-
.setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD, null))
+ // we can't fetch preCombine field from hoodieConfig object, since
it falls back to "ts" as default value,
Review Comment:
why can't we just remove default value for the precombine field only?
https://github.com/apache/hudi/blob/7b773fc2d66d32b6a4d13ca8adaf231dea186dcd/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java#L166
and then chase the test failures.
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##########
@@ -112,7 +112,9 @@ public
HoodieBaseFileGroupRecordBuffer(HoodieReaderContext<T> readerContext,
this.payloadClass = Option.empty();
}
this.orderingFieldName =
Option.ofNullable(ConfigUtils.getOrderingField(props)).orElseGet(() ->
hoodieTableMetaClient.getTableConfig().getPreCombineField());
- this.orderingFieldTypeOpt = recordMergeMode ==
RecordMergeMode.COMMIT_TIME_ORDERING ? Option.empty() :
AvroSchemaUtils.findNestedFieldType(readerSchema, this.orderingFieldName);
+
+ // Don't throw exception due to [HUDI-8574]
+ this.orderingFieldTypeOpt = recordMergeMode ==
RecordMergeMode.COMMIT_TIME_ORDERING ? Option.empty() :
AvroSchemaUtils.findNestedFieldType(readerSchema, this.orderingFieldName,
false);
Review Comment:
why can't we check for isEmptyOrNull on orderingFieldName and then avoid
calling findNestedFieldType.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]