the-other-tim-brown commented on code in PR #17969:
URL: https://github.com/apache/hudi/pull/17969#discussion_r2716883578


##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroReaderContext.java:
##########


Review Comment:
   Let's extract this up to line 216 so we call it once?



##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroReaderContext.java:
##########
@@ -210,7 +210,12 @@ public ClosableIterator<IndexedRecord> 
getFileRecordIterator(
       HoodieSchema requiredSchema) throws IOException {
     HoodieSchema fileOutputSchema;
     Map<String, String> renamedColumns;
-    if (isLogFile) {
+    // Even when dataSchema equals requiredSchema, renamed columns still 
require schema rewriting
+    // to map old column names to new names during record reading. We only 
skip the lookup when
+    // both schemas match AND there are no renamed columns.
+    boolean hasNoRenamedColumns = 
getSchemaHandler().getRequiredSchemaForFileAndRenamedColumns(filePath).getRight().isEmpty();
+    if (isLogFile || (dataSchema.equals(requiredSchema) && 
hasNoRenamedColumns)) {

Review Comment:
   Can you update the comment to also cover why log files fall into this case



##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestBootstrap.java:
##########
@@ -371,6 +398,124 @@ public void testMetadataAndFullBootstrapWithUpdatesMOR() 
throws Exception {
     testBootstrapCommon(true, true, EffectiveMode.MIXED_BOOTSTRAP_MODE);
   }
 
+  /**
+   * End-to-end test for metadata-only (non-full) bootstrap with non-nullable 
fields in the (source) parquet table:
+   * 1. Write a parquet table
+   * 2. Perform metadata-only bootstrap
+   * 3. Read the bootstrapped table
+   * 4. Perform an update and verify
+   */
+  @Test
+  public void testMetadataOnlyBootstrapEndToEndWithNonNullableFields() throws 
Exception {

Review Comment:
   Can some of the setup code in `testBootstrapCommon` be shared with this test?



##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestBootstrap.java:
##########
@@ -167,10 +171,21 @@ private void reloadInputFormats() {
 
   public HoodieSchema generateNewDataSetAndReturnSchema(long timestamp, int 
numRecords, List<String> partitionPaths,
       String srcPath) throws Exception {
+    return generateNewDataSetAndReturnSchema(timestamp, numRecords, 
partitionPaths, srcPath, true);
+  }
+
+  public HoodieSchema generateNewDataSetAndReturnSchema(long timestamp, int 
numRecords, List<String> partitionPaths,
+      String srcPath, boolean isMakeFieldsNullable) throws Exception {
     boolean isPartitioned = partitionPaths != null && 
!partitionPaths.isEmpty();
     Dataset<Row> df =
         generateTestRawTripDataset(timestamp, 0, numRecords, partitionPaths, 
jsc, sqlContext);
-    df.printSchema();
+
+    if (!isMakeFieldsNullable) {
+      // Convert schema to non-nullable for all fields
+      df = applyNonNullableSchema(df);
+      df.printSchema();

Review Comment:
   Can we remove the printSchema? Or move it to debug/info logs?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to