hudi-agent commented on code in PR #18385:
URL: https://github.com/apache/hudi/pull/18385#discussion_r3191802421


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/CloudObjectsSelectorCommon.java:
##########
@@ -546,6 +547,34 @@ private static Option<String> getPropVal(TypedProperties props, ConfigProperty<S
     return Option.empty();
   }
 
+  /**
+   * Enables Spark {@code mergeSchema} for cloud object batches of Parquet or ORC files when configured, so
+   * heterogeneous files in one sync round share a merged struct type. Applied before user
+   * {@link CloudSourceConfig#SPARK_DATASOURCE_OPTIONS} so explicit reader options can override.
+   *
+   * <p>Spark's native Parquet reader honors {@code mergeSchema} on all supported versions. Spark's native ORC
+   * reader honors it on Spark 3.0+ (the native ORC impl is the default since Spark 2.4); on older runtimes the
+   * option is silently ignored, which is harmless.
+   */
+  private DataFrameReader applyMergeSchemaOption(DataFrameReader reader, String fileFormat) {
+    if (!isParquetOrOrcFileFormat(fileFormat)) {
+      return reader;
+    }
+    if (!getBooleanWithAltKeys(properties, CLOUD_INCREMENTAL_MERGE_SCHEMA)) {
+      return reader;
+    }
+    return reader.option("mergeSchema", "true");
+  }
+
+  // Package-private for unit testing — see TestCloudObjectsSelectorCommon.
+  static boolean isParquetOrOrcFileFormat(String fileFormat) {
+    if (fileFormat == null) {
+      return false;
+    }

Review Comment:
   🤖 nit: could you rename `f` to something like `trimmed` or 
`normalizedFormat`? Single-letter locals make sense in tiny lambdas, but here 
it's a named local in a package-private method that test code calls directly, 
so a slightly longer name would make the intent clearer to the reader at a glance.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>
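
The suggested rename could look roughly like the sketch below. The quoted hunk stops before the body of `isParquetOrOrcFileFormat`, so everything after the null check is an assumption: the trim and lower-case normalization, the literal format names, and the wrapper class `FileFormatCheck` are hypothetical and only illustrate how a local named `normalizedFormat` reads compared with `f`.

```java
import java.util.Locale;

// Hypothetical stand-in for the helper in CloudObjectsSelectorCommon.
// Only the signature and the null check come from the quoted diff;
// the rest of the body is assumed for illustration.
public class FileFormatCheck {

  static boolean isParquetOrOrcFileFormat(String fileFormat) {
    if (fileFormat == null) {
      return false;
    }
    // Descriptive local name in place of the single-letter `f`.
    String normalizedFormat = fileFormat.trim().toLowerCase(Locale.ROOT);
    return "parquet".equals(normalizedFormat) || "orc".equals(normalizedFormat);
  }

  public static void main(String[] args) {
    System.out.println(isParquetOrOrcFileFormat(" Parquet "));
    System.out.println(isParquetOrOrcFileFormat("json"));
  }
}
```

With a name like `normalizedFormat`, the comparison line states on its own that it operates on a cleaned-up copy of the argument rather than on the raw input.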



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
