hudi-agent commented on code in PR #18385:
URL: https://github.com/apache/hudi/pull/18385#discussion_r3191119558


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/CloudObjectsSelectorCommon.java:
##########
@@ -546,6 +547,34 @@ private static Option<String> getPropVal(TypedProperties props, ConfigProperty<S
     return Option.empty();
   }
 
+  /**
+   * Enables Spark {@code mergeSchema} for cloud object batches of Parquet or ORC files when configured, so
+   * heterogeneous files in one sync round share a merged struct type. Applied before user
+   * {@link CloudSourceConfig#SPARK_DATASOURCE_OPTIONS} so explicit reader options can override.
+   *
+   * <p>Spark's native Parquet reader honors {@code mergeSchema} on all supported versions. Spark's native ORC
+   * reader honors it on Spark 3.0+ (the native ORC impl is the default since Spark 2.4); on older runtimes the
+   * option is silently ignored, which is harmless.
+   */
+  private DataFrameReader applyMergeSchemaOption(DataFrameReader reader, String fileFormat) {
+    if (!isParquetOrOrcFileFormat(fileFormat)) {
+      return reader;
+    }
+    if (!getBooleanWithAltKeys(properties, CLOUD_INCREMENTAL_MERGE_SCHEMA)) {
+      return reader;
+    }
+    return reader.option("mergeSchema", "true");
+  }
+
+  // Package-private for unit testing — see TestCloudObjectsSelectorCommon.
+  static boolean isParquetOrOrcFileFormat(String fileFormat) {
+    if (fileFormat == null) {
+      return false;
+    }
+    String f = fileFormat.trim();

Review Comment:
   🤖 nit: the single-character name `f` doesn't communicate intent here; could you rename it to `trimmed` (or just inline `fileFormat.trim()` in the return expression) so it's immediately clear what the variable represents?
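   
   A sketch of what the rename could look like. This hunk stops at the `trim()` line, so the return expression below (a case-insensitive match against `parquet` and `orc`) is an assumption, and `FileFormatCheck` is a hypothetical wrapper class added only so the snippet compiles on its own:
   
   ```java
   public final class FileFormatCheck {
     private FileFormatCheck() {}
   
     // Hypothetical completion: the remainder of the real method is not visible in this hunk,
     // so the case-insensitive comparison against "parquet" and "orc" is an assumption.
     static boolean isParquetOrOrcFileFormat(String fileFormat) {
       if (fileFormat == null) {
         return false;
       }
       // Descriptive name for the whitespace-stripped format string.
       String trimmed = fileFormat.trim();
       return "parquet".equalsIgnoreCase(trimmed) || "orc".equalsIgnoreCase(trimmed);
     }
   }
   ```
   
   Keeping a named local rather than inlining also avoids calling `trim()` once per comparison when the return expression checks both formats.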
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.