vamsikarnika commented on code in PR #11817:
URL: https://github.com/apache/hudi/pull/11817#discussion_r1743549641
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/CloudObjectsSelectorCommon.java:
##########
@@ -316,6 +343,197 @@ private static Dataset<Row> coalesceOrRepartition(Dataset dataset, int numPartit
return dataset;
}
+ private static boolean isCoalesceRequired(TypedProperties properties, Schema sourceSchema) {
+ return getBooleanWithAltKeys(properties, CloudSourceConfig.SPARK_DATASOURCE_READER_COALESCE_ALIAS_COLUMNS)
+ && Objects.nonNull(sourceSchema)
+ && hasFieldWithAliases(sourceSchema);
+ }
+
+ /**
+ * Recursively checks if an Avro schema or any of its nested fields contain aliases.
+ *
+ * @param schema The Avro schema to check.
+ * @return True if the schema or any of its fields contain aliases, false otherwise.
+ */
+ private static boolean hasFieldWithAliases(Schema schema) {
+ // If the schema is a record, check its fields recursively
+ if (isNestedRecord(schema)) {
+ for (Schema.Field field : getRecordFields(schema)) {
+ // Check if the field has aliases
+ if (!field.aliases().isEmpty()) {
+ return true;
+ }
+ // Recursively check the field's schema for aliases
+ if (hasFieldWithAliases(field.schema())) {
Review Comment:
This code isn't called on every row. It is called once, before any data is
read from the source, so its cost depends on the size of the schema and not
on the number of rows. I can run a performance profile and add the results
here if required.
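
To illustrate the point, here is a minimal, self-contained sketch. It is not
the PR code: `SchemaNode`, `FieldNode`, `readRows`, and the `schemaWalks`
counter are hypothetical stand-ins for the Avro `Schema`, `Schema.Field`, and
the source read path. It only assumes that the alias check runs once per read
and that its result is reused for every row.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class AliasCheckSketch {

  // Hypothetical stand-in for an Avro record schema; a leaf type simply has no fields.
  static final class SchemaNode {
    final List<FieldNode> fields;

    SchemaNode(FieldNode... fields) {
      this.fields = Arrays.asList(fields);
    }
  }

  // Hypothetical stand-in for Schema.Field: a name, its aliases, and its own schema.
  static final class FieldNode {
    final String name;
    final List<String> aliases;
    final SchemaNode schema;

    FieldNode(String name, List<String> aliases, SchemaNode schema) {
      this.name = name;
      this.aliases = aliases;
      this.schema = schema;
    }
  }

  // Counts how many times the read path triggers the schema check.
  static int schemaWalks = 0;

  // Returns true if any field, at any nesting depth, declares an alias.
  static boolean hasFieldWithAliases(SchemaNode schema) {
    for (FieldNode field : schema.fields) {
      if (!field.aliases.isEmpty() || hasFieldWithAliases(field.schema)) {
        return true;
      }
    }
    return false;
  }

  // Hypothetical read path: walks the schema once, then reuses the flag per row.
  // Returns the number of rows that would take the coalesce path.
  static int readRows(SchemaNode sourceSchema, List<String> rows) {
    schemaWalks++;
    boolean coalesceRequired = hasFieldWithAliases(sourceSchema);
    int coalesced = 0;
    for (String row : rows) {
      if (coalesceRequired) {
        coalesced++;
      }
    }
    return coalesced;
  }

  public static void main(String[] args) {
    SchemaNode leaf = new SchemaNode();
    SchemaNode inner = new SchemaNode(
        new FieldNode("zip", Collections.singletonList("postcode"), leaf));
    SchemaNode root = new SchemaNode(
        new FieldNode("address", Collections.<String>emptyList(), inner));

    int coalesced = readRows(root, Collections.nCopies(1000, "row"));
    System.out.println("coalesced rows=" + coalesced + ", schema walks=" + schemaWalks);
  }
}
```

In this sketch the recursive walk is bounded by the number of fields in the
schema, so the row count does not affect it. Whether that holds for the real
read path is what a profile would confirm.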