voonhous commented on code in PR #18938:
URL: https://github.com/apache/hudi/pull/18938#discussion_r3434681442
##########
hudi-hadoop-common/src/main/java/org/apache/hudi/avro/HoodieAvroWriteSupport.java:
##########
@@ -74,6 +274,136 @@ public void addFooterMetadata(String key, String value) {
footerMetadata.put(key, value);
}
+ /**
+ * Bundles the Avro sub-schema and {@link HoodieSchema.Variant} for a shredded variant field,
+ * keyed by effective-schema field index in {@link #shreddedVariantFields}.
+ */
+ private static final class ShreddedVariantField {
+ private final Schema avroSchema;
+ private final HoodieSchema.Variant hoodieSchema;
+
+ ShreddedVariantField(Schema avroSchema, HoodieSchema.Variant hoodieSchema) {
+ this.avroSchema = avroSchema;
+ this.hoodieSchema = hoodieSchema;
+ }
+ }
+
+ private static final Pattern DECIMAL_PATTERN = Pattern.compile(
+ "decimal\\s*\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)");
+
+ /**
+ * Applies a forced shredding schema to all variant fields in the given schema.
+ * The forced schema DDL (e.g., {@code "a int, b string"}) defines the typed_value
+ * fields that will be added to each variant column.
+ */
+ private static HoodieSchema applyForcedShreddingSchema(HoodieSchema schema, String ddl) {
+ if (schema.getType() != HoodieSchemaType.RECORD) {
+ return schema;
+ }
+
+ Map<String, HoodieSchema> shreddedFields = parseShreddingDDL(ddl);
+
+ List<HoodieSchemaField> fields = schema.getFields();
+ List<HoodieSchemaField> newFields = new ArrayList<>();
+ boolean changed = false;
+
+ for (HoodieSchemaField field : fields) {
+ HoodieSchema fieldSchema = field.schema();
+ boolean wasNullable = fieldSchema.isNullable();
+ HoodieSchema unwrapped = wasNullable ? fieldSchema.getNonNullType() : fieldSchema;
+
+ if (unwrapped.getType() == HoodieSchemaType.VARIANT) {
+ HoodieSchema.Variant shreddedVariant = HoodieSchema.createVariantShreddedObject(
+ unwrapped.getAvroSchema().getName(),
+ unwrapped.getAvroSchema().getNamespace(),
+ unwrapped.getAvroSchema().getDoc(),
+ shreddedFields);
+ HoodieSchema replacement = wasNullable
+ ? HoodieSchema.createNullable(shreddedVariant) : shreddedVariant;
+ newFields.add(HoodieSchemaUtils.createNewSchemaField(field.makeNullable().withSchema(replacement)));
+ changed = true;
+ } else {
+ newFields.add(HoodieSchemaUtils.createNewSchemaField(field));
+ }
+ }
+
+ if (!changed) {
+ return schema;
+ }
+
+ return HoodieSchema.createRecord(
+ schema.getAvroSchema().getName(),
+ schema.getAvroSchema().getNamespace(),
+ schema.getAvroSchema().getDoc(),
+ newFields);
+ }
+
+ /**
+ * Parses a DDL-style shredding schema string (e.g., {@code "a int, b string, c decimal(15,1)"})
+ * into a map of field names to their HoodieSchema types.
+ */
+ private static Map<String, HoodieSchema> parseShreddingDDL(String ddl) {
+ Map<String, HoodieSchema> fields = new LinkedHashMap<>();
+ for (String fieldDef : ddl.split(",")) {
+ String trimmed = fieldDef.trim();
+ if (trimmed.isEmpty()) {
+ continue;
+ }
+ String[] parts = trimmed.split("\\s+", 2);
+ if (parts.length != 2) {
+ throw new IllegalArgumentException(
+ "Invalid shredding DDL field definition (expected 'name type'): "
+ trimmed);
+ }
+ fields.put(parts[0].trim(), parseSimpleType(parts[1].trim()));
+ }
Review Comment:
Already addressed in #18065 (merged): `parseShreddingDDL` no longer uses
`ddl.split(",")`. It uses the shared `StringUtils.splitTopLevelCommas`, the
same paren-aware tokenizer the vector path uses in
`HoodieSchema.parseVectorColumnNames`, so `decimal(15, 1)` tokenizes correctly:
```java
// Split on top-level commas only so parameterized types such as decimal(15, 1) survive intact.
for (String fieldDef : StringUtils.splitTopLevelCommas(ddl)) {
```
This code is inherited from master and is no longer part of this PR's diff.

On reusing a vector utility for the decimal type itself:
`parseTypeDescriptor` only accepts custom logical types (`VECTOR` / `BLOB` /
`VARIANT`) and throws otherwise, so it can't parse `int` / `string` /
`decimal(p,s)` here.
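
For context on why the naive `ddl.split(",")` in the quoted hunk was a problem: it cuts `c decimal(15, 1)` into `c decimal(15` and `1)`, and the second fragment then fails the `parts.length != 2` check. Below is a minimal sketch of paren-aware splitting. It is not the actual `StringUtils.splitTopLevelCommas` implementation; the class name `TopLevelCommaSplitter`, the method `split`, and the choices to trim tokens, skip blank ones, and reject unbalanced parentheses are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not Hudi's StringUtils.
final class TopLevelCommaSplitter {
  private TopLevelCommaSplitter() {
  }

  /** Splits on commas at parenthesis depth 0, trimming tokens and skipping blank ones. */
  static List<String> split(String input) {
    List<String> parts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int depth = 0;
    for (int i = 0; i < input.length(); i++) {
      char c = input.charAt(i);
      if (c == '(') {
        depth++;
      } else if (c == ')') {
        if (depth == 0) {
          throw new IllegalArgumentException("Unbalanced ')' at index " + i + ": " + input);
        }
        depth--;
      }
      if (c == ',' && depth == 0) {
        // A top-level comma ends the current token.
        addIfNotBlank(parts, current);
        current.setLength(0);
      } else {
        // Everything else, including commas inside parentheses, stays in the token.
        current.append(c);
      }
    }
    if (depth != 0) {
      throw new IllegalArgumentException("Unbalanced '(' in: " + input);
    }
    addIfNotBlank(parts, current);
    return parts;
  }

  private static void addIfNotBlank(List<String> parts, StringBuilder token) {
    String trimmed = token.toString().trim();
    if (!trimmed.isEmpty()) {
      parts.add(trimmed);
    }
  }
}
```

With this sketch, `TopLevelCommaSplitter.split("a int, b string, c decimal(15, 1)")` yields three tokens, and the last one keeps `decimal(15, 1)` whole, so the subsequent `split("\\s+", 2)` on each token sees a name and a complete type.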