voonhous commented on code in PR #18065:
URL: https://github.com/apache/hudi/pull/18065#discussion_r3354349887
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowParquetWriteSupport.java:
##########
@@ -129,6 +142,16 @@ public HoodieRowParquetWriteSupport(Configuration conf, StructType structType, O
     hadoopConf.set("spark.sql.parquet.writeLegacyFormat", writeLegacyFormatEnabled);
     hadoopConf.set("spark.sql.parquet.outputTimestampType", config.getStringOrDefault(HoodieStorageConfig.PARQUET_OUTPUT_TIMESTAMP_TYPE));
     hadoopConf.set("spark.sql.parquet.fieldId.write.enabled", config.getStringOrDefault(PARQUET_FIELD_ID_WRITE_ENABLED));
+
+    // Variant shredding configs
+    this.variantWriteShreddingEnabled = config.getBooleanOrDefault(PARQUET_VARIANT_WRITE_SHREDDING_ENABLED);
+    this.variantForceShreddingSchemaForTest = config.getString(PARQUET_VARIANT_FORCE_SHREDDING_SCHEMA_FOR_TEST);
Review Comment:
This config mirrors Spark's own test-only
`spark.sql.variant.forceShreddingSchemaForTest`. It is currently the only way
to force an unshredded input into a shredded layout to exercise the write path
in tests; in normal writes, shredding applies only when the input schema
already declares `typed_value`, so there is no schema-inference path to drive
it yet. It is marked with `markAdvanced()`, and the key ends in `.for.test`.
I have removed the unused
`HoodieStorageConfig.Builder.parquetVariantForceShreddingSchemaForTest` method,
so it is no longer part of the first-class builder API, and clarified in the
documentation that the config is test-only and not for production use. The
`ConfigProperty` itself has to remain because it is read by key in the write
path and set via SQL in the tests.
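
To make the intended behaviour concrete, here is a minimal sketch of how the two settings could combine when choosing a shredding schema. It is not the code in this PR: the class `VariantShreddingDecision`, the method `resolve`, and its parameters are hypothetical names, and both the precedence (forced test schema over declared schema) and the gating by the enable flag are assumptions made for illustration.

```java
import java.util.Optional;

/** Hypothetical helper; none of these names come from the PR. */
final class VariantShreddingDecision {

    private VariantShreddingDecision() {
    }

    /**
     * Picks the shredding schema to write with, or empty for an unshredded write.
     *
     * @param shreddingEnabled    value of the write-shredding flag
     * @param declaredSchema      schema taken from an input that already declares
     *                            typed_value, or null when the input is unshredded
     * @param forcedSchemaForTest test-only forced schema, or null when unset
     */
    static Optional<String> resolve(boolean shreddingEnabled,
                                    String declaredSchema,
                                    String forcedSchemaForTest) {
        // Assumption: the enable flag gates both the normal and the forced path.
        if (!shreddingEnabled) {
            return Optional.empty();
        }
        // Assumption: a non-blank forced schema wins, so tests can shred
        // an input that does not declare typed_value.
        if (forcedSchemaForTest != null && !forcedSchemaForTest.trim().isEmpty()) {
            return Optional.of(forcedSchemaForTest.trim());
        }
        // Normal writes: shred only when the input already declares typed_value.
        return Optional.ofNullable(declaredSchema);
    }
}
```

Under these assumptions, a production write that never sets the `.for.test` key falls through to the last branch, which matches the "no schema-inference path yet" behaviour described above.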