voonhous commented on code in PR #18062:
URL: https://github.com/apache/hudi/pull/18062#discussion_r2945390957
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieStorageConfig.java:
##########
@@ -168,6 +168,36 @@ public class HoodieStorageConfig extends HoodieConfig {
.withDocumentation("Control whether to write bloom filter or not. Default true. "
+ "We can set to false in non bloom index cases for CPU resource saving.");
+ public static final ConfigProperty<Boolean> PARQUET_VARIANT_WRITE_SHREDDING_ENABLED = ConfigProperty
+ .key("hoodie.parquet.variant.write.shredding.enabled")
+ .defaultValue(true)
+ .sinceVersion("1.1.0")
+ .withDocumentation("Controls whether variant columns are written in shredded format. "
+ + "When enabled (default), variant columns with shredding information in the schema will be written "
+ + "in shredded format with typed_value columns. When disabled, variant columns are always written "
+ + "in unshredded format regardless of the schema. "
+ + "Equivalent to Spark's spark.sql.variant.writeShredding.enabled.");
+
+ public static final ConfigProperty<String> PARQUET_VARIANT_FORCE_SHREDDING_SCHEMA_FOR_TEST = ConfigProperty
+ .key("hoodie.parquet.variant.force.shredding.schema.for.test")
Review Comment:
Good question.
Yes, a table can have multiple variant columns.
From what I understand, this config is mainly used in tests to force
shredding. If there is more than one variant column, Spark will try to force
shredding on the defined subfields in each of them.
As mentioned, this is for testing only; as long as we are in line with
Spark's behaviour, we are good.
A concrete example:
If a table has `{col_1, col_2, col_3, col_4}` and both `col_3` and `col_4` are
variant columns, `hoodie.parquet.variant.force.shredding.schema.for.test`
will try to enforce the same shredding rules on both `col_3` and `col_4`.
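
To make the example above concrete, here is a minimal standalone sketch of the behaviour as I understand it. It is not Hudi or Spark code: the class `ForcedShreddingSketch`, the helper `applyForcedSchema`, the type strings, and the schema string `"a int, b string"` are all hypothetical and only illustrate that one forced schema is applied to every variant column.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ForcedShreddingSketch {

  // Hypothetical helper, not part of Hudi or Spark.
  // Given column name -> type, returns variant column name -> shredding schema,
  // assigning the same forced schema to every variant column.
  static Map<String, String> applyForcedSchema(Map<String, String> columnTypes,
                                               String forcedSchema) {
    Map<String, String> shredding = new LinkedHashMap<>();
    for (Map.Entry<String, String> column : columnTypes.entrySet()) {
      if ("variant".equals(column.getValue())) {
        shredding.put(column.getKey(), forcedSchema);
      }
    }
    return shredding;
  }

  public static void main(String[] args) {
    // Table layout from the example: col_3 and col_4 are the variant columns.
    Map<String, String> columnTypes = new LinkedHashMap<>();
    columnTypes.put("col_1", "string");
    columnTypes.put("col_2", "int");
    columnTypes.put("col_3", "variant");
    columnTypes.put("col_4", "variant");

    // Placeholder value; the real format accepted by the config is not shown here.
    String forcedSchema = "a int, b string";

    // Both variant columns end up with the same forced shredding schema.
    System.out.println(applyForcedSchema(columnTypes, forcedSchema));
  }
}
```

Non-variant columns are skipped, so only `col_3` and `col_4` appear in the result, each mapped to the same schema.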