[ https://issues.apache.org/jira/browse/HUDI-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-4071:
-----------------------------
Sprint: 2022/05/16
> Better Spark Datasource default configs
> ---------------------------------------
>
> Key: HUDI-4071
> URL: https://issues.apache.org/jira/browse/HUDI-4071
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Sagar Sumit
> Priority: Major
>
> Default configs should be:
> # Optimized for insert/bulk_insert. For example, if the sort mode defaults
> to NONE, a write is about as good as a plain parquet write, plus some
> additional work for the meta columns. An extension of this is to keep a map
> of minimal optimized configs per operation type. This is partly related to
> the better-performing configs tracked in HUDI-2151.
> # Make reasonable assumptions. For example, for index type, the bloom filter
> index does not rely on any external system, so it is a better default
> candidate than, say, the HBase index.
> # Audit all configs declared with noDefaultValue and assign a default where
> necessary.
> # Keep spark-sql and spark datasource config keys the same as far as
> possible; otherwise it is operationally difficult for the user. Rename or
> reuse existing datasource keys that serve the same purpose. This is also
> related to HUDI-4070.
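The "map of minimal optimized configs per operation type" from the first point could look roughly like the sketch below. This is only an illustration of the idea, not Hudi's implementation: the names COMMON_DEFAULTS, OPERATION_DEFAULTS and resolve_write_options are hypothetical, and the default values are assumptions taken from the examples in the description (NONE sort mode for bulk_insert, BLOOM as the index type). Options passed by the user always take precedence over the defaults.

```python
from typing import Dict, Optional

# Hypothetical defaults shared by every write operation.
# BLOOM is used because it needs no external system (assumption from the issue text).
COMMON_DEFAULTS: Dict[str, str] = {
    "hoodie.index.type": "BLOOM",
}

# Hypothetical map of minimal per-operation defaults.
# The values are illustrative, not the defaults Hudi actually ships.
OPERATION_DEFAULTS: Dict[str, Dict[str, str]] = {
    "bulk_insert": {"hoodie.bulkinsert.sort.mode": "NONE"},
    "insert": {},
    "upsert": {},
}


def resolve_write_options(
    operation: str, user_options: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Merge common defaults, per-operation defaults, and user options.

    Later layers win: user options override per-operation defaults,
    which override the common defaults.
    """
    if operation not in OPERATION_DEFAULTS:
        raise ValueError(f"unsupported write operation: {operation}")
    resolved = dict(COMMON_DEFAULTS)
    resolved.update(OPERATION_DEFAULTS[operation])
    resolved.update(user_options or {})
    # The operation itself is always recorded in the resolved options.
    resolved["hoodie.datasource.write.operation"] = operation
    return resolved
```

With PySpark, the resolved dictionary could then be handed to the writer, for example df.write.format("hudi").options(**resolve_write_options("bulk_insert")), together with the table-specific options such as the table name and record key.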
--
This message was sent by Atlassian Jira
(v8.20.7#820007)