[jira] [Updated] (HUDI-4071) Better Spark Datasource default configs

Raymond Xu (Jira) Mon, 16 May 2022 04:13:05 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Raymond Xu updated HUDI-4071:
-----------------------------
    Sprint: 2022/05/16

> Better Spark Datasource default configs
> ---------------------------------------
>
>                 Key: HUDI-4071
>                 URL: https://issues.apache.org/jira/browse/HUDI-4071
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Priority: Major
>
> Default configs should be:
>  # Optimized for insert/bulk_insert. For example, if NONE is the default sort 
> mode, a write costs about as much as a plain parquet write plus some 
> additional work for the meta columns. An extension of this is to keep a map 
> of minimal optimized configs per operation type. This is partly related to 
> the more performant configs tracked in HUDI-2151.
>  # Make reasonable assumptions. For example, for index type, the bloom 
> filter index does not rely on any external system, so it is a better default 
> candidate than, say, the HBase index.
>  # Review all configs declared with noDefaultValue and assign a default 
> where appropriate.
>  # Keep the spark-sql and Spark datasource config keys the same as far as 
> possible; otherwise they are operationally difficult for the user. Rename or 
> reuse existing datasource keys that are meant for the same purpose. This is 
> also related to HUDI-4070.
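
The "map of minimal optimized configs per operation type" and the shared-key idea above could look roughly like the following sketch. It is an illustration only, not Hudi code: the names BASE_DEFAULTS, PER_OPERATION_DEFAULTS, SQL_KEY_ALIASES and resolve_write_options are hypothetical, and the default values are simply the ones this issue suggests (NONE sort mode for bulk_insert, BLOOM index). The config keys are existing Hudi keys as far as I know; check them against the configuration reference for your Hudi version.

```python
from typing import Dict, Optional

# Hypothetical defaults applied to every operation. BLOOM is picked because
# the issue notes it does not rely on any external system.
BASE_DEFAULTS: Dict[str, str] = {
    "hoodie.index.type": "BLOOM",
}

# Hypothetical map of minimal configs per operation type.
PER_OPERATION_DEFAULTS: Dict[str, Dict[str, str]] = {
    "bulk_insert": {"hoodie.bulkinsert.sort.mode": "NONE"},
    "insert": {},
    "upsert": {},
}

# Assumed mapping from spark-sql style keys to datasource keys, so that
# both entry points end up with one canonical key.
SQL_KEY_ALIASES: Dict[str, str] = {
    "primaryKey": "hoodie.datasource.write.recordkey.field",
    "preCombineField": "hoodie.datasource.write.precombine.field",
}


def resolve_write_options(
    operation: str, user_options: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Layer base defaults, per-operation defaults, then user options."""
    if operation not in PER_OPERATION_DEFAULTS:
        raise ValueError(f"unsupported operation: {operation}")

    # Rewrite spark-sql style keys to their datasource equivalents.
    normalized = {
        SQL_KEY_ALIASES.get(key, key): value
        for key, value in (user_options or {}).items()
    }

    options = dict(BASE_DEFAULTS)
    options.update(PER_OPERATION_DEFAULTS[operation])
    options.update(normalized)  # explicit user settings always win
    options["hoodie.datasource.write.operation"] = operation
    return options
```

The resulting dictionary could then be handed to a writer, for example df.write.format("hudi").options(**opts), together with the table name and path that any Hudi write needs anyway.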



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
