[jira] [Updated] (HUDI-2839) Align configs across Spark datasource, write client, etc

Ethan Guo (Jira) Tue, 23 Nov 2021 09:31:06 -0800


     [ 
https://issues.apache.org/jira/browse/HUDI-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ethan Guo updated HUDI-2839:
----------------------------
    Description: 
This came up while discussing HUDI-2818.  For the same logic, such as 
keygenerator, compaction, and clustering, the Spark datasource and the write 
client use different configs, which can conflict with each other and cause 
unexpected behavior on the write path.

 

Raymond: I encountered this NPE when trying to run 0.10 over a 0.8 table: 
https://issues.apache.org/jira/browse/HUDI-2818.

To align configs, do you think we should auto-set 
{{hoodie.table.keygenerator.class}} when the user sets 
{{hoodie.datasource.write.keygenerator.class}}, and also the other way around?
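
One way to picture the two-way auto-set proposed above is the minimal Python sketch below. It is not Hudi code: the helper `align_keygen_configs` is hypothetical, and raising an error when the two keys disagree is an assumption made for illustration, since the thread does not say how a conflict should be resolved. Only the two config key names come from the discussion.

```python
# Config keys named in the discussion.
TABLE_KEYGEN = "hoodie.table.keygenerator.class"
WRITE_KEYGEN = "hoodie.datasource.write.keygenerator.class"


def align_keygen_configs(props):
    """Return a copy of props in which both key generator configs agree.

    Hypothetical helper, for illustration only.
    """
    aligned = dict(props)
    table_value = aligned.get(TABLE_KEYGEN)
    write_value = aligned.get(WRITE_KEYGEN)

    # Assumption: treat two different explicit values as a user error
    # instead of silently preferring one of them.
    if table_value and write_value and table_value != write_value:
        raise ValueError(
            f"Conflicting key generators: {TABLE_KEYGEN}={table_value}, "
            f"{WRITE_KEYGEN}={write_value}"
        )

    # Mirror whichever key was set onto the other one.
    value = table_value or write_value
    if value:
        aligned[TABLE_KEYGEN] = value
        aligned[WRITE_KEYGEN] = value
    return aligned
```

With this sketch, setting only one of the two keys yields a config in which both carry the same class name, and setting neither leaves the config unchanged.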

Siva: I guess this is what happens in the regular write path 
(HoodieSparkSqlWriter), i.e. the user sets only 
{{hoodie.datasource.write.keygenerator.class}}, but internally we set 
{{hoodie.table.keygenerator.class}} from the datasource write config.

Vinoth: {{HoodieConfig}} has some alternatives/fallback mechanism. Something to 
consider.

But overall we should fix these.
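
As a rough illustration of what an alternatives/fallback lookup can look like, here is a minimal Python sketch. It is a toy model, not the actual {{HoodieConfig}} implementation: the `ConfigKey` class and `get_with_fallback` function are hypothetical names, and choosing the table-level key as primary is an assumption made only for the example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class ConfigKey:
    """Hypothetical config descriptor: a primary key plus alternative keys."""
    key: str
    alternatives: Tuple[str, ...] = ()
    default: Optional[str] = None


def get_with_fallback(props: Mapping[str, str], cfg: ConfigKey) -> Optional[str]:
    """Look up the primary key first, then each alternative in order."""
    for name in (cfg.key, *cfg.alternatives):
        if name in props:
            return props[name]
    return cfg.default


# Assumption for illustration: the table-level key is primary and the
# datasource-level key is registered as its alternative.
KEYGEN = ConfigKey(
    key="hoodie.table.keygenerator.class",
    alternatives=("hoodie.datasource.write.keygenerator.class",),
)
```

In this model a reader of the config only asks for `KEYGEN`, so a user who sets just the datasource-level key still gets a value, and the primary key takes precedence when both are present.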

Ethan: when working on compaction/clustering, I also saw different configs 
for the same logic between the Spark datasource and the write client.  Maybe we 
can make a pass over all configs later and make them consistent.

> Align configs across Spark datasource, write client, etc
> --------------------------------------------------------
>
>                 Key: HUDI-2839
>                 URL: https://issues.apache.org/jira/browse/HUDI-2839
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: configs
>            Reporter: Ethan Guo
>            Priority: Major
>             Fix For: 0.11.0



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
