[ 
https://issues.apache.org/jira/browse/HUDI-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17912291#comment-17912291
 ] 

Davis Zhang commented on HUDI-8817:
-----------------------------------

Dug deeper, *this is not a 1.0-specific issue.* The culprit is that on the SQL create 
table path we never validate the write config against the table config.

 

*Bug fix implementation plan:*

To enable the validation we need to do 2 things: collect all sources of 
configuration, then merge them and validate the result against the table config.

The sources of configuration include:
 * options set via "set myOpt=myVal" in spark sql sessions. They are stored in the `
spark.sqlContext.conf` variable, which can contain arbitrary key-value pairs, some 
of them hoodie related: hoodie table configs, write client configs, etc.
 * options passed via the OPTIONS clause of the CREATE TABLE statement. Same as 
above, arbitrary key-value pairs can be involved.
 * options set in the hudi-defaults.conf file. Same kinds of properties as the above 2.
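For illustration, the same hoodie option can arrive from the first two sources like this (a hypothetical sketch; the key shown, hoodie.write.table.version, is the one from this ticket):

```sql
-- (1) Session level: stored in spark.sqlContext.conf
SET hoodie.write.table.version=6;

-- (2) Per-statement: OPTIONS clause of the CREATE TABLE DDL
CREATE TABLE t (id INT) USING hudi
OPTIONS (hoodie.write.table.version = 6);
```

The third source, hudi-defaults.conf, supplies the same kind of key-value pairs from a properties file on the configured path.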

The 3 sets of configs can contain conflicting entries - the same key with 
different values. Here is the proposal:

 
 # We start with conf1 derived from hudi-defaults.conf; all non-hudi configs 
are excluded, and all table configs are excluded as well.
 # Then we merge conf1 with spark.sqlContext.conf, again excluding all table 
configs. The result is conf2.
 # Then we process conf3, derived from the OPTIONS clause of the CREATE TABLE 
DDL. It is split into 2 configs - table_conf, which only includes table-related 
configurations, and conf4, which contains the rest.
 # Finally, with conf2, conf4, and table_conf, we do the following:
 ## Apply conf4 on top of conf2, with conf4 overriding all conflicting keys' 
values. The result is conf5.
 ## Call CommonClientUtils#validateTableVersion using conf5 and table_conf.
 

Today we do not do proper pre-filtering, nor do we differentiate between table 
configs and other configs. Some sources of config are not even involved in the 
process. Implementing the above procedure requires about 2 days of effort 
(code + test). If we only want a quick fix for the reported corner case, it can 
be done in about 4 hrs.

 

> Hudi 1.0.0 cannot create older version table
> --------------------------------------------
>
>                 Key: HUDI-8817
>                 URL: https://issues.apache.org/jira/browse/HUDI-8817
>             Project: Apache Hudi
>          Issue Type: Sub-task
>    Affects Versions: 1.0.0
>            Reporter: Shawn Chang
>            Assignee: Davis Zhang
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.0.1
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When using Hudi 1.0 + backward writer, it still creates a version 8 table which 
> cannot be read by an older-version Hudi reader.
> Reproduction steps:
> 1. Create the table with SQL, specifying the table version to be 6 (I've 
> tested the same with DF and Hudi 1.0 also cannot create version 6 table)
> {code:java}
> CREATE TABLE hudi_14_table_sql_005 (
>     event_id INT,
>     event_date STRING,
>     event_name STRING,
>     event_ts STRING,
>     event_type STRING
> ) USING hudi
>  OPTIONS(
>     type = 'cow', -- or 'mor'
>     primaryKey = 'event_id,event_date',
>     preCombineField = 'event_ts',
>     hoodie.write.table.version = 6
> )
> PARTITIONED BY (event_type)
> LOCATION 's3://<some_bucket>/hudi_14_table_sql_005';
> {code}
> 2. Check `hoodie.properties` under the table's S3 location, and you should 
> see `hoodie.table.version=8`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
