[
https://issues.apache.org/jira/browse/HUDI-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17912291#comment-17912291
]
Davis Zhang commented on HUDI-8817:
-----------------------------------
Dug deeper, *this is not a 1.0-specific issue.* The culprit is that on the SQL
create table path we never validate the write config against the table config.
*Bug fix implementation plan:*
To enable the validation we need to do two things: collect every source of
configuration, then merge and validate them. The sources include:
* options set via "set myOpt=myVal" in Spark SQL sessions. They are stored in
`spark.sqlContext.conf`. It can contain arbitrary kv pairs, some of which can
be hoodie-related, including hoodie table configs, write client configs, etc.
* options retrieved via the OPTIONS clause of the CREATE TABLE statement. Same
as above, arbitrary kv pairs can be involved.
* various options set in the hudi-defaults.conf file. Same kinds of properties as the above two.
These three sets of configs can conflict with each other - same key but
different values. Here is the proposal:
# We start with conf1 derived from hudi-defaults.conf; all non-hudi configs
are excluded, as are all table configs.
# Then we merge conf1 with `spark.sqlContext.conf`, again excluding all table
configs. We get conf2.
# Then we process conf3, derived from the OPTIONS clause of the CREATE TABLE
DDL. It is split into two configs - table_conf, which only includes
table-related configurations, and conf4, which contains the rest.
# Finally, with conf2, conf4 and table_conf we do the following:
## Apply conf4 on top of conf2, with conf4 overriding the values of all
conflicting keys. We get conf5.
## Run CommonClientUtils#validateTableVersion using conf5 and table_conf.
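The merge-and-split precedence above can be sketched roughly as follows. This is a minimal illustration, not the actual Hudi code: the `hoodie.table.` key prefix used to identify table configs and the class/method names are assumptions for the sketch.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed precedence: hudi-defaults.conf < session conf <
// CREATE TABLE OPTIONS, with table configs split out into their own map.
public class ConfigMergeSketch {

    // Assumption for this sketch: table configs are keyed under "hoodie.table.".
    static boolean isTableConfig(String key) {
        return key.startsWith("hoodie.table.");
    }

    // Step 1: conf1 = hudi-defaults.conf minus non-hudi keys and table configs.
    static Map<String, String> filterDefaults(Map<String, String> defaults) {
        Map<String, String> conf1 = new HashMap<>();
        for (Map.Entry<String, String> e : defaults.entrySet()) {
            if (e.getKey().startsWith("hoodie.") && !isTableConfig(e.getKey())) {
                conf1.put(e.getKey(), e.getValue());
            }
        }
        return conf1;
    }

    // Steps 2-4: session conf overrides conf1 (giving conf2); OPTIONS is split
    // into table_conf and conf4; conf4 overrides conf2 (giving conf5).
    static Map<String, Map<String, String>> merge(Map<String, String> defaults,
                                                  Map<String, String> sessionConf,
                                                  Map<String, String> options) {
        Map<String, String> conf2 = filterDefaults(defaults);
        for (Map.Entry<String, String> e : sessionConf.entrySet()) {
            if (e.getKey().startsWith("hoodie.") && !isTableConfig(e.getKey())) {
                conf2.put(e.getKey(), e.getValue()); // session wins over defaults
            }
        }
        Map<String, String> tableConf = new HashMap<>();
        Map<String, String> conf5 = new HashMap<>(conf2);
        for (Map.Entry<String, String> e : options.entrySet()) {
            if (isTableConfig(e.getKey())) {
                tableConf.put(e.getKey(), e.getValue());
            } else {
                conf5.put(e.getKey(), e.getValue()); // OPTIONS wins over all
            }
        }
        Map<String, Map<String, String>> result = new HashMap<>();
        result.put("writeConf", conf5);   // feed into validateTableVersion
        result.put("tableConf", tableConf);
        return result;
    }
}
{code}

With conf5 and table_conf separated this way, the version validation gets the write-side `hoodie.write.table.version` from whichever source actually set it (here, the OPTIONS clause), instead of silently dropping it.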
Today we do no proper pre-filtering or differentiation between table configs
and other configs, and some sources of config are not involved in the process
at all. Implementing the above procedure requires about 2 days of effort
(code + test). If we want a quick fix just for the reported corner case, we
can do it in about 4 hours.
> Hudi 1.0.0 cannot create older version table
> --------------------------------------------
>
> Key: HUDI-8817
> URL: https://issues.apache.org/jira/browse/HUDI-8817
> Project: Apache Hudi
> Issue Type: Sub-task
> Affects Versions: 1.0.0
> Reporter: Shawn Chang
> Assignee: Davis Zhang
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.0.1
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> When using Hudi 1.0 + the backward-compatible writer, it still creates a
> version 8 table, which cannot be read by an older-version Hudi reader.
> Reproduction steps:
> 1. Create the table with SQL, specifying the table version to be 6 (I've
> tested the same with DF, and Hudi 1.0 also cannot create a version 6 table)
> {code:sql}
> CREATE TABLE hudi_14_table_sql_005 (
> event_id INT,
> event_date STRING,
> event_name STRING,
> event_ts STRING,
> event_type STRING
> ) USING hudi
> OPTIONS(
> type = 'cow', -- or 'mor'
> primaryKey = 'event_id,event_date',
> preCombineField = 'event_ts',
> hoodie.write.table.version = 6
> )
> PARTITIONED BY (event_type)
> LOCATION 's3://<some_bucket>/hudi_14_table_sql_005';
> {code}
> 2. Check `hoodie.properties` under the table's S3 location, and you should
> see `hoodie.table.version=8`
--
This message was sent by Atlassian Jira
(v8.20.10#820010)