[ 
https://issues.apache.org/jira/browse/HUDI-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5584:
--------------------------------
    Description: 
When we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only one 
table to be synchronized to Hive, without the _ro suffix.

However, the table may already have been created in Hive earlier, for example:
{code:java}
create table hive.test.HUDI_5584 (
  id int,
  ts int)
using hudi
tblproperties (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts',
  hoodie.datasource.hive_sync.enable = 'true',
  hoodie.datasource.hive_sync.table.strategy = 'ro'
) location '/tmp/HUDI_5584'
{code}
SHOW CREATE TABLE then returns:
{code:java}
CREATE EXTERNAL TABLE `hudi_5584`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `id` int,
  `ts` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'path'='file:///tmp/HUDI_5584')
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'file:/tmp/HUDI_5584'
TBLPROPERTIES (
  'hoodie.datasource.hive_sync.enable'='true',
  'hoodie.datasource.hive_sync.table.strategy'='ro',
  'preCombineField'='ts',
  'primaryKey'='id',
  'spark.sql.create.version'='3.3.1',
  'spark.sql.sources.provider'='hudi',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='xx',
  'transient_lastDdlTime'='1674108302',
  'type'='mor') {code}
The table is defined like a realtime table: its INPUTFORMAT is 
HoodieParquetRealtimeInputFormat.

When we finish writing data and synchronize the table, SERDEPROPERTIES and 
OUTPUTFORMAT are not modified, because the table already exists.

As a result, the table does not have the expected type.
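
The mismatch described above can be expressed as a small check. This is only a 
minimal sketch in Python, not Hudi's sync code: the function names are 
hypothetical, and it assumes that a read-optimized table is registered with 
org.apache.hudi.hadoop.HoodieParquetInputFormat, while a realtime table uses the 
HoodieParquetRealtimeInputFormat seen in the SHOW CREATE TABLE output. Handling 
of the 'rt' strategy is likewise an assumption made by symmetry with 'ro'.

```python
# Assumed input formats for the two query types of a MOR table.
RO_INPUT_FORMAT = "org.apache.hudi.hadoop.HoodieParquetInputFormat"
RT_INPUT_FORMAT = "org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat"


def expected_input_format(strategy: str) -> str:
    """Return the input format an unsuffixed table should have (hypothetical helper)."""
    normalized = strategy.lower()
    if normalized == "ro":
        return RO_INPUT_FORMAT
    if normalized == "rt":  # assumption: 'rt' mirrors 'ro'
        return RT_INPUT_FORMAT
    raise ValueError(f"not a single-table strategy: {strategy!r}")


def needs_definition_update(strategy: str, existing_input_format: str) -> bool:
    """True when a pre-existing Hive table does not match the sync strategy."""
    return existing_input_format != expected_input_format(strategy)


# The table from this report: strategy 'ro', but created with the realtime format.
stale = needs_definition_update("ro", RT_INPUT_FORMAT)
```

With the values from this report, `stale` is true, which is the case where the 
sync would have to rewrite the existing table's definition instead of skipping it.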

 

 

> When the table to be synchronized already exists in hive, need to update 
> serde/table properties
> -----------------------------------------------------------------------------------------------
>
>                 Key: HUDI-5584
>                 URL: https://issues.apache.org/jira/browse/HUDI-5584
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: HunterXHunter
>            Priority: Major



