[ https://issues.apache.org/jira/browse/HUDI-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
HunterXHunter updated HUDI-5584:
--------------------------------
Description:
When we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only one
table to be synchronized to Hive, without the _ro suffix.
However, the table is sometimes created in Hive beforehand,
for example:
{code:sql}
create table hive.test.HUDI_5584 (
id int,
ts int)
using hudi
tblproperties (
type = 'mor',
primaryKey = 'id',
preCombineField = 'ts',
hoodie.datasource.hive_sync.enable = 'true',
hoodie.datasource.hive_sync.table.strategy='ro'
) location '/tmp/HUDI_5584' {code}
Running SHOW CREATE TABLE on it gives:
{code:sql}
CREATE EXTERNAL TABLE `hudi_5584`(
`_hoodie_commit_time` string,
`_hoodie_commit_seqno` string,
`_hoodie_record_key` string,
`_hoodie_partition_path` string,
`_hoodie_file_name` string,
`id` int,
`ts` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'path'='file:///tmp/HUDI_5584')
STORED AS INPUTFORMAT
'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'file:/tmp/HUDI_5584'
TBLPROPERTIES (
'hoodie.datasource.hive_sync.enable'='true',
'hoodie.datasource.hive_sync.table.strategy'='ro',
'preCombineField'='ts',
'primaryKey'='id',
'spark.sql.create.version'='3.3.1',
'spark.sql.sources.provider'='hudi',
'spark.sql.sources.schema.numParts'='1',
'spark.sql.sources.schema.part.0'='xx',
'transient_lastDdlTime'='1674108302',
'type'='mor') {code}
*The table looks like a realtime table*, since its INPUTFORMAT is HoodieParquetRealtimeInputFormat.
When we finish writing data and sync the ro table, the table already exists,
so its SERDEPROPERTIES and storage formats (INPUTFORMAT/OUTPUTFORMAT) are not modified.
As a result, the table type does not match what we expect (a read-optimized table).
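A minimal Java sketch of the check the issue title asks for, under stated assumptions: when the target table already exists, compare its storage settings with those an ro table should have and report what would need updating, instead of leaving the table untouched. StorageDesc, expectedRo and fieldsToUpdate are hypothetical names, not Hudi's API. The ro input format org.apache.hudi.hadoop.HoodieParquetInputFormat and the hoodie.query.as.ro.table serde property are assumptions about what Hive sync writes for an ro table.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

public class RoTableSyncCheck {

  // Hypothetical, simplified view of a Hive table's storage settings.
  record StorageDesc(String inputFormat, String outputFormat, String serdeLib,
                     Map<String, String> serdeProps) {}

  // What a read-optimized (ro) table is assumed to look like after sync.
  // The ro input format and the 'hoodie.query.as.ro.table' serde property
  // are assumptions about Hudi's Hive sync, not taken from this issue.
  static StorageDesc expectedRo(String path) {
    return new StorageDesc(
        "org.apache.hudi.hadoop.HoodieParquetInputFormat",
        "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
        "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
        Map.of("path", path, "hoodie.query.as.ro.table", "true"));
  }

  // Lists the storage settings of an existing table that differ from the
  // expected ones, i.e. what sync would have to update instead of skipping.
  static List<String> fieldsToUpdate(StorageDesc existing, StorageDesc expected) {
    List<String> diffs = new ArrayList<>();
    if (!Objects.equals(existing.inputFormat(), expected.inputFormat())) {
      diffs.add("inputFormat");
    }
    if (!Objects.equals(existing.outputFormat(), expected.outputFormat())) {
      diffs.add("outputFormat");
    }
    if (!Objects.equals(existing.serdeLib(), expected.serdeLib())) {
      diffs.add("serdeLib");
    }
    // Iterate in sorted key order so the result is deterministic.
    for (Map.Entry<String, String> e : new TreeMap<>(expected.serdeProps()).entrySet()) {
      if (!Objects.equals(existing.serdeProps().get(e.getKey()), e.getValue())) {
        diffs.add("serdeProperty:" + e.getKey());
      }
    }
    return diffs;
  }

  public static void main(String[] args) {
    // The pre-created table from this issue, as shown by SHOW CREATE TABLE.
    StorageDesc existing = new StorageDesc(
        "org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat",
        "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
        "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
        Map.of("path", "file:///tmp/HUDI_5584"));
    System.out.println(fieldsToUpdate(existing, expectedRo("file:///tmp/HUDI_5584")));
  }
}
{code}

Run as-is, it prints [inputFormat, serdeProperty:hoodie.query.as.ro.table] for the table above. A real fix would still have to apply those changes to the existing table in the metastore; the sketch only shows the comparison step.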
> When the table to be synchronized already exists in hive, need to update
> serde/table properties
> -----------------------------------------------------------------------------------------------
>
> Key: HUDI-5584
> URL: https://issues.apache.org/jira/browse/HUDI-5584
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: HunterXHunter
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)