[ https://issues.apache.org/jira/browse/HUDI-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

voon updated HUDI-7564:
-----------------------
    Description: 
*hoodie.datasource.hive_sync.support_timestamp* is required to be *false* so 
that *TIMESTAMP (MICROS)* columns are synced to HMS as *LONG* types.
 
While this is not visible in the hive/spark-sql console via the 
{_}show-create-database{_}/{_}describe-table{_} commands, HMS stores the 
timestamp type as:
 
{code:java}
support_timestamp=false LONG 
support_timestamp=true  TIMESTAMP{code}
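The practical difference between the two HMS representations can be sketched as follows: with support_timestamp=false the TIMESTAMP (MICROS) value is exposed as a LONG holding epoch microseconds, while with support_timestamp=true it surfaces as a real timestamp. This is an illustrative sketch only, not Hudi's actual sync code:

```python
from datetime import datetime, timezone

def timestamp_to_micros(ts: datetime) -> int:
    """Encode a timestamp the way a LONG (epoch-micros) column stores it."""
    return int(ts.timestamp() * 1_000_000)

def micros_to_timestamp(micros: int) -> datetime:
    """Decode an epoch-micros LONG back into a timestamp."""
    return datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)

ts = datetime(2023, 1, 1, tzinfo=timezone.utc)
as_long = timestamp_to_micros(ts)          # support_timestamp=false: LONG in HMS
round_trip = micros_to_timestamp(as_long)  # support_timestamp=true: TIMESTAMP in HMS
```

The data on storage is the same either way; only the column type that HMS advertises to query engines differs.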
 
Overriding this to {*}true{*} causes Trino/Presto queries to fail with the 
error below, since they rely on the type information stored in HMS:
{code:java}
Caused by: io.prestosql.jdbc.$internal.client.FailureInfo$FailureException: Expected field to be long, actual timestamp(9) (field 0)
at io.trino.plugin.hive.GenericHiveRecordCursor.validateType(GenericHiveRecordCursor.java:569)
at io.trino.plugin.hive.GenericHiveRecordCursor.getLong(GenericHiveRecordCursor.java:274)
at io.trino.spi.connector.RecordPageSource.getNextPage(RecordPageSource.java:106)
at io.trino.plugin.hudi.HudiPageSource.getNextPage(HudiPageSource.java:120)
at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:299)
at io.trino.operator.Driver.processInternal(Driver.java:395)
at io.trino.operator.Driver.lambda$process$8(Driver.java:298)
at io.trino.operator.Driver.tryWithLock(Driver.java:694)
at io.trino.operator.Driver.process(Driver.java:290)
at io.trino.operator.Driver.processForDuration(Driver.java:261)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:911)
at io.trino.execution.executor.timesharing.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:188)
at io.trino.execution.executor.timesharing.TimeSharingTaskExecutor$TaskRunner.run(TimeSharingTaskExecutor.java:569)
at io.trino.$gen.Trino_trino426_sql_hudi_di07_001____20240326_074936_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
2024-04-02 17:32:21 (UTC+8) INFO - Clear session property for connection.
2024-04-02 17:32:21 (UTC+8) ERROR - Task Execution failed with CommonException: Query failed (#20240402_093220_06724_cg4jg): Expected field to be long, actual timestamp(9) (field 0) {code}
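The failure originates in Trino's GenericHiveRecordCursor, which validates the type HMS declares for each field against the type the reader expects before fetching the value. A rough sketch of that check (a hypothetical simplification, not Trino's actual code):

```python
def validate_type(field_index: int, actual_type: str, expected_type: str = "long") -> None:
    """Mimic the HMS-declared-type vs reader-expected-type check that
    produces the 'Expected field to be long' error above."""
    if actual_type != expected_type:
        raise RuntimeError(
            f"Expected field to be {expected_type}, "
            f"actual {actual_type} (field {field_index})"
        )

# HMS says LONG (support_timestamp=false at sync time): the check passes.
validate_type(0, "long")

# HMS says TIMESTAMP (support_timestamp=true at sync time): the check raises,
# matching the error in the trace above.
```

In other words, the engine never even looks at the data; the mismatch between what hive-sync wrote to HMS and what the reader expects is enough to fail the query.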
To demonstrate via spark-sql that {*}support_timestamp{*} does not actually 
default to {*}false{*}:
{code:java}
-- EXECUTE THESE QUERIES IN SPARK

-- Create a table 
create table if not exists dev_hudi.timestamp_issue (
  int_col   bigint,
  `timestamp_col` TIMESTAMP
) using hudi 
tblproperties (
  type = 'mor',
  primaryKey = 'int_col'
 );

-- Perform an insert to trigger hive sync to create _ro and _rt tables 
insert into dev_hudi.timestamp_issue select
          1 as int_col,
          to_timestamp('2023-01-01', 'yyyy-MM-dd') as timestamp_col;

-- Execute a query to verify that data has been written
select * from dev_hudi.timestamp_issue_rt;

-- Set support_timestamp to its documented default value (false)
set hoodie.datasource.hive_sync.support_timestamp=false;


-- Perform an insert again (Will throw an error)
insert into dev_hudi.timestamp_issue select
          1 as int_col,
          to_timestamp('2023-01-01', 'yyyy-MM-dd') as timestamp_col;{code}
The last insert query throws the error below, showing that 
{*}support_timestamp{*}'s effective default value is {*}true{*}.
{code:java}
Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing timestamp_issue
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:190)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:58)
    ... 64 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Could not convert field Type from TIMESTAMP to bigint for field timestamp_col
    at org.apache.hudi.hive.util.HiveSchemaUtil.getSchemaDifference(HiveSchemaUtil.java:118)
    at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:402)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:313)
    at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:231)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:187)
    ... 65 more {code}
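The hive-sync failure comes from the schema-difference step: the incoming storage schema reports timestamp_col as TIMESTAMP while HMS already holds bigint, and that pair is not an allowed implicit conversion. A rough sketch of the assumed behaviour (hypothetical, not the actual HiveSchemaUtil.getSchemaDifference code; the allowed-conversion set here is invented for illustration):

```python
# Type changes hive-sync would accept implicitly (assumed minimal set).
ALLOWED_CONVERSIONS = {("int", "bigint"), ("float", "double")}

def schema_difference(storage_schema: dict, hms_schema: dict) -> dict:
    """Return fields whose type changed; raise when the change from the
    storage type to the HMS type is not a permitted conversion."""
    diff = {}
    for field, new_type in storage_schema.items():
        old_type = hms_schema.get(field)
        if old_type is None or old_type == new_type:
            continue
        if (old_type, new_type) not in ALLOWED_CONVERSIONS:
            raise ValueError(
                f"Could not convert field Type from {new_type.upper()} "
                f"to {old_type} for field {field}"
            )
        diff[field] = new_type
    return diff
```

With the repro above, the first insert (effective support_timestamp=true) registers timestamp_col as TIMESTAMP in HMS; after setting the option to false, the next sync presents the column as a LONG-backed type, the pair fails the check, and hive-sync aborts with the exception shown.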



> Fix HiveSync configuration inconsistencies
> ------------------------------------------
>
>                 Key: HUDI-7564
>                 URL: https://issues.apache.org/jira/browse/HUDI-7564
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: voon
>            Assignee: voon
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
