[
https://issues.apache.org/jira/browse/HIVE-23048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062614#comment-17062614
]
Peter Vary commented on HIVE-23048:
-----------------------------------
We need the following functionality from the backend database:
* Way to generate TXN_ID - sequence, or identity like stuff
* Way to batch insert row to TXN data, and retrieve back the generated id-s
* Way to get the last transaction id for HWM calculations
Created a small test class to check the different databases, used the following
versions:
* Derby - 10.14.1.0
* MySQL - 5.7.23
* PostgreSQL - 11.5
* Microsoft SQL Server - 2017 GA
* Oracle - XE 11g - had to use ojdbc8.jar driver (downloaded for the 19c
version) to make getGeneratedKeys work for batch
Here are the results:
{code:java}
--------
Checking Apache Derby
Single statement
Gen key: 1
Multi statement
Gen keys: [11]
--------
Checking MySQL
Single statement
Gen key: 1
Multi statement
Gen keys: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
--------
Checking PostgreSQL
Single statement
Gen key: 1
Multi statement
Gen keys: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Seq state: 11
--------
Checking Microsoft SQL Server
Single statement
Gen key: 1
Multi statement
com.microsoft.sqlserver.jdbc.SQLServerException: The statement must be executed
before any results can be obtained.
at
com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:234)
at
com.microsoft.sqlserver.jdbc.SQLServerStatement.getGeneratedKeys(SQLServerStatement.java:2216)
at
com.zaxxer.hikari.pool.HikariProxyPreparedStatement.getGeneratedKeys(HikariProxyPreparedStatement.java)
at org.apache.hadoop.hive.a.runCheck(a.java:130)
at org.apache.hadoop.hive.a.main(a.java:171)
Seq state: 11
--------
Checking Oracle
Single statement
Gen key: 1
Multi statement
Gen keys: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Seq state: 11
{code}
The key takeaways:
* MSSQL and Derby does not support batch insert with getGeneratedKeys.
Workaround could be:
** Insert them row by row
* MySQL, MSSQL, Derby does not support retrieving the last transaction id.
Workaround could be:
** Run specific query, like:
{code:java}
SELECT MAX(TXN_ID) FROM (SELECT MAX(TXN_ID) FROM TXNS UNION ALL SELECT
MAX(CTC_TXNID) FROM COMPLETED_TXN_COMPONENTS)
{code}
> Use sequences for TXN_ID generation
> -----------------------------------
>
> Key: HIVE-23048
> URL: https://issues.apache.org/jira/browse/HIVE-23048
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)