[
https://issues.apache.org/jira/browse/HIVE-23048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066714#comment-17066714
]
Peter Vary commented on HIVE-23048:
-----------------------------------
Just to summarize what I have found:
* NEXT_TXN_ID is used for multiple purposes
** Generate txnId - sequence is adequate substitution for this purpose
** Make sure that the openTxns, and commitTxn (for writes) commands are
serialized - we need extra lock for this. In a follow-up Jira we should check
if other places are depend on the serialization, or not. We might be able to
create only shared locks for {{openTxns}}, and exclusive locks for
{{commitTxn}} with writes. Needs further investigation.
** Also used for Compactor during the old data cleanup, and when deciding
which compaction can proceed. These tasks usually require a consistent view of
the related tables. We have to replace this part of the code with single query
solutions to ensure the consistent view.
** NEXT_TXN_ID was used as a cache for the HWM. We have to replace it with
extra queries
* Autogeneration of the keys is:
** Working for
*** Derby
*** Postgres
*** SQL Server (mssql)
** Questionable for
*** Oracle 11 or earlier (Oracle 12c has Identity) - We have to use sequences
to manually insert data
*** MySQL 5.x (MySQL 8.0 fixes this) -
{quote}"In *MySQL 5.7* and earlier, the auto-increment counter is *stored only
in main memory, not on disk*. To initialize an auto-increment counter after a
server restart, InnoDB would execute the equivalent of the following statement
on the first insert into a table containing an AUTO_INCREMENT column.
{quote}_SELECT MAX(ai_col) FROM table_name FOR UPDATE;
{quote}
In MySQL 8.0, this behavior is changed. The current maximum auto-increment
counter value is written to the redo log each time it changes and is saved to
an engine-private system table on each checkpoint. These changes make the
current maximum auto-increment counter value persistent across server restarts."
{quote} * Retrieving the value for the generated key with batch inserts:
** Working for
*** MySQL
*** Postgres
*** Oracle
** Not working for
*** SQL Server
*** Derby
Also the performance of inserts (table with 2 columns for testing) with
generated keys is really dependent of the database (1000 items, 100 loops,
local db instances, in ms):
{code:java}
Gen keys Gen keys Gen keys Gen keys No keys No
keys No keys Value
nobatch batched long prep long stat batched
long prep long stat retrieval
MySQL (5.7.23) 204 10 10 10 10
9 12 0.42
PostgreSQL (12.2) 68 12 5 4 4
3 4 0.57
Oracle (11.2.0.2.0) 2202 211 Error Error 122
438 425 2.37
MS SQL Server (14) 1988 Error Error Error 26
41 8 2.33
Apache Derby (10.14.1) 2043 Error Error Error 41
104 106 4.19
{code}
Current implementation uses long static query without retrieving keys ({{No
keys long stat}} column)
If we move forward with the autogenerated keys then we have to use the
generated keys feature, or have to query the inserted data to be able to return
the new identifiers again.
> Use sequences for TXN_ID generation
> -----------------------------------
>
> Key: HIVE-23048
> URL: https://issues.apache.org/jira/browse/HIVE-23048
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Attachments: DbTest.java
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)