[
https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697438#comment-14697438
]
Eugene Koifman commented on HIVE-11317:
---------------------------------------
Patch 4 changes the tests so that they no longer rely on timing, and improves
the comments.
The reason for a separate thread is modularity and testing. For example, if
the timed-out transaction reaper is not keeping up, it won't interfere with
compaction scheduling, and vice versa. It can also be configured separately,
and it makes testing easier. I think HouseKeeperService is a nice abstraction
for later, when we add alerting capability and perhaps an isAlive service for
compaction processes.
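To illustrate the isolation argument, here is a minimal sketch of a periodic
service that owns its own thread. It is not the HouseKeeperService interface
from the patch; the class name PeriodicService and its constructor parameters
are hypothetical, and the only library used is java.util.concurrent.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Hive code: each service gets a private
// single-thread scheduler, so a slow or stuck task in one service
// cannot delay the tasks of another.
class PeriodicService {
  private final ScheduledExecutorService executor;
  private final Runnable task;
  private final long periodMs;

  PeriodicService(String threadName, long periodMs, Runnable task) {
    this.periodMs = periodMs;
    this.task = task;
    this.executor = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, threadName);
      t.setDaemon(true); // do not keep the JVM alive for housekeeping
      return t;
    });
  }

  void start() {
    // Fixed delay (not fixed rate): a long run pushes the next run back
    // instead of queueing up a burst of overdue executions.
    executor.scheduleWithFixedDelay(() -> {
      try {
        task.run();
      } catch (RuntimeException e) {
        // Swallow so that one failed run does not cancel future runs;
        // a real service would log the exception here.
      }
    }, 0, periodMs, TimeUnit.MILLISECONDS);
  }

  void stop() {
    executor.shutdownNow();
  }
}
```

With one instance for the timeout reaper and another for compaction
scheduling, each period can be configured independently, and a test can start
just the service it cares about.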
performTimeouts(): it's more efficient to read 2500 entries from TXNS in one
query than to send 25 queries, and we can easily cache the result since it's
just a list of longs. The rest of the logic runs each batch in a separate
transaction to keep lock duration shorter, which will hopefully reduce the
number of retries due to deadlocks.
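The read-once, abort-in-batches idea can be sketched roughly as follows. This
is an illustration only, not the code in the patch: the TxnStore interface and
its two methods are hypothetical stand-ins for the metastore DB access, and
the batch size of 100 is inferred from "2500 entries" versus "25 queries"
rather than taken from the source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for metastore DB access; not Hive's API.
interface TxnStore {
  // One query: ids of up to maxRows open txns that stopped heartbeating.
  List<Long> findTimedOutTxns(int maxRows);

  // Aborts the given txns inside a single, short DB transaction.
  void abortInOneTransaction(List<Long> txnIds);
}

class TimeoutReaper {
  static final int FETCH_SIZE = 2500; // entries read from TXNS in one query
  static final int BATCH_SIZE = 100;  // assumed: 2500 entries / 25 batches

  // Returns the number of txns aborted in this pass.
  static int performTimeouts(TxnStore store) {
    // The result is just a list of longs, so holding it in memory is cheap.
    List<Long> timedOut = store.findTimedOutTxns(FETCH_SIZE);
    int aborted = 0;
    for (int from = 0; from < timedOut.size(); from += BATCH_SIZE) {
      int to = Math.min(from + BATCH_SIZE, timedOut.size());
      // Each batch commits on its own, so locks are held only briefly.
      store.abortInOneTransaction(new ArrayList<>(timedOut.subList(from, to)));
      aborted += to - from;
    }
    return aborted;
  }
}
```

A failure in one batch (for example a deadlock retry) then only affects that
batch; batches that have already committed do not need to be redone.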
> ACID: Improve transaction Abort logic due to timeout
> ----------------------------------------------------
>
> Key: HIVE-11317
> URL: https://issues.apache.org/jira/browse/HIVE-11317
> Project: Hive
> Issue Type: Bug
> Components: Metastore, Transactions
> Affects Versions: 1.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Labels: triage
> Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch,
> HIVE-11317.4.patch, HIVE-11317.patch
>
>
> the logic to Abort transactions that have stopped heartbeating is in
> TxnHandler.timeOutTxns()
> This is only called when DbTxnManager.getValidTxns() is called.
> So if there are a lot of txns that need to be timed out and there are no
> SQL clients talking to the system, there is nothing to abort dead
> transactions, and thus compaction can't clean them up so garbage accumulates
> in the system.
> Also, streaming api doesn't call DbTxnManager at all.
> Need to move this logic into Initiator (or some other metastore side thread).
> Also, make sure it is broken up into multiple small(er) transactions against
> metastore DB.
> Also move timeOutLocks() there as well.
> see about adding TXNS.COMMENT field which can be used for "Auto aborted due
> to timeout" for example.
> The symptom of this is that the system keeps showing more and more Open
> transactions that don't seem to ever go away (and have no locks associated
> with them)