[
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672392#comment-16672392
]
Eugene Koifman commented on HIVE-18772:
---------------------------------------
patch 4 is a rebase of 3
> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---------------------------------------
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
> Attachments: HIVE-18772.01.patch, HIVE-18772.02.patch,
> HIVE-18772.02.patch, HIVE-18772.03.patch, HIVE-18772.04.patch
>
>
> The Cleaner should use MIN_HISTORY_LEVEL instead of the Lock Manager state it
> relies on today. This will eliminate possible race conditions.
> See this
> [comment|https://issues.apache.org/jira/browse/HIVE-18192?focusedCommentId=16338208&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16338208]
> Suppose A is the set of all ValidTxnList instances across all active readers.
> Each ValidTxnList has a minOpenTxnId.
> MIN_HISTORY_LEVEL allows us to determine X = min(minOpenTxnId) across all
> currently active readers.
> This means that no active transaction in the system sees any txn with txnid <
> X as open.
> It follows that if we construct a ValidTxnIdList with HWM=X-1 and use that in
> getAcidState(), any files that this call determines to be 'obsolete' will be
> seen as obsolete by every existing or future reader, i.e. they can be
> physically deleted.
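
To make the arithmetic above concrete, here is a minimal sketch in Python. It is a toy model, not Hive code: `ActiveReader`, `cleaner_hwm`, and the `next_txn_id` fallback for the no-reader case are hypothetical names and assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ActiveReader:
    """Hypothetical stand-in for one MIN_HISTORY_LEVEL entry."""
    txn_id: int
    min_open_txn_id: int  # lowest txn id this reader's snapshot saw as open


def cleaner_hwm(readers: Iterable[ActiveReader], next_txn_id: int) -> int:
    """Return X - 1, where X = min(minOpenTxnId) across active readers.

    Assumption of this sketch only: with no active readers, X falls back
    to next_txn_id (the next txn id that would be allocated).
    """
    x = min((r.min_open_txn_id for r in readers), default=next_txn_id)
    return x - 1


readers = [ActiveReader(txn_id=17, min_open_txn_id=13),
           ActiveReader(txn_id=18, min_open_txn_id=15)]
print(cleaner_hwm(readers, next_txn_id=19))  # X = 13, so HWM = 12
```

Taking the minimum is the conservative choice: the Cleaner's snapshot never looks past a txn that any active reader still considers open.
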
> This is also necessary for multi-statement transactions where relying on the
> state of Lock Manager is not sufficient. For example
> Suppose txn 17 starts at t1 and sees txnid 13 with writeID 13 open.
> 13 commits (via its parent txn) at t2 > t1. (17 is still running.)
> Compaction runs at t3 > t2 to produce base_14 (or delta_10_14, for example) on
> Table1/Part1. (17 is still running.)
> Now delta_13 may be cleaned since it can be seen as obsolete and there may be
> no locks on it, i.e. no one is reading it.
> Now at t4 > t3, 17 (a multi-statement txn) may need to read Table1/Part1. It
> cannot use base_14, since that may have absorbed delete events from
> delete_delta_14.
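
The timeline above can be replayed against a toy version of the 'obsolete' decision. This is a sketch under stated assumptions, not getAcidState(): `cleanable_deltas` is a hypothetical helper, write ids are taken to equal txn ids as in the example, and a compacted base is treated as usable by the Cleaner only when it lies at or below the Cleaner's HWM.

```python
from typing import List, Tuple

Delta = Tuple[int, int]  # (min_write_id, max_write_id) of a delta directory


def cleanable_deltas(deltas: List[Delta], base_write_id: int, hwm: int) -> List[Delta]:
    """Toy stand-in for deciding which deltas the Cleaner may delete.

    base_write_id is the N in base_N produced by the compactor;
    hwm is the Cleaner's high-water mark, X - 1.
    """
    if base_write_id > hwm:
        # The base lies above the Cleaner's snapshot, so some active reader
        # may be unable to use it: nothing it covers may be removed yet.
        return []
    # Every delta fully covered by the base is obsolete for all readers.
    return [d for d in deltas if d[1] <= base_write_id]


deltas = [(13, 13)]

# t3: txn 17 is still running and saw 13 as open, so X = 13 and HWM = 12.
print(cleanable_deltas(deltas, base_write_id=14, hwm=12))  # []

# Later: 17 has finished and the oldest active reader has minOpenTxnId = 20.
print(cleanable_deltas(deltas, base_write_id=14, hwm=19))  # [(13, 13)]
```

In this model delta_13 survives for as long as txn 17 is active, regardless of whether any lock is held on it at the moment the Cleaner runs.
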
> Another use case:
> There are delta_1_1 and delta_2_2 on disk, both created by committed txns.
> T5 starts reading these. At the same time the compactor creates delta_1_2.
> Now the Cleaner sees delta_1_1 and delta_2_2 as obsolete and may remove them
> while the read is still in progress. This is because the Compactor itself is
> not running in a txn and the files that it produces are visible immediately.
> If it ran in a txn, the new files would only be visible once this txn is
> visible to others (including the Cleaner).
> Using MIN_HISTORY_LEVEL solves this.
> See description of HIVE-18747 for more details on MIN_HISTORY_LEVEL