[
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eugene Koifman updated HIVE-18772:
----------------------------------
Description:
Make the Acid Cleaner use MIN_HISTORY_LEVEL instead of Lock Manager state, as it currently does.
This will eliminate possible race conditions.
See this
[comment|https://issues.apache.org/jira/browse/HIVE-18192?focusedCommentId=16338208&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16338208]
Suppose A is the set of all ValidTxnList across all active readers. Each
ValidTxnList has minOpenTxnId.
MIN_HISTORY_LEVEL allows us to determine X = min(minOpenTxnId) across all
currently active readers.
This means that no active transaction in the system sees any txn with txnid < X
as open.
This means that if we construct a ValidTxnIdList with HWM=X-1 and use that in
getAcidState(), any files determined by this call to be 'obsolete' will be seen
as obsolete by any existing or future reader, i.e. they can be physically deleted.
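The computation above can be sketched as follows. This is only an illustration, not Hive's implementation: the class name, the method name, and the fallback used when there are no active readers are all hypothetical, and each reader's ValidTxnList is reduced to its minOpenTxnId.

```java
import java.util.Arrays;
import java.util.List;

public class CleanerHwmSketch {

    /**
     * Computes the high-water mark the Cleaner could use.
     * X = min(minOpenTxnId) across all active readers, HWM = X - 1.
     * fallbackHwm is a hypothetical value used when no reader is active;
     * the source does not say what happens in that case.
     */
    public static long cleanerHighWaterMark(List<Long> minOpenTxnIds, long fallbackHwm) {
        if (minOpenTxnIds.isEmpty()) {
            return fallbackHwm;
        }
        long x = Long.MAX_VALUE;
        for (long id : minOpenTxnIds) {
            x = Math.min(x, id);
        }
        return x - 1;
    }

    public static void main(String[] args) {
        // Three active readers whose snapshots have minOpenTxnId 13, 20 and 17.
        List<Long> readers = Arrays.asList(13L, 20L, 17L);
        System.out.println(cleanerHighWaterMark(readers, 100L)); // prints 12
    }
}
```

With HWM=12, no txn below 13 is seen as open by any reader, so anything judged obsolete under that snapshot is obsolete for every reader.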
This is also necessary for multi-statement transactions, where relying on the
state of Lock Manager is not sufficient. For example:
Suppose txn 17 starts at t1 and sees txnid 13 with writeID 13 open.
13 commits (via its parent txn) at t2 > t1. (17 is still running.)
Compaction runs at t3 > t2 to produce base_14 (or delta_10_14 for example) on
Table1/Part1. (17 is still running.)
Now delta_13 may be cleaned, since it can be seen as obsolete and there may be
no locks on it, i.e. no one is reading it.
Now at t4 > t3, txn 17 (a multi-statement txn) may need to read Table1/Part1. It
cannot use base_14, since that may have absorbed delete events from delete_delta_14.
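The txn 17 scenario can be modelled with a small sketch. It is a simplified model under stated assumptions, not Hive's getAcidState() logic: txn IDs and write IDs are treated as equal (as in the example), a base is taken to be visible to the Cleaner only if its write ID is at or below the Cleaner's HWM, and all class and method names are hypothetical.

```java
public class MultiStmtCleanerSketch {

    /** Assumption: base_N is visible to the Cleaner's snapshot only if N <= hwm. */
    public static boolean baseVisible(long baseWriteId, long hwm) {
        return baseWriteId <= hwm;
    }

    /**
     * Assumption: a delta is obsolete only if a base that is visible
     * under the Cleaner's snapshot covers the delta's write ID.
     */
    public static boolean deltaObsolete(long deltaWriteId, long baseWriteId, long hwm) {
        return baseVisible(baseWriteId, hwm) && deltaWriteId <= baseWriteId;
    }

    public static void main(String[] args) {
        // Txn 17 is still running and saw txn 13 as open: X = 13, HWM = 12.
        // base_14 is above the HWM, so delta_13 must be kept.
        System.out.println(deltaObsolete(13, 14, 12)); // prints false

        // Hypothetically, once txn 17 has ended and X has advanced to 18,
        // HWM = 17 and base_14 becomes visible, so delta_13 may be removed.
        System.out.println(deltaObsolete(13, 14, 17)); // prints true
    }
}
```

Under this model the Cleaner's decision depends only on what the oldest active reader can see, not on whether a lock happens to be held at the moment the Cleaner runs.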
Another Use Case:
There are delta_1_1 and delta_2_2 on disk, both created by committed txns.
T5 starts reading these. At the same time, the compactor creates delta_1_2.
Now the Cleaner sees delta_1_1 and delta_2_2 as obsolete and may remove them while
the read is still in progress. This is because the Compactor itself is not running
in a txn, so the files that it produces are visible immediately. If it ran in a
txn, the new files would only be visible once this txn is visible to others
(including the Cleaner).
Using MIN_HISTORY_LEVEL solves this.
See the description of HIVE-18747 for more details on MIN_HISTORY_LEVEL.
> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---------------------------------------
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
> Attachments: HIVE-18772.01.patch, HIVE-18772.02.patch,
> HIVE-18772.02.patch, HIVE-18772.03.patch
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)