[
https://issues.apache.org/jira/browse/HIVE-19867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519787#comment-16519787
]
Steve Yeom commented on HIVE-19867:
-----------------------------------
Sergey and I talked about this. He raised several cases from the reader's
perspective.
Assume the txnId of a stats entity (table or partition) is 1 and its write
ID on table1 is 11.
Then we may have:
case1: Suppose concurrent writes 12 (txnId 2) and 13 (txnId 3) and a
concurrent reader 14 (txnId 4).
Here reader 14 has 12 and 13 as open writes in its writeIdList.
1) Write 12 comes in, updates the stats of table1, and its transaction is
committed.
The txnId of the stats is now 2.
2) Then write 13 comes in and looks for its own write ID (13) in the stats'
writeIdList, which is obtained via txnId 2.
13 should be in that writeIdList, so write 13 detects the concurrent writes
and can turn the COLUMN_STATS_ACCURATE flag off.
3) Now the reader comes in and finds that the stats are not valid simply by
checking the flag.
(The reader can also determine the stats' validity by comparing its own
writeIdList with that of the stats.)
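The writer-side check in step 2 could be sketched roughly as below. This is
only an illustration: StatsRecord, WriterCheck and updateStats are hypothetical
names, not Hive classes, and it assumes that the stats' writeIdList lists 13
as an open write and that the writeIdList can be reduced to a set of open
write IDs.

```java
import java.util.Set;

// Hypothetical stand-in for the stats metadata kept per table or partition.
final class StatsRecord {
    long txnId;                  // transaction that last updated the stats
    Set<Long> openWriteIds;      // writes open in that transaction's writeIdList
    boolean columnStatsAccurate; // stands in for the COLUMN_STATS_ACCURATE flag

    StatsRecord(long txnId, Set<Long> openWriteIds, boolean columnStatsAccurate) {
        this.txnId = txnId;
        this.openWriteIds = openWriteIds;
        this.columnStatsAccurate = columnStatsAccurate;
    }
}

final class WriterCheck {
    // Called by a writer when it updates the stats (assumed flow, not Hive's API).
    static void updateStats(StatsRecord stats, long txnId, long writeId,
                            Set<Long> writerOpenWriteIds) {
        // If our write ID was open in the previous stats writer's writeIdList,
        // the two writes overlapped, so the stats can no longer be trusted.
        if (stats.openWriteIds.contains(writeId)) {
            stats.columnStatsAccurate = false;
        }
        stats.txnId = txnId;
        stats.openWriteIds = writerOpenWriteIds;
    }
}
```

With the numbers from case1, write 12 (txnId 2) leaves the flag on, and write
13 (txnId 3) then finds itself in the stored writeIdList and turns it off.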
case2: Suppose concurrent writes 12 and 13 again, but assume reader 14
(txnId 4) started its transaction after writes 12 and 13 were done.
If the flag is still on and the txnId in TBLS/PARTITIONS is 3, then reader 14
has no way to figure out that the stats are invalid due to concurrent writes:
both writes are committed, so its own writeIdList for table1 does not list
12 or 13 as open writes.
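The blind spot in case2 can be illustrated with a small reader-side sketch.
The comparison used here is an assumption (the comment above does not spell
out how the writeIdLists are compared): the hypothetical ReaderCheck.trustsStats
rejects the stats if the flag is off or if the write ID that last updated the
stats is still open in the reader's writeIdList.

```java
import java.util.Set;

final class ReaderCheck {
    // Hypothetical reader-side validity check; not Hive's actual logic.
    static boolean trustsStats(boolean columnStatsAccurate, long statsWriteId,
                               Set<Long> readerOpenWriteIds) {
        if (!columnStatsAccurate) {
            return false; // a writer already flagged the stats as inaccurate
        }
        // The reader only knows about writes that are open in its own list.
        return !readerOpenWriteIds.contains(statsWriteId);
    }
}
```

For case1, the reader's open writes are {12, 13}, so the check rejects the
stats even if the flag were still on. For case2, the reader's open set is
empty, so trustsStats(true, 13, {}) accepts stats that were in fact produced
by concurrent writes.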
> Test and verify Concurrent INSERTS
> ------------------------------------
>
> Key: HIVE-19867
> URL: https://issues.apache.org/jira/browse/HIVE-19867
> Project: Hive
> Issue Type: Sub-task
> Components: Transactions
> Affects Versions: 4.0.0
> Reporter: Steve Yeom
> Assignee: Steve Yeom
> Priority: Major
> Fix For: 4.0.0
>
>