[ https://issues.apache.org/jira/browse/HIVE-19867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522994#comment-16522994 ]
Sergey Shelukhin commented on HIVE-19867:
-----------------------------------------
We were discussing the partition case with [~ekoifman].
As a tangent from that discussion, I don't think we need this multi-insert
detection with the current code.
We already have the valid write ID list "isEquivalent" check, so after multiple
inserts run in parallel it doesn't matter who writes stats last: the stored
list will simply fail the isEquivalent check, so no extra checks are needed.
Can you describe a scenario where a reader gets invalid stats with concurrent
writers (i.e. where isEquivalent returns true but the stats are still
invalid)? From the above, I cannot see it happening.
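To make the argument concrete, here is a minimal sketch in Java. It is a simplified model, not Hive's actual code: WriteIdSnapshot, StatsValidityDemo and readerCanUseStats are hypothetical names, and a snapshot is reduced to a high watermark plus the set of write IDs that were open or aborted when it was taken. The real isEquivalent check may handle more cases.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of a write ID snapshot (not Hive's ValidWriteIdList).
final class WriteIdSnapshot {
    final long highWatermark;          // highest write ID known to the snapshot
    final Set<Long> openOrAborted;     // write IDs at or below the watermark that are not visible

    WriteIdSnapshot(long highWatermark, Set<Long> openOrAborted) {
        this.highWatermark = highWatermark;
        this.openOrAborted = new HashSet<>(openOrAborted);
    }

    // Assumed rule: two snapshots are equivalent when they expose the same set of writes.
    boolean isEquivalent(WriteIdSnapshot other) {
        return highWatermark == other.highWatermark
            && openOrAborted.equals(other.openOrAborted);
    }
}

final class StatsValidityDemo {
    // The reader trusts stored stats only if the snapshot saved with them
    // is equivalent to the reader's own snapshot.
    static boolean readerCanUseStats(WriteIdSnapshot storedWithStats, WriteIdSnapshot reader) {
        return storedWithStats.isEquivalent(reader);
    }
}
```

With two parallel inserts holding write IDs 5 and 6, each writer's snapshot has the other's ID in openOrAborted. A reader that starts after both commit sees watermark 6 with nothing open, so neither stored snapshot is equivalent to it, whichever writer stored stats last.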
However, Eugene was suggesting that we actually redo the whole stats correctness
logic to rely mostly on the write path; in that case this approach (or rather a
similar, more comprehensive one that handles a couple more special cases) will help.
Actually, we may not even need to store the write ID list and txn in that case,
only the last write ID. But we'd also need to ensure that every query affecting
data also affects stats, either by updating them or by removing the flag/write ID
(including queries with stats collection disabled, alters, etc.).
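A rough sketch of that write-path invariant, in Java. Everything here is hypothetical (TableStatsMarker and its methods do not exist in Hive); it only illustrates the rule that every data-affecting operation either refreshes the stats and records its write ID, or clears the marker.

```java
// Hypothetical sketch of the write-path idea; none of these names come from Hive.
final class TableStatsMarker {
    private static final long NONE = -1L;  // no write ID recorded, so stats are untrusted

    private long statsWriteId = NONE;
    private long rowCount = 0L;

    // A write that also collects stats: update them and record the writer's ID.
    void writeWithStats(long writeId, long newRowCount) {
        rowCount = newRowCount;
        statsWriteId = writeId;
    }

    // Any other data-affecting operation (stats collection disabled, ALTER, ...)
    // must remove the marker so stale stats are never served.
    void writeWithoutStats() {
        statsWriteId = NONE;
    }

    boolean statsTrusted() {
        return statsWriteId != NONE;
    }

    long statsWriteId() {
        return statsWriteId;
    }

    long rowCount() {
        return rowCount;
    }
}
```

The sketch deliberately leaves out how a reader would compare the recorded write ID against its own snapshot, and how concurrent writers would be reconciled; those are the special cases still to be discussed.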
I'll send an email with details to discuss.
> Test and verify Concurrent INSERTS
> ------------------------------------
>
> Key: HIVE-19867
> URL: https://issues.apache.org/jira/browse/HIVE-19867
> Project: Hive
> Issue Type: Sub-task
> Components: Transactions
> Affects Versions: 4.0.0
> Reporter: Steve Yeom
> Assignee: Steve Yeom
> Priority: Major
> Fix For: 4.0.0
>
>