[ 
https://issues.apache.org/jira/browse/HIVE-19867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522994#comment-16522994
 ] 

Sergey Shelukhin commented on HIVE-19867:
-----------------------------------------

We were discussing the partition case with [~ekoifman].
As a tangent from that discussion, I don't think we need this multi-insert 
detection with the current code.
We already have the "isEquivalent" check on the valid write ID list, so after 
multiple parallel inserts it doesn't matter who writes stats last: the 
isEquivalent check will simply fail, so no extra checks are needed.
Can you describe a scenario where a reader gets invalid stats with concurrent 
writers (i.e. where isEquivalent returns true but the stats are still 
invalid)? From the above I cannot see it happening.
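
To make the argument concrete, here is a rough model of the check. It is only a sketch under simplifying assumptions, not Hive's code: WriteIdSnapshot, is_equivalent and stats_usable are hypothetical names, and a valid write ID list is reduced to a high watermark plus the set of write IDs that are still open or aborted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteIdSnapshot:
    """Hypothetical stand-in for a valid write ID list."""
    high_watermark: int
    invalid_ids: frozenset = frozenset()  # open or aborted write IDs

    def valid_ids(self):
        # Write IDs this snapshot considers committed and visible.
        return frozenset(range(1, self.high_watermark + 1)) - self.invalid_ids


def is_equivalent(a, b):
    # Assumption: two snapshots are equivalent when they see the same
    # set of committed write IDs.
    return a.valid_ids() == b.valid_ids()


def stats_usable(stored_snapshot, reader_snapshot):
    # Stats are trusted only if they were written under a snapshot
    # equivalent to the one the reader is using.
    return is_equivalent(stored_snapshot, reader_snapshot)


# Two inserts run in parallel with write IDs 1 and 2. Each writer sees
# its own write ID as valid and the other writer's as still open.
writer1_view = WriteIdSnapshot(high_watermark=2, invalid_ids=frozenset({2}))
writer2_view = WriteIdSnapshot(high_watermark=2, invalid_ids=frozenset({1}))

# A later reader sees both inserts as committed.
reader_view = WriteIdSnapshot(high_watermark=2)

# Whichever writer stored stats last, the reader rejects them.
for stored in (writer1_view, writer2_view):
    print(stats_usable(stored, reader_view))
```

In this model neither writer's snapshot matches the reader's, so the order in which the two writers store their stats does not matter.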

However, Eugene was suggesting that we actually redo the whole stats 
correctness logic to rely mostly on the write path; in that case this approach 
(or rather a similar, more comprehensive one that handles a couple more special 
cases) will help.
Actually, we may not even need to store the write ID list and txn in that case, 
only the last write ID. But we'd also need to ensure that every query affecting 
data also affects stats, either by updating them or by removing the flag/write 
ID (including queries with stats collection disabled, alters, etc.).
I'll send an email with details to discuss.
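
As a rough illustration of the write-path idea, the invariant could look like the sketch below. Everything here is hypothetical (TableStats, apply_write, stats_usable), the reader-side rule is my assumption rather than something we have agreed on, and the sketch ignores concurrent writers entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableStats:
    """Hypothetical stats record keeping only the last write ID."""
    row_count: int = 0
    accurate: bool = False
    last_write_id: Optional[int] = None


def apply_write(stats, write_id, new_row_count=None):
    """Every operation that changes data must pass through here."""
    if new_row_count is None:
        # Stats collection disabled, ALTER, etc.: the data changed but
        # the stats did not, so remove the flag and the write ID.
        stats.accurate = False
        stats.last_write_id = None
    else:
        # The operation computed fresh stats: record them together
        # with its write ID.
        stats.row_count = new_row_count
        stats.accurate = True
        stats.last_write_id = write_id


def stats_usable(stats, reader_valid_write_ids):
    # Assumed reader rule: the flag is set and the write that produced
    # the stats is visible to the reader.
    return stats.accurate and stats.last_write_id in reader_valid_write_ids


stats = TableStats()

# An insert that collects stats leaves them usable.
apply_write(stats, write_id=1, new_row_count=100)
usable_after_insert = stats_usable(stats, {1})

# An insert with stats collection disabled invalidates them.
apply_write(stats, write_id=2)
usable_after_untracked = stats_usable(stats, {1, 2})
```

The point of the sketch is that correctness then depends on no data-changing operation bypassing apply_write, which is why the special cases need to be enumerated.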

> Test and verify Concurrent INSERTS  
> ------------------------------------
>
>                 Key: HIVE-19867
>                 URL: https://issues.apache.org/jira/browse/HIVE-19867
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Steve Yeom
>            Assignee: Steve Yeom
>            Priority: Major
>             Fix For: 4.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
