[jira] [Comment Edited] (HIVE-19820) add ACID stats support to background stats updater

Sergey Shelukhin (JIRA) Thu, 05 Jul 2018 18:04:22 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-19820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534343#comment-16534343
 ]


Sergey Shelukhin edited comment on HIVE-19820 at 7/6/18 1:03 AM:
-----------------------------------------------------------------

After some consideration and reading the ACID code, I don't think doing the 
pre-set at the beginning is going to be bulletproof. It's possible to fall 
back to an additional check for parallel writes, but that is getting ugly 
(someone can insert a write ID while we are checking).
The ideal approach would be to take an exclusive lock for a bit (and abandon or 
delay analyze if we cannot get it due to parallel writes), set the stats to a 
known pending state, then downgrade back to a shared lock, but the ACID lock 
APIs don't support atomic downgrade or timeout/try options.
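
For illustration only, here is a minimal sketch of the "try exclusive, mark pending, downgrade to shared" sequence. It uses the JDK's ReentrantReadWriteLock as a stand-in, precisely because it has the timed tryLock and the downgrade that the ACID lock APIs lack. The class, the StatsState enum, and all method names are hypothetical and do not exist in Hive.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a table lock plus a stats state; not Hive code.
final class PendingStatsSketch {
  enum StatsState { VALID, PENDING }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private volatile StatsState state = StatsState.VALID;

  StatsState state() { return state; }

  // Parallel writers are assumed to hold this shared lock while they write.
  Lock sharedLock() { return lock.readLock(); }

  /**
   * Tries to take the exclusive lock within timeoutMs. On success, marks the
   * stats PENDING, downgrades to the shared lock and returns true (the caller
   * must call endAnalyze() later). Returns false, holding no lock, if the
   * exclusive lock could not be obtained, so analyze can be abandoned or delayed.
   */
  boolean beginAnalyze(long timeoutMs) {
    boolean exclusive;
    try {
      exclusive = lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    if (!exclusive) {
      return false;
    }
    try {
      state = StatsState.PENDING;
      // Downgrade: acquire shared while still holding exclusive, so no
      // writer can slip in between the two.
      lock.readLock().lock();
    } finally {
      lock.writeLock().unlock();
    }
    return true;
  }

  void endAnalyze() {
    lock.readLock().unlock();
  }
}
```

A caller would run analyze only when beginAnalyze returns true, and release the shared lock with endAnalyze once the stats are saved.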
I think I'll augment the save-time check, for analyze specifically, to look at 
any write IDs committed before analyze that either affect the same 
TXN_COMPONENTS or have no TXN_COMPONENTS due to cleanup (for safety). If 
anything like that happened, stats will not be updated. Otherwise, stats will 
be overwritten regardless of the current state.
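
A rough sketch of that save-time decision, under stated assumptions: the candidate write IDs are passed in as a list (how they are selected is left out), and a missing component models a write whose TXN_COMPONENTS rows were already cleaned up. CommittedWrite, canOverwriteStats and the string component key are hypothetical names, not Hive's metastore API.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the save-time check for analyze; not Hive code.
final class AnalyzeSaveCheckSketch {

  /** One committed write ID; an empty component means its TXN_COMPONENTS entry is gone. */
  static final class CommittedWrite {
    final long writeId;
    final Optional<String> component;

    CommittedWrite(long writeId, Optional<String> component) {
      this.writeId = writeId;
      this.component = component;
    }
  }

  /**
   * Returns true if analyze may overwrite the stats for analyzedComponent,
   * regardless of their current state. Returns false if any candidate write
   * touched the same component, or cannot be attributed because its
   * component information was cleaned up (treated as a conflict for safety).
   */
  static boolean canOverwriteStats(String analyzedComponent, List<CommittedWrite> candidates) {
    for (CommittedWrite w : candidates) {
      boolean cleanedUp = !w.component.isPresent();
      boolean sameComponent = w.component.map(analyzedComponent::equals).orElse(false);
      if (cleanedUp || sameComponent) {
        return false; // skip the stats update
      }
    }
    return true;
  }
}
```

Treating cleaned-up entries as conflicts means some valid stats updates are skipped, which trades freshness for never publishing stats that could have missed a write.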
This would have been much better if we got rid of JSON and made sure ANY change 
to ACID stats sets the write ID, even if the change is to invalid stats, and no 
one ever uses the flag. The latter is a big backward compat issue, so it may 
happen in phase 2...
Also, if we had multi-version stats using the base-delta model, just like data, 
this whole issue would have been moot.



was (Author: sershe):
Upon some consideration and reading ACID code, I don't think doing the pre-set 
in the beginning is going to be bulletproof; although it's possible to fall 
back by doing an additional check for parallel writes, it's becoming ugly 
(someone can insert a write ID as we are checking).
The ideal approach would be to take exclusive lock for a bit (and abandon or 
stop analyze if we cannot get it due to parallel writes), set a known pending 
state to stats, then downgrade back to shared lock, but ACID lock APIs don't 
support atomic downgrade or timeouts/try options.
I think I'll augment the save-time check for analyze specifically to look at 
any write IDs committed before analyze, that either affect the same 
TXN_COMPONENTS, or have no TXN_COMPONENTS due to cleanup (for safety). If 
something like that happened stats will not be updated. Otherwise stats will be 
overwritten regardless of the current state.
This would have been much better if we got rid of JSON and made sure ANY change 
to ACID stats sets write ID, even if the change is to invalid stats, and no one 
ever uses the flag. The latter is a big backward compat issue, so it may 
happen in phase 2... 
Also if we had multi version stats using the base-delta model, just like data, 
this whole issue would have been moot.


> add ACID stats support to background stats updater
> --------------------------------------------------
>
>                 Key: HIVE-19820
>                 URL: https://issues.apache.org/jira/browse/HIVE-19820
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>         Attachments: HIVE-19820.01-master-txnstats.patch, 
> HIVE-19820.02-master-txnstats.patch, HIVE-19820.03-master-txnstats.patch, 
> HIVE-19820.04-master-txnstats.patch
>
>
> Follow-up from HIVE-19418.
> Right now the updater checks whether stats are valid in an old-fashioned 
> way... and also gets ACID state, and discards it without using it.
> When ACID stats are implemented, ACID state needs to be used to do 
> version-aware valid stats checks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
