[ https://issues.apache.org/jira/browse/HIVE-19820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534343#comment-16534343 ]
Sergey Shelukhin edited comment on HIVE-19820 at 7/6/18 1:03 AM:
-----------------------------------------------------------------
Upon some consideration and reading ACID code, I don't think doing the pre-set
at the beginning is going to be bulletproof; although it's possible to fall
back to an additional check for parallel writes, that is becoming ugly
(someone can insert a write ID while we are checking).
The ideal approach would be to take an exclusive lock for a bit (and abandon or
delay analyze if we cannot get it due to parallel writes), set a known pending
state on the stats, then downgrade back to a shared lock, but ACID lock APIs
don't support atomic downgrade or timeouts/try options.
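For illustration only, here is the lock pattern described above, sketched with
java.util.concurrent's ReentrantReadWriteLock, which does offer a timed
tryLock and write-to-read downgrade. This is an analogy, not the Hive ACID
lock manager; the class, enum and method names (PendingStatsMarker,
StatsState, tryBeginAnalyze, beginSharedWrite) are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: exclusive lock with timeout, mark stats pending,
// then downgrade to a shared lock. Not Hive code.
final class PendingStatsMarker {
    enum StatsState { VALID, PENDING_ANALYZE }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile StatsState state = StatsState.VALID;

    StatsState state() { return state; }

    // Stand-ins for a parallel writer holding a shared lock.
    void beginSharedWrite() { lock.readLock().lock(); }
    void endSharedWrite() { lock.readLock().unlock(); }

    /**
     * Returns true if the pending state was set; the caller then holds the
     * shared (read) lock and must call endAnalyze(). Returns false if the
     * exclusive lock could not be obtained in time, in which case analyze
     * should be abandoned or delayed.
     */
    boolean tryBeginAnalyze(long timeoutMs) {
        try {
            if (!lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // parallel writers are active
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            state = StatsState.PENDING_ANALYZE;
            // Taking the read lock before releasing the write lock is the
            // downgrade: no writer can slip in between the two steps.
            lock.readLock().lock();
        } finally {
            lock.writeLock().unlock();
        }
        return true;
    }

    void endAnalyze() { lock.readLock().unlock(); }
}
```

The point of the sketch is only that "try with timeout" and "atomic
downgrade" are the two primitives missing from the ACID lock APIs.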
I think I'll augment the save-time check for analyze specifically to look at
any write IDs committed before analyze that either affect the same
TXN_COMPONENTS, or have no TXN_COMPONENTS due to cleanup (for safety). If
something like that happened, stats will not be updated. Otherwise stats will be
overwritten regardless of the current state.
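A rough sketch of that save-time decision, under stated assumptions: the
candidate list is assumed to already contain only the committed writes the
check has to consider, "affects the same TXN_COMPONENTS" is modeled as a
plain name match, and AnalyzeSaveCheck, CommittedWrite and shouldSaveStats
are hypothetical names, not metastore classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the save-time check for analyze. Not Hive code.
final class AnalyzeSaveCheck {

    /** Simplified view of one committed write. */
    static final class CommittedWrite {
        final long writeId;
        final Set<String> components;    // tables/partitions the write touched
        final boolean componentsCleaned; // its TXN_COMPONENTS rows were cleaned up

        CommittedWrite(long writeId, Set<String> components, boolean componentsCleaned) {
            this.writeId = writeId;
            this.components = components;
            this.componentsCleaned = componentsCleaned;
        }
    }

    /** Returns true if analyze may overwrite the stats, false if it must skip the update. */
    static boolean shouldSaveStats(String analyzedComponent, List<CommittedWrite> candidates) {
        for (CommittedWrite w : candidates) {
            if (w.componentsCleaned) {
                return false; // cannot tell what it touched; skip for safety
            }
            if (w.components.contains(analyzedComponent)) {
                return false; // a write affected the same component
            }
        }
        return true; // no conflict: overwrite regardless of the current state
    }

    public static void main(String[] args) {
        List<CommittedWrite> writes = Arrays.asList(
            new CommittedWrite(7L, Collections.singleton("db.other_table"), false));
        System.out.println(shouldSaveStats("db.analyzed_table", writes));
    }
}
```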
This would have been much better if we got rid of JSON and made sure ANY change
to ACID stats sets the write ID, even if the change is to invalid stats, and no
one ever uses the flag. The latter is a big backward compat issue, so it may
happen in phase 2...
Also if we had multi-version stats using the base-delta model, just like data,
this whole issue would have been moot.
was (Author: sershe):
Upon some consideration and reading ACID code, I don't think doing the pre-set
in the beginning is going to be bulletproof; although it's possible to fall
back by doing an additional check for parallel writes, it's becoming ugly
(someone can insert a write ID as we are checking).
The ideal approach would be to take exclusive lock for a bit (and abandon or
stop analyze if we cannot get it due to parallel writes), set a known pending
state to stats, then downgrade back to shared lock, but ACID lock APIs don't
support atomic downgrade or timeouts/try options.
I think I'll augment the save-time check for analyze specifically to look at
any write IDs committed before analyze, that either affect the same
TXN_COMPONENTS, or have no TXN_COMPONENTS due to cleanup (for safety). If
something like that happened stats will not be updated. Otherwise stats will be
overwritten regardless of the current state.
This would have been much better if we got rid of JSON and made sure ANY change
to ACID stats sets write ID, even if the change is to invalid stats, and no one
ever uses the flag. The latter is a big backward compat issue, so it may
happen in phase 2...
Also if we had multi version stats using the base-delta model, just like data,
this whole issue would have been moot.
> add ACID stats support to background stats updater
> --------------------------------------------------
>
> Key: HIVE-19820
> URL: https://issues.apache.org/jira/browse/HIVE-19820
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: HIVE-19820.01-master-txnstats.patch,
> HIVE-19820.02-master-txnstats.patch, HIVE-19820.03-master-txnstats.patch,
> HIVE-19820.04-master-txnstats.patch
>
>
> Follow-up from HIVE-19418.
> Right now it checks whether stats are valid in an old-fashioned way... and
> also gets ACID state, and discards it without using it.
> When ACID stats are implemented, ACID state needs to be used to do
> version-aware valid stats checks.