[jira] [Comment Edited] (HIVE-18395) Using stats for aggregate query on Acid/MM is off even with "hive.compute.query.using.stats" is true.

Sergey Shelukhin (JIRA) Thu, 03 May 2018 13:34:05 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-18395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463006#comment-16463006
 ]


Sergey Shelukhin edited comment on HIVE-18395 at 5/3/18 8:32 PM:
-----------------------------------------------------------------

Some small comments:
1) Seems to include some generated config in the patch.
2) TABLE_PARAMS_TXN - can we normalize this and not rely on JSON object? Since 
this will only be used for stats I assume, and stats only have one parameter 
(JSON string), we can store it as proper SQL data
[~ekoifman] you might want to look at the new tables schema since it's based on 
what ACID state is sufficient to map stats to a snapshot.
3) {noformat}
+      if (SessionState.get().getTxnMgr() != null && isTransactional) {
+        request.setTxnid(SessionState.get().getTxnMgr().getCurrentTxnId());
+      }
{noformat}
Is it enough to store transaction ID? Transaction ID will invalidate stats for 
every other table with every transaction, right?
Also the doc calls for storing entire state information about when the stats 
were written - otherwise you cannot tell based on read-time snapshot if the 
stats are valid, for example in case of two inserts, where both can write 
stats, but both stats are invalid because the inserts don't see each other. 
4) {noformat}
+        /*
         if (isFullAcid && !work.isTargetRewritten()) {
           // Don't bother with aggregation in this case, it will probably be 
invalid.
           parameters.remove(statType);
           continue;
         }
+        */
{noformat}
Aggregation for full ACID will still be invalid.
5) This doesn't seem to affect basic stats in all the places they are updated. 
However, basic stats are also used for count(*) type queries (numRows). 
I sent a patch separately that has some changes for basic stats... Might make 
sense to also modify these places in this patch. Doesn't have to be the same 
change I made as long as it is sufficient to add txn info to basic stats.

It might help to exclude the generated code for review. I sometimes use the 
script like this, where $1 is base branch and $2 is file name:
{noformat}
rm -f  ~/patches/$2.nogen.patch
for f in `git diff $1 --name-only | grep -v "gen-" | grep -v "\/gen\/"`
do
  git diff $1 -- $f >> ~/patches/$2.nogen.patch
{noformat}


was (Author: sershe):
Some small comments:
1) Seems to include some generated config in the patch.
2) TABLE_PARAMS_TXN - can we normalize this and not rely on JSON object? Since 
this will only be used for stats I assume, and stats only have one parameter.
[~ekoifman] you might want to look at the new tables schema since it's based on 
what ACID state is sufficient to map stats to a snapshot.
3) {noformat}
+      if (SessionState.get().getTxnMgr() != null && isTransactional) {
+        request.setTxnid(SessionState.get().getTxnMgr().getCurrentTxnId());
+      }
{noformat}
Is it enough to store transaction ID? Transaction ID will invalidate stats for 
every other table with every transaction, right?
Also the doc calls for storing entire state information about when the stats 
were written - otherwise you cannot tell based on read-time snapshot if the 
stats are valid, for example in case of two inserts, where both can write 
stats, but both stats are invalid because the inserts don't see each other. 
4) {noformat}
+        /*
         if (isFullAcid && !work.isTargetRewritten()) {
           // Don't bother with aggregation in this case, it will probably be 
invalid.
           parameters.remove(statType);
           continue;
         }
+        */
{noformat}
Aggregation for full ACID will still be invalid.
5) This doesn't seem to affect basic stats in all the places they are updated. 
However, basic stats are also used for count(*) type queries (numRows). 
I sent a patch separately that has some changes for basic stats... Might make 
sense to also modify these places in this patch. Doesn't have to be the same 
change I made as long as it is sufficient to add txn info to basic stats.

It might help to exclude the generated code for review. I sometimes use the 
script like this, where $1 is base branch and $2 is file name:
{noformat}
rm -f  ~/patches/$2.nogen.patch
for f in `git diff $1 --name-only | grep -v "gen-" | grep -v "\/gen\/"`
do
  git diff $1 -- $f >> ~/patches/$2.nogen.patch
{noformat}

> Using stats for aggregate query on Acid/MM is off even with 
> "hive.compute.query.using.stats" is true.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18395
>                 URL: https://issues.apache.org/jira/browse/HIVE-18395
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Steve Yeom
>            Assignee: Steve Yeom
>            Priority: Major
>         Attachments: HIVE-18395.01.preview
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (HIVE-18395) Using stats for aggregate query on Acid/MM is off even with "hive.compute.query.using.stats" is true.

Reply via email to