[
https://issues.apache.org/jira/browse/TRAFODION-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hans Zeller updated TRAFODION-2965:
-----------------------------------
Description:
Here is a test case that demonstrates this:
{noformat}
update statistics for table hive.hive.time_dim on every column;
control query shape groupby(exchange(groupby(exchange(groupby(scan)))));
prepare s from
select count(distinct t_time_id) from hive.hive.time_dim;
explain options 'f' s;
execute s;
get statistics for qid current default;
{noformat}
The actual statistics show a "0" as the row count (ActRowsUsed) for the lowest
EX_HASH_GRBY with id 2:
{noformat}
LC RC Id PaId ExId Frag TDB Name DOP Dispatches
OperCpuTime EstRowsUsed ActRowsUsed ActDataUsed
Details
13 . 14 . 8 0 EX_ROOT 1 2
37 0 1 8 3658
12 . 13 14 7 0 EX_SORT_GRBY 1 5
61 1 1 8
11 . 12 13 6 0 EX_SPLIT_TOP 1 8
171 1 4 32
10 . 11 12 6 0 EX_SEND_TOP 4 16
3,389 1 4 64
9 . 10 11 6 2 EX_SEND_BOTTOM 4 12
683 1 4 64
8 . 9 10 6 2 EX_SPLIT_BOTTOM 4 19
597 1 4 32 833874
7 . 8 9 5 2 EX_SORT_GRBY 4 2,259
289,200 1 4 32
6 . 7 8 4 2 EX_HASH_GRBY 4 2,259
394,235 1451 86,400 2,765,059,200 0|0|0
5 . 6 7 3 2 EX_SPLIT_TOP 4 2,211
51,034 110007 86,400 2,765,059,200
4 . 5 6 3 2 EX_SEND_TOP 8 2,436
98,125 110007 86,400 2,766,528,000
3 . 4 5 3 3 EX_SEND_BOTTOM 8 10,658
196,714 110007 86,400 2,766,528,000
2 . 3 4 3 3 EX_SPLIT_BOTTOM 2 2,663
95,246 110007 86,400 2,765,059,200 1547521
1 . 2 3 2 3 EX_HASH_GRBY 2 4,521
303,319 55003.5 0 0
. . 1 2 1 3 EX_HDFS_SCAN 2 2,650
952,242 116085 86,400 2,765,059,200
HIVE.HIVE.TIME_DIM|86400|5288324
{noformat}
The reason is that the hash groupby reports its row count in the BMO stats.
However, a partial hash groupby is not considered a BMO (Big Memory Operator),
so no rowcount gets reported. The fix is to increment the rowcount in the
generic stats entry that is present in both partial and full groupby operators.
was:
Here is a test case that demonstrates this:
{noformat}
update statistics for table hive.hive.time_dim on every column;
prepare s from
select count(distinct t_time_id) from hive.hive.time_dim;
explain options 'f' s;
execute s;
get statistics for qid current default;
{noformat}
The actual statistics show a "0" as the row count (ActRowsUsed) for the lowest
EX_HASH_GRBY with id 2:
{noformat}
LC RC Id PaId ExId Frag TDB Name DOP Dispatches
OperCpuTime EstRowsUsed ActRowsUsed ActDataUsed
Details
13 . 14 . 8 0 EX_ROOT 1 2
37 0 1 8 3658
12 . 13 14 7 0 EX_SORT_GRBY 1 5
61 1 1 8
11 . 12 13 6 0 EX_SPLIT_TOP 1 8
171 1 4 32
10 . 11 12 6 0 EX_SEND_TOP 4 16
3,389 1 4 64
9 . 10 11 6 2 EX_SEND_BOTTOM 4 12
683 1 4 64
8 . 9 10 6 2 EX_SPLIT_BOTTOM 4 19
597 1 4 32 833874
7 . 8 9 5 2 EX_SORT_GRBY 4 2,259
289,200 1 4 32
6 . 7 8 4 2 EX_HASH_GRBY 4 2,259
394,235 1451 86,400 2,765,059,200 0|0|0
5 . 6 7 3 2 EX_SPLIT_TOP 4 2,211
51,034 110007 86,400 2,765,059,200
4 . 5 6 3 2 EX_SEND_TOP 8 2,436
98,125 110007 86,400 2,766,528,000
3 . 4 5 3 3 EX_SEND_BOTTOM 8 10,658
196,714 110007 86,400 2,766,528,000
2 . 3 4 3 3 EX_SPLIT_BOTTOM 2 2,663
95,246 110007 86,400 2,765,059,200 1547521
1 . 2 3 2 3 EX_HASH_GRBY 2 4,521
303,319 55003.5 0 0
. . 1 2 1 3 EX_HDFS_SCAN 2 2,650
952,242 116085 86,400 2,765,059,200
HIVE.HIVE.TIME_DIM|86400|5288324
{noformat}
The reason is that the hash groupby reports its row count in the BMO stats.
However, a partial hash groupby is not considered a BMO (Big Memory Operator),
so no rowcount gets reported. The fix is to increment the rowcount in the
generic stats entry that is present in both partial and full groupby operators.
> Hash partial groupby does not report a row count in operator statistics
> -----------------------------------------------------------------------
>
> Key: TRAFODION-2965
> URL: https://issues.apache.org/jira/browse/TRAFODION-2965
> Project: Apache Trafodion
> Issue Type: Bug
> Components: sql-exe
> Affects Versions: 2.0-incubating
> Environment: any
> Reporter: Hans Zeller
> Assignee: Hans Zeller
> Priority: Major
> Fix For: 2.4
>
>
> Here is a test case that demonstrates this:
> {noformat}
> update statistics for table hive.hive.time_dim on every column;
> control query shape groupby(exchange(groupby(exchange(groupby(scan)))));
> prepare s from
> select count(distinct t_time_id) from hive.hive.time_dim;
> explain options 'f' s;
> execute s;
> get statistics for qid current default;
> {noformat}
> The actual statistics show a "0" as the row count (ActRowsUsed) for the
> lowest EX_HASH_GRBY with id 2:
> {noformat}
> LC RC Id PaId ExId Frag TDB Name DOP Dispatches
> OperCpuTime EstRowsUsed ActRowsUsed ActDataUsed
> Details
> 13 . 14 . 8 0 EX_ROOT 1 2
> 37 0 1 8 3658
> 12 . 13 14 7 0 EX_SORT_GRBY 1 5
> 61 1 1 8
> 11 . 12 13 6 0 EX_SPLIT_TOP 1 8
> 171 1 4 32
> 10 . 11 12 6 0 EX_SEND_TOP 4 16
> 3,389 1 4 64
> 9 . 10 11 6 2 EX_SEND_BOTTOM 4 12
> 683 1 4 64
> 8 . 9 10 6 2 EX_SPLIT_BOTTOM 4 19
> 597 1 4 32
> 833874
> 7 . 8 9 5 2 EX_SORT_GRBY 4 2,259
> 289,200 1 4 32
> 6 . 7 8 4 2 EX_HASH_GRBY 4 2,259
> 394,235 1451 86,400 2,765,059,200 0|0|0
> 5 . 6 7 3 2 EX_SPLIT_TOP 4 2,211
> 51,034 110007 86,400 2,765,059,200
> 4 . 5 6 3 2 EX_SEND_TOP 8 2,436
> 98,125 110007 86,400 2,766,528,000
> 3 . 4 5 3 3 EX_SEND_BOTTOM 8 10,658
> 196,714 110007 86,400 2,766,528,000
> 2 . 3 4 3 3 EX_SPLIT_BOTTOM 2 2,663
> 95,246 110007 86,400 2,765,059,200
> 1547521
> 1 . 2 3 2 3 EX_HASH_GRBY 2 4,521
> 303,319 55003.5 0 0
> . . 1 2 1 3 EX_HDFS_SCAN 2 2,650
> 952,242 116085 86,400 2,765,059,200
> HIVE.HIVE.TIME_DIM|86400|5288324
> {noformat}
> The reason is that the hash groupby reports its row count in the BMO stats.
> However, a partial hash groupby is not considered a BMO (Big Memory
> Operator), so no rowcount gets reported. The fix is to increment the rowcount
> in the generic stats entry that is present in both partial and full groupby
> operators.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)