[ https://issues.apache.org/jira/browse/TRAFODION-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hans Zeller updated TRAFODION-2965:
-----------------------------------
    Description: 
Here is a test case that demonstrates the problem:
{noformat}
update statistics for table hive.hive.time_dim on every column;
control query shape groupby(exchange(groupby(exchange(groupby(scan)))));
prepare s from
select count(distinct t_time_id) from hive.hive.time_dim; 
explain options 'f' s;
execute s;
get statistics for qid current default;
{noformat}
The actual statistics show a row count (ActRowsUsed) of 0 for the lowest 
EX_HASH_GRBY, Id 2, which is the partial (leaf-level) groupby in the forced plan:
{noformat}
   LC   RC   Id PaId ExId Frag TDB Name                   DOP   Dispatches        OperCpuTime        EstRowsUsed        ActRowsUsed        ActDataUsed    Details

   13    .   14    .    8    0 EX_ROOT                      1            2                 37                  0                  1                  8 3658
   12    .   13   14    7    0 EX_SORT_GRBY                 1            5                 61                  1                  1                  8
   11    .   12   13    6    0 EX_SPLIT_TOP                 1            8                171                  1                  4                 32
   10    .   11   12    6    0 EX_SEND_TOP                  4           16              3,389                  1                  4                 64
    9    .   10   11    6    2 EX_SEND_BOTTOM               4           12                683                  1                  4                 64
    8    .    9   10    6    2 EX_SPLIT_BOTTOM              4           19                597                  1                  4                 32 833874
    7    .    8    9    5    2 EX_SORT_GRBY                 4        2,259            289,200                  1                  4                 32
    6    .    7    8    4    2 EX_HASH_GRBY                 4        2,259            394,235               1451             86,400      2,765,059,200 0|0|0
    5    .    6    7    3    2 EX_SPLIT_TOP                 4        2,211             51,034             110007             86,400      2,765,059,200
    4    .    5    6    3    2 EX_SEND_TOP                  8        2,436             98,125             110007             86,400      2,766,528,000
    3    .    4    5    3    3 EX_SEND_BOTTOM               8       10,658            196,714             110007             86,400      2,766,528,000
    2    .    3    4    3    3 EX_SPLIT_BOTTOM              2        2,663             95,246             110007             86,400      2,765,059,200 1547521
    1    .    2    3    2    3 EX_HASH_GRBY                 2        4,521            303,319            55003.5                  0                  0
    .    .    1    2    1    3 EX_HDFS_SCAN                 2        2,650            952,242             116085             86,400      2,765,059,200 HIVE.HIVE.TIME_DIM|86400|5288324
{noformat}
The reason is that the hash groupby operator reports its row count only in the 
BMO stats. However, a partial hash groupby is not considered a BMO (Big Memory 
Operator), so no row count gets reported for it. The fix is to increment the 
row count in the generic stats entry, which is present in both partial and full 
groupby operators.
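For illustration, here is a minimal sketch of that idea in C++. The class and 
member names (OperStats, BMOStats, HashGroupbyOperator, returnRowToParent) are 
invented for this example and are not the actual Trafodion executor classes; 
the point is only that the row count gets bumped in the generic per-operator 
entry, which exists whether or not the operator also has a BMO stats entry.
{noformat}
#include <cstdint>
#include <iostream>
#include <memory>

// Simplified model of per-operator runtime statistics. The names are
// hypothetical and do not match the real Trafodion executor classes.

// Generic stats entry, allocated for every operator, partial or full.
struct OperStats {
  uint64_t actualRowsReturned = 0;
  void incActualRowsReturned(uint64_t n = 1) { actualRowsReturned += n; }
};

// Extra stats kept only for Big Memory Operators (full hash groupby,
// hash join, sort, ...). A partial hash groupby does not get one.
struct BMOStats {
  uint64_t actualRowsReturned = 0;
  void incActualRowsReturned(uint64_t n = 1) { actualRowsReturned += n; }
};

struct HashGroupbyOperator {
  OperStats operStats;                 // always present
  std::unique_ptr<BMOStats> bmoStats;  // null for a partial groupby

  explicit HashGroupbyOperator(bool isPartial)
      : bmoStats(isPartial ? nullptr : std::make_unique<BMOStats>()) {}

  // Called for every row handed up to the parent operator.
  void returnRowToParent() {
    // Before the fix: only the BMO entry was counted, so a partial
    // groupby (bmoStats == nullptr) showed ActRowsUsed = 0.
    if (bmoStats)
      bmoStats->incActualRowsReturned();

    // The fix: count the row in the generic entry as well, which
    // exists for partial and full groupby alike.
    operStats.incActualRowsReturned();
  }
};

int main() {
  HashGroupbyOperator partialGroupby(/*isPartial=*/true);
  for (int i = 0; i < 5; ++i)
    partialGroupby.returnRowToParent();

  // With the fix, the generic entry reports 5 rows even though there
  // is no BMO entry for this operator.
  std::cout << "ActRowsUsed = "
            << partialGroupby.operStats.actualRowsReturned << "\n";
  return 0;
}
{noformat}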
  


> Hash partial groupby does not report a row count in operator statistics
> -----------------------------------------------------------------------
>
>                 Key: TRAFODION-2965
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2965
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-exe
>    Affects Versions: 2.0-incubating
>         Environment: any
>            Reporter: Hans Zeller
>            Assignee: Hans Zeller
>            Priority: Major
>             Fix For: 2.4
>
>


