Re: Review Request 29878: Bug with max() together with rank() and grouping sets

Ashutosh Chauhan Fri, 16 Jan 2015 10:21:06 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29878/#review68438
-----------------------------------------------------------



overall looks good. Few minor comments.


ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
<https://reviews.apache.org/r/29878/#comment112662>

    can be marked as transient.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
<https://reviews.apache.org/r/29878/#comment112661>

    this come from desc, can be marked as transient.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
<https://reviews.apache.org/r/29878/#comment112658>

    this comes from Desc, can be marked as transient.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
<https://reviews.apache.org/r/29878/#comment112659>

    this also comes from desc, can be marked as transient.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
<https://reviews.apache.org/r/29878/#comment112663>

    It will be good to add comments about what these variables represents.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
<https://reviews.apache.org/r/29878/#comment112660>

    This one too.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
<https://reviews.apache.org/r/29878/#comment112647>

    This can be marked as transient too, since its initialized only in 
initializeOp() which is called on backend.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
<https://reviews.apache.org/r/29878/#comment112664>

    Can you add a comment, about this logic. Not obvious on cursory reading.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
<https://reviews.apache.org/r/29878/#comment112666>

    transient



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
<https://reviews.apache.org/r/29878/#comment112668>

    comment about this logic.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
<https://reviews.apache.org/r/29878/#comment112670>

    nit: whitespace at EOL.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
<https://reviews.apache.org/r/29878/#comment112671>

    nit: whitespace at EOL.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
<https://reviews.apache.org/r/29878/#comment112672>

    nit: whitespace at EOL.



ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out
<https://reviews.apache.org/r/29878/#comment112673>

    should this have been only key,value ?



ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out
<https://reviews.apache.org/r/29878/#comment112675>

    seems like agg column should not be present there? Can you check?


- Ashutosh Chauhan


On Jan. 14, 2015, 9:13 a.m., Navis Ryu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29878/
> -----------------------------------------------------------
> 
> (Updated Jan. 14, 2015, 9:13 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-9347
>     https://issues.apache.org/jira/browse/HIVE-9347
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> It looks like the query below returns incorrect results on Hive 0.13.1, but 
> it was working fine on Hive 0.11. 
> 
> I have the following table:
> CREATE  TABLE `t`(
>   `category` int, 
>   `live` int, 
>   `comments` int)
> 
> with the following data:
> hive> select * from t;
> OK
> 3       0       2
> 2       0       2
> 8       0       2
> 
> The query:
> hive> select category, max(live) live, max(comments) comments, rank() OVER 
> (PARTITION BY category ORDER BY comments) rank1
> FROM t
> GROUP BY category
> GROUPING SETS ((), (category))
> HAVING max(comments) > 0;
> 
> return the following results:
> 
> NULL    1       48      1
> 2       1       49      1
> 3       1       49      1
> 8       1       49      1
> 
> When using grouping sets with the rank() function the max() function return 
> incorrect results. Everything works fine if I remove grouping sets clause and 
> split the query into two independent queries or remove the rank() function.
> 
> This looks like a bug to me but please review. That said, I'm not sure if 
> it's just Amazon issue or general Hive issue.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 4632f08 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java 
> 90b4b12 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 
> afd1738 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cea86df 
>   ql/src/test/queries/clientpositive/groupby_grouping_window.q PRE-CREATION 
>   ql/src/test/results/clientpositive/annotate_stats_groupby.q.out 89dd1de 
>   ql/src/test/results/clientpositive/annotate_stats_groupby2.q.out c3bf0d8 
>   ql/src/test/results/clientpositive/groupby_cube1.q.out 4f44d4f 
>   ql/src/test/results/clientpositive/groupby_grouping_sets2.q.out 0a21dbe 
>   ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out 3597609 
>   ql/src/test/results/clientpositive/groupby_grouping_sets4.q.out d1be46d 
>   ql/src/test/results/clientpositive/groupby_grouping_sets5.q.out 6d11add 
>   ql/src/test/results/clientpositive/groupby_grouping_sets6.q.out d2ff112 
>   ql/src/test/results/clientpositive/groupby_rollup1.q.out 0108ce0 
>   
> ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out 
> 301b90c 
> 
> Diff: https://reviews.apache.org/r/29878/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Navis Ryu
> 
>

Re: Review Request 29878: Bug with max() together with rank() and grouping sets

Reply via email to