[ 
https://issues.apache.org/jira/browse/HIVE-10874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568220#comment-14568220
 ] 

Jesus Camacho Rodriguez commented on HIVE-10874:
------------------------------------------------

[~jpullokkaran], this problem is not specific to Hive, so the patch should go 
into Calcite too; once the next Calcite release is out, we can remove it from here.

In this case, the condition arises because we have the following plan:
{noformat}
Aggregate (f_1, sum(f_1)) 
  Union
    Aggregate (x, sum(x)) ...
    Aggregate (x, sum(x))  ...
{noformat}
where f_1 is the column holding the result of sum(x).

The problem is that Calcite derives the row schema for the aggregation column 
sum(f_1) automatically. The generated name is f_1 ('f' for function, 1 for the 
position in the tuple), which is the same name as the first column's; 
however, Calcite did not verify whether the autogenerated name was already in 
the tuple. This patch checks whether the name already exists and, while it 
does, generates a new column name.
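
As a rough illustration of that check, here is a minimal sketch. It is not the 
actual patch and not Calcite's API: the class UniqueFieldNames, the method 
deriveFieldNames, and the convention that a null entry means "no explicit 
name, generate one" are all hypothetical, and the "f_" prefix just follows the 
notation used in the plan above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of deriving field names without duplicates.
// Not the code from HIVE-10874.patch and not a Calcite class.
final class UniqueFieldNames {

    // explicitNames.get(i) is the known name of column i, or null when the
    // name has to be generated automatically (e.g. an unnamed aggregate call).
    // Assumption: the explicit names are already distinct from each other.
    static List<String> deriveFieldNames(List<String> explicitNames) {
        // Reserve every explicit name first so that a generated name can
        // never collide with a column that appears earlier or later.
        Set<String> used = new HashSet<>();
        for (String name : explicitNames) {
            if (name != null) {
                used.add(name);
            }
        }

        List<String> result = new ArrayList<>();
        for (int i = 0; i < explicitNames.size(); i++) {
            String name = explicitNames.get(i);
            if (name == null) {
                // Start from the position-based name and, while it is
                // already taken, move on to the next suffix.
                int suffix = i;
                name = "f_" + suffix;
                while (used.contains(name)) {
                    suffix++;
                    name = "f_" + suffix;
                }
                used.add(name);
            }
            result.add(Objects.requireNonNull(name));
        }
        return result;
    }

    public static void main(String[] args) {
        // Top Aggregate from the plan: group column f_1 at position 0,
        // unnamed sum(f_1) at position 1, which would also be named f_1.
        System.out.println(deriveFieldNames(Arrays.asList("f_1", null)));
        // prints [f_1, f_2]
    }
}
```

Without the while loop, the second column would get the position-based name 
f_1 and the row type would contain the same name twice, which is the situation 
the assertion in the trace below complains about.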



> Fail in TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2.q due to 
> duplicate column name
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-10874
>                 URL: https://issues.apache.org/jira/browse/HIVE-10874
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-10874.patch
>
>
> Aggregate operators may derive row types with duplicate column names. The 
> reason is that the column names for grouping sets columns and aggregation 
> columns might be generated automatically, but we do not check whether the 
> column name already exists in the same row.
> This error can be reproduced by 
> TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2.q, which fails 
> with the following trace:
> {code}
> junit.framework.AssertionFailedError: Unexpected exception java.lang.AssertionError: RecordType(BIGINT $f1, BIGINT $f1)
>       at org.apache.calcite.rel.core.Project.isValid(Project.java:200)
>       at org.apache.calcite.rel.core.Project.<init>(Project.java:85)
>       at org.apache.calcite.rel.core.Project.<init>(Project.java:91)
>       at org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.<init>(HiveProject.java:70)
>       at org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.create(HiveProject.java:103)
>       at org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.introduceDerivedTable(PlanModifierForASTConv.java:211)
>       at org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:67)
>       at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:94)
>       at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:617)
>       at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:248)
>       at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10108)
>       at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208)
>       at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>       at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
>       at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
