[ 
https://issues.apache.org/jira/browse/HIVE-18144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Haihua updated HIVE-18144:
-------------------------------
    Description: 
A union of three or more tables whose columns have different types may cause a 
type inference error during task execution.

E.g. t1 (with an int column) union all t2 (with an int column) union all t3 
(with a bigint column) should finally yield {{bigint}}.

The RowSchema of t1 union t2, which we call {{leftOp}}, is int at first; once 
leftOp is unioned with t3, the result should be bigint.

This means the RowSchema of leftOp should become {{bigint}} instead of {{int}}.
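
The expected widening can be sketched in a few lines of Java. This is only an illustration of the rule described above, not Hive's implementation: the class {{UnionTypeSketch}}, its methods, and the fixed widening order are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; none of these names exist in Hive.
final class UnionTypeSketch {
    // Assumed integer widening order, narrowest first.
    private static final List<String> WIDENING_ORDER =
        Arrays.asList("tinyint", "smallint", "int", "bigint");

    // Returns the wider of two integer type names.
    static String commonType(String a, String b) {
        int ia = WIDENING_ORDER.indexOf(a);
        int ib = WIDENING_ORDER.indexOf(b);
        if (ia < 0 || ib < 0) {
            throw new IllegalArgumentException("unsupported type: " + (ia < 0 ? a : b));
        }
        return WIDENING_ORDER.get(Math.max(ia, ib));
    }

    // Folds the inputs left to right, the way a chained "union all" is resolved.
    static String unionType(String first, String... rest) {
        String result = first;
        for (String next : rest) {
            result = commonType(result, next);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(unionType("int", "int"));           // t1 union t2
        System.out.println(unionType("int", "int", "bigint")); // (t1 union t2) union t3
    }
}
```

Here the first call models leftOp before t3 is attached (int) and the second models the type the same operator should report afterwards (bigint).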

However, in SemanticAnalyzer.java the RowSchema of leftOp ends up as {{int}}, 
which is wrong:
{code}
(_col0: int|{t01-subquery1}diff_long_type,_col1: int|{t01-subquery1}id2,_col2: bigint|{t01-subquery1}id3)
{code}

Impacted code in SemanticAnalyzer.java:

{code}

      if(!(leftOp instanceof UnionOperator)) {
        Operator oldChild = leftOp;
        leftOp = (Operator) leftOp.getParentOperators().get(0);
        leftOp.removeChildAndAdoptItsChildren(oldChild);
      }

      // make left a child of right
      List<Operator<? extends OperatorDesc>> child =
          new ArrayList<Operator<? extends OperatorDesc>>();
      child.add(leftOp);
      rightOp.setChildOperators(child);

      List<Operator<? extends OperatorDesc>> parent = leftOp
          .getParentOperators();
      parent.add(rightOp);

      UnionDesc uDesc = ((UnionOperator) leftOp).getConf();
      // Here we should set the RowSchema of leftOp to that of unionoutRR;
      // otherwise the RowSchema of leftOp is wrong.
      // leftOp.setSchema(new RowSchema(unionoutRR.getColumnInfos()));
      uDesc.setNumInputs(uDesc.getNumInputs() + 1);
      return putOpInsertMap(leftOp, unionoutRR);

{code}
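
To make the effect of the missing {{setSchema}} call concrete, here is a toy model in plain Java. It is a sketch under stated assumptions rather than Hive code: {{ToyUnion}}, {{merge}} and {{addInput}} are hypothetical stand-ins for the union operator, the schema resolution that produces unionoutRR, and the code path quoted above, and only int and bigint are modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; these classes are not part of Hive.
final class StaleSchemaSketch {
    // Stand-in for a union operator: a cached schema plus an input count.
    static final class ToyUnion {
        List<String> schema;
        int numInputs;

        ToyUnion(List<String> schema, int numInputs) {
            this.schema = new ArrayList<>(schema);
            this.numInputs = numInputs;
        }
    }

    // Column-wise widening; only int and bigint are modeled here.
    static List<String> merge(List<String> left, List<String> right) {
        if (left.size() != right.size()) {
            throw new IllegalArgumentException("column count mismatch");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < left.size(); i++) {
            boolean wide = left.get(i).equals("bigint") || right.get(i).equals("bigint");
            out.add(wide ? "bigint" : "int");
        }
        return out;
    }

    // Attaches one more input to an existing union and returns the resolved schema.
    // With refreshSchema == false the operator keeps its old schema, as in the report.
    static List<String> addInput(ToyUnion leftOp, List<String> rightSchema, boolean refreshSchema) {
        List<String> unionOut = merge(leftOp.schema, rightSchema);
        if (refreshSchema) {
            leftOp.schema = unionOut; // analogous to the commented-out setSchema call
        }
        leftOp.numInputs++;
        return unionOut;
    }

    public static void main(String[] args) {
        List<String> t1UnionT2 = Arrays.asList("int", "int", "bigint");
        List<String> t3 = Arrays.asList("bigint", "bigint", "int");

        ToyUnion stale = new ToyUnion(t1UnionT2, 2);
        addInput(stale, t3, false);
        System.out.println(stale.schema); // [int, int, bigint]

        ToyUnion refreshed = new ToyUnion(t1UnionT2, 2);
        addInput(refreshed, t3, true);
        System.out.println(refreshed.schema); // [bigint, bigint, bigint]
    }
}
```

The column types follow the reproduction query in this report: the first two branches contribute (int, int, bigint) and the third contributes (bigint, bigint, int). Without the refresh, the toy operator keeps the same stale (int, int, bigint) schema that was observed on leftOp.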

Steps to reproduce:

{code}
create table test_union_different_type(id bigint, id2 bigint, id3 bigint, name string);
set hive.auto.convert.join=true;
insert overwrite table test_union_different_type select 1, 2, 3, "test_union_different_type";
select
  t01.diff_long_type as diff_long_type,
  t01.id2 as id2,
  t00.id as id,
  t01.id3 as id3
from test_union_different_type t00
left join
  (
    select 1 as diff_long_type, 30 as id2, id3 from test_union_different_type
    union ALL
    select 2 as diff_long_type, 20 as id2, id3 from test_union_different_type
    union ALL
    select id as diff_long_type, id2, 30 as id3 from test_union_different_type
  ) t01
on t00.id = t01.diff_long_type
;

{code}

Stack trace:

{code}

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":1,"id2":null,"id3":null,"name":null}
  at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":1,"id2":null,"id3":null,"name":null}
  at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:499)
  at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
  ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception from MapJoinOperator : org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable
  at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:465)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
  at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
  at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:149)
  at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:489)
  ... 9 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable
  at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:36)
  at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:239)
  at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
  at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
  at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
  at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
  at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:714)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
  at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:647)
  at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:660)
  at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:663)
  at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:759)
  at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:452)
  ... 13 more

{code}
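
The innermost cause in the trace is an int-typed inspector receiving a long-typed value. The mechanism can be reproduced in isolation with the self-contained sketch below. It deliberately avoids the real Hadoop and Hive classes: {{ToyIntWritable}}, {{ToyLongWritable}} and {{getInt}} are hypothetical stand-ins, and the assumption is only that the inspector performs an unchecked downcast based on the planned schema.

```java
// Hypothetical stand-ins; the real classes live in Hadoop and Hive.
final class WritableCastSketch {
    static final class ToyIntWritable {
        final int value;
        ToyIntWritable(int value) { this.value = value; }
    }

    static final class ToyLongWritable {
        final long value;
        ToyLongWritable(long value) { this.value = value; }
    }

    // Toy analogue of an int inspector: it trusts the planned schema and casts blindly.
    static int getInt(Object writable) {
        return ((ToyIntWritable) writable).value;
    }

    // Reports whether reading the value through the int inspector fails.
    static boolean castFails(Object writable) {
        try {
            getInt(writable);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The schema says int, but the row actually carries a long.
        System.out.println(castFails(new ToyLongWritable(1L)));
        // When schema and data agree, the read succeeds.
        System.out.println(castFails(new ToyIntWritable(1)));
    }
}
```

In this model the failure only appears when the value is read, which is consistent with the error surfacing at task execution rather than at planning time.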


> Runtime type inference error when join three table for different column type 
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-18144
>                 URL: https://issues.apache.org/jira/browse/HIVE-18144
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: Wang Haihua
>            Assignee: Wang Haihua



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
