[ 
https://issues.apache.org/jira/browse/HIVE-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285000#comment-15285000
 ] 

Jesus Camacho Rodriguez commented on HIVE-13767:
------------------------------------------------

The method will push any equi join conditions that are not column references as 
Projections on top of the children of the join, and will return the new 
condition.

The problem is that the positions of the columns are tracked incorrectly, and 
thus we end up inferring an incorrect type for the column.

Let me go into detail in the code.

Observe the loop in L197-L211. It extracts which conditions are column 
references, and which ones need to be pushed to the inputs of the join. Assume 
we have a condition {{CAST(a)=b AND c=d}}. The first conjunct is added to the 
conditions that need to be pushed to the inputs (because of CAST), while the 
second conjunct is added to the conditions that do not need to be pushed, i.e., 
they can be directly referenced.

Then the loop in L213-L229 creates the first part of the condition, consisting 
of the equality conditions that do not need to be pushed, i.e., {{c=d}} in our 
example. But observe that _leftKey_ and _rightKey_, which are used to infer 
the type, are extracted from _leftJoinKeys_ and _rightJoinKeys_ respectively, 
using index _i_ from origColEqConds. This is not right: at _i=0_, _leftKey_ 
will reference {{CAST(a)}} and _rightKey_ will reference {{b}}, whereas the 
condition that does not need to be pushed is at _i=1_ in those lists.
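
The mismatch can be reproduced in isolation. The following is a minimal 
sketch, not the Hive code: keys are plain strings rather than Calcite 
expressions, and the names JoinKeyIndexSketch, isColumnRef and 
keysByCompactIndex are hypothetical. It uses the example condition 
{{CAST(a)=b AND c=d}} and reads the key lists with the index of the compacted 
list, as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the logic described above; strings replace RexNode.
class JoinKeyIndexSketch {

  // Assumption: a key is a plain column reference if it contains no call such as CAST(...).
  static boolean isColumnRef(String key) {
    return !key.contains("(");
  }

  // Mimics the faulty lookup: iterates over the compacted list of
  // column-only conditions, but reads the key lists with that list's index.
  static List<String> keysByCompactIndex(List<String> leftJoinKeys,
                                         List<String> rightJoinKeys) {
    List<String> origColEqConds = new ArrayList<>();
    for (int i = 0; i < leftJoinKeys.size(); i++) {
      if (isColumnRef(leftJoinKeys.get(i)) && isColumnRef(rightJoinKeys.get(i))) {
        origColEqConds.add(leftJoinKeys.get(i) + "=" + rightJoinKeys.get(i));
      }
    }
    List<String> picked = new ArrayList<>();
    for (int i = 0; i < origColEqConds.size(); i++) {
      // Index i belongs to origColEqConds, not to the key lists.
      picked.add(leftJoinKeys.get(i) + "=" + rightJoinKeys.get(i));
    }
    return picked;
  }

  public static void main(String[] args) {
    List<String> left = Arrays.asList("CAST(a)", "c");
    List<String> right = Arrays.asList("b", "d");
    System.out.println(keysByCompactIndex(left, right)); // prints [CAST(a)=b]
  }
}
```

The lookup returns the keys of the pushed conjunct instead of {{c=d}}, so any 
type derived from them belongs to the wrong columns.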

Thus, we need to keep the positions in _leftJoinKeys_ and _rightJoinKeys_ that 
contain conditions that do not need to be pushed: we keep this information in 
the new _origColEqCondsPos_ bitset.
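
A minimal sketch of the idea, assuming string keys in place of Calcite 
expressions: only the name _origColEqCondsPos_ comes from the patch 
description, while JoinKeyPositionSketch, isColumnRef and 
keysByRecordedPosition are hypothetical, and the actual patch may be 
structured differently. Each position whose keys are both plain column 
references is recorded in a java.util.BitSet, and the keys are later read 
back at exactly those positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration of position tracking; strings replace RexNode.
class JoinKeyPositionSketch {

  // Assumption: a key is a plain column reference if it contains no call such as CAST(...).
  static boolean isColumnRef(String key) {
    return !key.contains("(");
  }

  // Records the original positions of column-only conditions, then reads
  // the key lists at those positions rather than at a compacted index.
  static List<String> keysByRecordedPosition(List<String> leftJoinKeys,
                                             List<String> rightJoinKeys) {
    BitSet origColEqCondsPos = new BitSet();
    for (int i = 0; i < leftJoinKeys.size(); i++) {
      if (isColumnRef(leftJoinKeys.get(i)) && isColumnRef(rightJoinKeys.get(i))) {
        origColEqCondsPos.set(i);
      }
    }
    List<String> picked = new ArrayList<>();
    // nextSetBit returns -1 once no further bit is set.
    for (int pos = origColEqCondsPos.nextSetBit(0); pos >= 0;
         pos = origColEqCondsPos.nextSetBit(pos + 1)) {
      picked.add(leftJoinKeys.get(pos) + "=" + rightJoinKeys.get(pos));
    }
    return picked;
  }

  public static void main(String[] args) {
    List<String> left = Arrays.asList("CAST(a)", "c");
    List<String> right = Arrays.asList("b", "d");
    System.out.println(keysByRecordedPosition(left, right)); // prints [c=d]
  }
}
```

For {{CAST(a)=b AND c=d}} only bit 1 is set, so the keys read back are 
{{c}} and {{d}}.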

> Wrong type inferred in Semijoin condition leads to AssertionError
> -----------------------------------------------------------------
>
>                 Key: HIVE-13767
>                 URL: https://issues.apache.org/jira/browse/HIVE-13767
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 2.1.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-13767.patch
>
>
> The following query fails to run:
> {noformat}
> SELECT
>     COALESCE(498, LEAD(COALESCE(-973, -684, 515)) OVER (PARTITION BY 
> (t2.int_col_10 + t1.smallint_col_50) ORDER BY (t2.int_col_10 + 
> t1.smallint_col_50), FLOOR(t1.double_col_16) DESC), 524) AS int_col,
>     (t2.int_col_10) + (t1.smallint_col_50) AS int_col_1,
>     FLOOR(t1.double_col_16) AS float_col,
>     COALESCE(SUM(COALESCE(62, -380, -435)) OVER (PARTITION BY (t2.int_col_10 
> + t1.smallint_col_50) ORDER BY (t2.int_col_10 + t1.smallint_col_50) DESC, 
> FLOOR(t1.double_col_16) DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 48 
> FOLLOWING), 704) AS int_col_2
> FROM table_1 t1
> INNER JOIN table_18 t2 ON (((t2.tinyint_col_15) = (t1.bigint_col_7)) AND
>                            ((t2.decimal2709_col_9) = 
> (t1.decimal2016_col_26))) AND
>                            ((t2.tinyint_col_20) = (t1.tinyint_col_3))
> WHERE (t2.smallint_col_19) IN (SELECT
>     COALESCE(-92, -994) AS int_col
>     FROM table_1 tt1
>     INNER JOIN table_18 tt2 ON (tt2.decimal1911_col_16) = 
> (tt1.decimal2612_col_77)
>     WHERE (t1.timestamp_col_9) = (tt2.timestamp_col_18));
> {noformat}
> The following error is seen in the logs:
> {noformat}
> 2016-04-27T04:32:09,605 WARN  [...2a24 HiveServer2-Handler-Pool: Thread-211]: 
> thrift.ThriftCLIService (ThriftCLIService.java:ExecuteStatement(501)) - Error 
> executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error running query: 
> java.lang.AssertionError: mismatched type $8 TIMESTAMP(9)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:178)
>  ~[hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:216)
>  ~[hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:327) 
> ~[hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:458)
>  ~[hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:435)
>  ~[hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:272)
>  ~[hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:492)
>  [hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1317)
>  [hive-service-rpc-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1302)
>  [hive-service-rpc-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> [hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> [hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>  [hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  [hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_77]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_77]
>         at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
> Caused by: java.lang.AssertionError: mismatched type $8 TIMESTAMP(9)
>         at 
> org.apache.calcite.rex.RexUtil$FixNullabilityShuttle.visitInputRef(RexUtil.java:2042)
>  ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.calcite.rex.RexUtil$FixNullabilityShuttle.visitInputRef(RexUtil.java:2020)
>  ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexShuttle.visitList(RexShuttle.java:144) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexShuttle.visitCall(RexShuttle.java:93) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexShuttle.visitCall(RexShuttle.java:36) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexCall.accept(RexCall.java:108) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexShuttle.apply(RexShuttle.java:275) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexShuttle.mutate(RexShuttle.java:234) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexShuttle.apply(RexShuttle.java:252) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexUtil.fixUp(RexUtil.java:1239) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.calcite.rel.rules.FilterJoinRule.perform(FilterJoinRule.java:232) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterJoinRule$HiveFilterJoinMergeRule.onMatch(HiveFilterJoinRule.java:78)
>  ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:318)
>  ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterJoinRule$HiveFilterJoinMergeRule.onMatch(HiveFilterJoinRule.java:78)
>  ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:318)
>  ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:514) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:392) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:285)
>  ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:72)
>  ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:207) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:194) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:1293)
>  ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:1166)
>  ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:956)
>  ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:887)
>  ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:113) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:969)
>  ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106) 
> ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:706)
>  ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:274)
>  ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10642)
>  ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:233)
>  ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:245)
>  ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
>  ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:245)
>  ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:476) 
> ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:318) 
> ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1191) 
> ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         ... 15 more
> {noformat}
> The Hive DDL for the supporting tables is:
> {noformat}
> CREATE TABLE table_1 (timestamp_col_1 TIMESTAMP, decimal3003_col_2 
> DECIMAL(30, 3), tinyint_col_3 TINYINT, decimal0101_col_4 DECIMAL(1, 1), 
> boolean_col_5 BOOLEAN, float_col_6 FLOAT, bigint_col_7 BIGINT, 
> varchar0098_col_8 VARCHAR(98), timestamp_col_9 TIMESTAMP, bigint_col_10 
> BIGINT, decimal0903_col_11 DECIMAL(9, 3), timestamp_col_12 TIMESTAMP, 
> timestamp_col_13 TIMESTAMP, float_col_14 FLOAT, char0254_col_15 CHAR(254), 
> double_col_16 DOUBLE, timestamp_col_17 TIMESTAMP, boolean_col_18 BOOLEAN, 
> decimal2608_col_19 DECIMAL(26, 8), varchar0216_col_20 VARCHAR(216), 
> string_col_21 STRING, bigint_col_22 BIGINT, boolean_col_23 BOOLEAN, 
> timestamp_col_24 TIMESTAMP, boolean_col_25 BOOLEAN, decimal2016_col_26 
> DECIMAL(20, 16), string_col_27 STRING, decimal0202_col_28 DECIMAL(2, 2), 
> float_col_29 FLOAT, decimal2020_col_30 DECIMAL(20, 20), boolean_col_31 
> BOOLEAN, double_col_32 DOUBLE, varchar0148_col_33 VARCHAR(148), 
> decimal2121_col_34 DECIMAL(21, 21), tinyint_col_35 TINYINT, boolean_col_36 
> BOOLEAN, boolean_col_37 BOOLEAN, string_col_38 STRING, decimal3420_col_39 
> DECIMAL(34, 20), timestamp_col_40 TIMESTAMP, decimal1408_col_41 DECIMAL(14, 
> 8), string_col_42 STRING, decimal0902_col_43 DECIMAL(9, 2), 
> varchar0204_col_44 VARCHAR(204), boolean_col_45 BOOLEAN, timestamp_col_46 
> TIMESTAMP, boolean_col_47 BOOLEAN, bigint_col_48 BIGINT, boolean_col_49 
> BOOLEAN, smallint_col_50 SMALLINT, decimal0704_col_51 DECIMAL(7, 4), 
> timestamp_col_52 TIMESTAMP, boolean_col_53 BOOLEAN, timestamp_col_54 
> TIMESTAMP, int_col_55 INT, decimal0505_col_56 DECIMAL(5, 5), char0155_col_57 
> CHAR(155), boolean_col_58 BOOLEAN, bigint_col_59 BIGINT, boolean_col_60 
> BOOLEAN, boolean_col_61 BOOLEAN, char0249_col_62 CHAR(249), boolean_col_63 
> BOOLEAN, timestamp_col_64 TIMESTAMP, decimal1309_col_65 DECIMAL(13, 9), 
> int_col_66 INT, float_col_67 FLOAT, timestamp_col_68 TIMESTAMP, 
> timestamp_col_69 TIMESTAMP, boolean_col_70 BOOLEAN, timestamp_col_71 
> TIMESTAMP, double_col_72 DOUBLE, boolean_col_73 BOOLEAN, char0222_col_74 
> CHAR(222), float_col_75 FLOAT, string_col_76 STRING, decimal2612_col_77 
> DECIMAL(26, 12), timestamp_col_78 TIMESTAMP, char0128_col_79 CHAR(128), 
> timestamp_col_80 TIMESTAMP, double_col_81 DOUBLE, timestamp_col_82 TIMESTAMP, 
> float_col_83 FLOAT, decimal2622_col_84 DECIMAL(26, 22), double_col_85 DOUBLE, 
> float_col_86 FLOAT, decimal0907_col_87 DECIMAL(9, 7)) STORED AS orc;
> CREATE TABLE table_18 (boolean_col_1 BOOLEAN, boolean_col_2 BOOLEAN, 
> decimal2518_col_3 DECIMAL(25, 18), float_col_4 FLOAT, timestamp_col_5 
> TIMESTAMP, double_col_6 DOUBLE, double_col_7 DOUBLE, char0035_col_8 CHAR(35), 
> decimal2709_col_9 DECIMAL(27, 9), int_col_10 INT, timestamp_col_11 TIMESTAMP, 
> decimal3604_col_12 DECIMAL(36, 4), string_col_13 STRING, int_col_14 INT, 
> tinyint_col_15 TINYINT, decimal1911_col_16 DECIMAL(19, 11), float_col_17 
> FLOAT, timestamp_col_18 TIMESTAMP, smallint_col_19 SMALLINT, tinyint_col_20 
> TINYINT, timestamp_col_21 TIMESTAMP, boolean_col_22 BOOLEAN, int_col_23 INT) 
> STORED AS orc;
> {noformat}
> The problem is that the reference indices in the condition (and thus, their 
> type) are inferred incorrectly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
