[
https://issues.apache.org/jira/browse/HIVE-18390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314537#comment-16314537
]
Hengyu Dai commented on HIVE-18390:
-----------------------------------
Since the latest Hive generates a different operator tree for the SQL I pasted,
the simple query no longer throws an exception on the latest Hive. In my
opinion it is still dangerous, though, because the underlying bug has not been fixed.
Here is the exception stack trace on Hive 2.1.1:
{code:java}
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory$ColumnPrunerSelectProc.process(ColumnPrunerProcFactory.java:792)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at org.apache.hadoop.hive.ql.optimizer.ColumnPruner$ColumnPrunerWalker.walk(ColumnPruner.java:176)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at org.apache.hadoop.hive.ql.optimizer.ColumnPruner.transform(ColumnPruner.java:136)
at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:242)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10973)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10550)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:483)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1254)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1396)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1181)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1170)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:229)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:180)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:396)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:770)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:711)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:638)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{code}
> IndexOutOfBoundsException when query a partitioned view in ColumnPruner
> -------------------------------------------------------------------------
>
> Key: HIVE-18390
> URL: https://issues.apache.org/jira/browse/HIVE-18390
> Project: Hive
> Issue Type: Bug
> Components: Query Planning, Views
> Affects Versions: 2.1.1
> Reporter: Hengyu Dai
> Attachments: HIVE-18390.patch
>
>
> An IndexOutOfBoundsException is thrown when querying a partitioned view.
> During column pruning, each SEL operator collects the columns accessed by
> that operator.
> When ColumnPrunerSelectProc collects the accessed columns of a view, it first
> gets the index of each output column name in the view, then calls
> Table.getCols().get(index).getName() to get the name of the output column.
> However, Table.getCols() does not return all columns (the partition columns
> are missing), so if a partition column is queried, an
> IndexOutOfBoundsException is thrown.
> REPRODUCE:
> {code:sql}
> create table foo
> (
> `a` string
> ) partitioned by (`b` string)
> ;
> create view bar partitioned on (b) as
> select a,b from foo;
> select * from bar; --IndexOutOfBoundsException
> {code}
> OPERATOR TREE:
> {code:java}
> TS[0]
> |
> SEL[1]
> |
> SEL[2]
> |
> FS[3]
> {code}
> SEL[1] collects the accessed columns, which include the partition column b.
> b's internal column name is '_col1', so the corresponding column index is 1,
> but bar's getCols() returns a list of length 1: ['a'], so
> tab.getCols().get(index), here tab.getCols().get(1), throws an
> IndexOutOfBoundsException.
> HOW TO FIX:
> Instead of calling the view's getCols() method, we should get all columns,
> including the partition columns.
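
The index mismatch described above, and the proposed direction for the fix, can be sketched in plain Java without any Hive dependency. This is only an illustration under stated assumptions: ViewMeta, nameFromCols and nameFromAllCols are hypothetical names invented here, not Hive classes or the contents of HIVE-18390.patch. The getAllCols() helper below simply appends the partition columns to the data columns, which is the behaviour the fix description asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ViewColumnLookup {

    /** Hypothetical stand-in for a view's metadata; NOT Hive's real Table class. */
    static final class ViewMeta {
        private final List<String> cols;      // data columns only
        private final List<String> partCols;  // partition columns

        ViewMeta(List<String> cols, List<String> partCols) {
            this.cols = new ArrayList<>(cols);
            this.partCols = new ArrayList<>(partCols);
        }

        /** Data columns only; partition columns are missing, as in the report. */
        List<String> getCols() {
            return Collections.unmodifiableList(cols);
        }

        /** Data columns followed by partition columns. */
        List<String> getAllCols() {
            List<String> all = new ArrayList<>(cols);
            all.addAll(partCols);
            return all;
        }
    }

    /** Lookup as described in the report: fails for a partition column index. */
    static String nameFromCols(ViewMeta view, int index) {
        return view.getCols().get(index);
    }

    /** Lookup over data + partition columns: the direction proposed in HOW TO FIX. */
    static String nameFromAllCols(ViewMeta view, int index) {
        return view.getAllCols().get(index);
    }

    public static void main(String[] args) {
        // Mirrors the reproduce case: view bar has data column a, partition column b.
        ViewMeta bar = new ViewMeta(Arrays.asList("a"), Arrays.asList("b"));

        System.out.println(nameFromAllCols(bar, 1)); // prints b

        try {
            nameFromCols(bar, 1); // index 1 into a list of size 1
        } catch (IndexOutOfBoundsException e) {
            System.out.println("getCols() lookup failed: " + e.getMessage());
        }
    }
}
```

With view bar from the reproduce case, index 1 ('_col1') resolves to b only when the partition columns are part of the list being indexed; the data-only list has size 1, so the same index is out of range.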