I'm currently running a Hive build from trunk, revision 911889. I've
built a UDTF called map_explode that emits the key and value of each
entry in a map as a row in the result table. The table I'm running it against
looks like this:
hive> describe mytable;
product string from deserializer
...
interactions map<string,int> from deserializer
If I use map_explode in the select clause, I get the expected results:
hive> select map_explode(interactions) as (key, value) from mytable where day =
'2010-02-18' and hour = 1 limit 10;
...
OK
invite_impression 1
invite_impression 1
invite_impression 1
invite_impression 1
rollout 12
invite_impression 1
invite_impression 1
invite_impression 1
rollout 4
invite_impression 1
Time taken: 22.11 seconds
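To be concrete about what map_explode is meant to do, here is a rough sketch of its per-row behavior in plain Python. This is not the actual Java UDTF source: the function body, the sample data, and the choice to emit nothing for a NULL or empty map are all illustrative assumptions.

```python
from typing import Dict, Iterator, Optional, Tuple


def map_explode(interactions: Optional[Dict[str, int]]) -> Iterator[Tuple[str, int]]:
    """Yield one (key, value) row per entry of the input map."""
    # Assumption: a NULL (None) or empty map produces no output rows.
    if not interactions:
        return
    for key, value in interactions.items():
        yield key, value


# Hypothetical sample value, shaped like the interactions column of mytable.
sample = {"invite_impression": 1, "rollout": 12}
rows = list(map_explode(sample))
for key, value in rows:
    print(f"{key}\t{value}")
```

Each input row can produce zero or more output rows, which is why relating them back to the parent row's other columns (such as product) needs something like LATERAL VIEW.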
However, if I try to use LATERAL VIEW to relate the exploded values back to the
parent table, like so:
hive> select product, key, sum(value) from mytable LATERAL VIEW
map_explode(interactions) interacts as key, value where day = '2010-02-18' and
hour = 1 group by product, key;
I get the following error:
FAILED: Unknown exception: null
Looking in hive.log, I see the following stack trace:
2010-02-19 14:15:17,215 ERROR ql.Driver (SessionState.java:printError(255)) - FAILED: Unknown exception: null
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.ppd.ExprWalkerProcFactory$ColumnExprProcessor.process(ExprWalkerProcFactory.java:87)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:129)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:103)
at org.apache.hadoop.hive.ql.ppd.ExprWalkerProcFactory.extractPushdownPreds(ExprWalkerProcFactory.java:273)
at org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.mergeWithChildrenPred(OpProcFactory.java:317)
at org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.process(OpProcFactory.java:258)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:129)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:103)
at org.apache.hadoop.hive.ql.ppd.PredicatePushDown.transform(PredicatePushDown.java:103)
at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:74)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5758)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:125)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:304)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:377)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I peeked at ExprWalkerProcFactory, but couldn't readily see what was causing
the problem. Any ideas?
Jason