[ https://issues.apache.org/jira/browse/HIVE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917697#comment-13917697 ]
Mithun Radhakrishnan commented on HIVE-6389:
--------------------------------------------

Hey, Ashutosh. As a matter of fact, this stack trace is the result of running "select mymap[ 'xyz' ] from mytable", if mytable has null values for mymap.

Although the bug is in the LazyBinaryObjectInspector for Maps, it doesn't manifest at the time of read. The reason you're seeing a failure in LazySimpleSerDe is that the results of the query are being serialized into String (i.e. to console). The LazyBinaryMapOI returns -1 for NULL maps. When the LazySimpleSerDe attempts to convert this Integer into Text, we get this bad-cast exception. The OI should have been returning nulls for null objects, like the ColumnarSerDe does.

The way I tested this is:
1. create table mytable_text( mymap map<string, string> ) stored as textfile;
2. echo "\N\n\N\n\N" > /tmp/mytable.txt && hdfs dfs -copyFromLocal /tmp/mytable.txt /user/hive/warehouse/mytable_text
3. create table mytable_rcfile( mymap map<string, string> ) stored as rcfile; -- LazyBinaryColumnarSerDe
4. insert overwrite table mytable_rcfile select mymap from mytable_text;
5. select mymap['blah'] from mytable_rcfile;

Steps 1-4 simply insert a null-map into an RCFile-based table. Step 5 causes the null-map to be returned by LazyBinaryMapOI as '-1', etc.

This patch brings LazyBinaryMapOI's behaviour in line with LazyMapOI. (This is likely just a copy-paste error from getMapSize().)

> LazyBinaryColumnarSerDe-based RCFile tables break when looking up elements in null-maps.
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-6389
>                 URL: https://issues.apache.org/jira/browse/HIVE-6389
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>   Affects Versions: 0.10.0, 0.11.0, 0.12.0, 0.13.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: Hive-6389.patch
>
>
> RCFile tables that use the LazyBinaryColumnarSerDe don't seem to handle
> look-ups into map-columns when the value of the column is null.
> When an RCFile table is created with LazyBinaryColumnarSerDe (as is default
> in 0.12), and queried as follows:
> {code}
> select mymap['1024'] from mytable;
> {code}
> and if the mymap column has nulls, then one is treated to the following
> guttural utterance:
> {code}
> 2014-02-05 21:50:25,050 FATAL mr.ExecMapper (ExecMapper.java:map(194)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":null,"mymap":null,"isnull":null}
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.hadoop.io.Text
>     at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41)
>     at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:226)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:560)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524)
>     ... 10 more
> {code}
> A patch is on the way, but the short of it is that the LazyBinaryMapOI needs
> to return nulls if either the map or the lookup-key is null.
> This is handled correctly for Text data, and for RCFiles using ColumnarSerDe.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
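
For readers without the patch at hand, the null handling described in this message can be sketched roughly as follows. This is a simplified illustration, not Hive's code: the class NullMapLookupSketch and its static methods are hypothetical stand-ins that operate on a plain java.util.Map rather than on Hive's lazy-binary objects, and the "buggy" variant is only a guess at the shape of the copy-paste error mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a map object inspector.
// It is NOT Hive's LazyBinaryMapObjectInspector; names are illustrative only.
final class NullMapLookupSketch {

    // For a size query, -1 is a reasonable sentinel for "the map itself is null".
    static int getMapSize(Map<String, String> data) {
        return data == null ? -1 : data.size();
    }

    // Assumed shape of the bug: the size sentinel leaks into the element lookup,
    // so callers expecting a string-like value receive an Integer instead.
    static Object getMapValueElementBuggy(Map<String, String> data, String key) {
        if (data == null) {
            return -1; // autoboxed to Integer; wrong type for a value lookup
        }
        return data.get(key);
    }

    // Behaviour the issue asks for: null if either the map or the key is null.
    static Object getMapValueElement(Map<String, String> data, String key) {
        if (data == null || key == null) {
            return null;
        }
        return data.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("1024", "some value");

        System.out.println(getMapValueElement(row, "1024"));      // present key
        System.out.println(getMapValueElement(null, "1024"));     // null map
        System.out.println(getMapValueElement(row, null));        // null key
        System.out.println(getMapValueElementBuggy(null, "1024")); // Integer sentinel
    }
}
```

In this sketch, a downstream cast of the buggy result to a string type fails in the same way as the ClassCastException in the quoted stack trace, whereas the null-returning variant lets the serializer emit a plain NULL.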