[ https://issues.apache.org/jira/browse/HIVE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794522#comment-13794522 ]
Ankit Malhotra commented on HIVE-4175:
--------------------------------------

Ran into this while using Elephant Bird's protobuf deserializer. This only happens when I have partitions that don't have any files.

My table:
{code}
CREATE TABLE test_proto_v002(
  timestamp bigint COMMENT 'from deserializer',
  auction_id bigint COMMENT 'from deserializer',
  object_type int COMMENT 'from deserializer',
  object_id int COMMENT 'from deserializer',
  method int COMMENT 'from deserializer',
  value double COMMENT 'from deserializer',
  event_type int COMMENT 'from deserializer')
PARTITIONED BY (
  dy string,
  dm string,
  dd string,
  dh string)
ROW FORMAT DELIMITED
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'hdfs://localhost/logs/test_proto/v002'
TBLPROPERTIES (
  'transient_lastDdlTime'='1381346965')
{code}

Logs for hive job from JT:
{code}
....
2013-10-14 16:24:23,555 INFO org.apache.hadoop.mapred.MapTask: Processing split: Paths:/Users/amalhotra/hadoop/appdoop/tmp/hive/hive_2013-10-14_16-24-18_372_8603315140627624341/-mr-10002/1/emptyFile:0+87,/logs/test_proto/v002/2013/10/14/11/test_proto_1381765303443:0+541 InputFormatClass: org.apache.hadoop.mapred.SequenceFileInputFormat
....
2013-10-14 16:24:24,079 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://localhost/logs/test_proto/v002/2013/10/14/11/test_proto_1381765303443
2013-10-14 16:24:24,079 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias test_proto_v002 for file hdfs://localhost/logs/test_proto/v002/2013/10/14/11
2013-10-14 16:24:24,080 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1407)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
	at com.twitter.elephantbird.hive.serde.ProtobufDeserializer.deserialize(ProtobufDeserializer.java:56)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:525)
	... 9 more
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 finished. closing...
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:1
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 finished. closing...
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing...
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 1 finished. closing...
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 1 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 0 finished. closing...
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 0 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 1 Close done
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 Close done
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 Close done
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 Close done
2013-10-14 16:24:24,080 INFO ExecMapper: ExecMapper: processed 0 rows: used memory = 120821440
2013-10-14 16:24:24,084 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-10-14 16:24:24,086 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1407)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
	... 8 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
	at com.twitter.elephantbird.hive.serde.ProtobufDeserializer.deserialize(ProtobufDeserializer.java:56)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:525)
	... 9 more
2013-10-14 16:24:24,089 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
...
{code}

> Injection of emptyFile into input splits for empty partitions causes Deserializer to fail
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-4175
>                 URL: https://issues.apache.org/jira/browse/HIVE-4175
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>         Environment: CDH4.2, using MR1
>            Reporter: James Kebinger
>            Priority: Minor
>
> My deserializer expects to receive one of two different subclasses of Writable, but in certain circumstances it receives an empty instance of org.apache.hadoop.io.Text. This only happens for task attempts where I observe the file called "emptyFile" in the list of input splits.
> I'm running queries over an external year/month/day partitioned table that I have eagerly created partitions for, so as of today, for example, I may run a query where year = 2013 and month = 3 that includes empty partitions.
> In the course of investigation I downloaded the sequence files to confirm they were OK. Once I realized that processing of empty partitions was to blame, I was able to work around the issue by bounding my queries to populated partitions.
> Can the need for the emptyFile be eliminated in the case where there's already a bunch of splits being processed? Failing that, can the mapper detect that the current input is from emptyFile and not call the deserializer?

--
This message was sent by Atlassian JIRA
(v6.1#6144)
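The second workaround the reporter suggests (skip the deserializer for the synthetic row) can also be approximated at the SerDe level with a runtime type check before the cast that currently throws. The sketch below is only an illustration of that instanceof guard, not a verified fix: the `Writable`, `Text`, and `BytesWritable` types here are minimal stand-ins for Hadoop's classes, and whether a given Hive version treats a null deserialization result as a skippable row is an assumption that would need checking.

```java
// Minimal stand-ins for Hadoop's Writable hierarchy. These mirror the names
// in org.apache.hadoop.io but are NOT the real Hadoop API.
interface Writable {}

class Text implements Writable {}            // what the injected emptyFile split yields

class BytesWritable implements Writable {    // what the protobuf deserializer expects
    private final byte[] bytes;
    BytesWritable(byte[] bytes) { this.bytes = bytes; }
    byte[] getBytes() { return bytes; }
}

public class DefensiveDeserializer {
    // Instead of casting unconditionally (which raises the ClassCastException
    // seen in the logs when the writable is an empty Text), check the runtime
    // type first and treat anything else as "no row".
    public static byte[] deserialize(Writable w) {
        if (!(w instanceof BytesWritable)) {
            return null;  // skip the synthetic empty-partition record
        }
        return ((BytesWritable) w).getBytes();
    }
}
```

With this guard, the empty Text row produced for an empty partition yields null instead of a failed cast, while real SequenceFile records pass through unchanged.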