[ https://issues.apache.org/jira/browse/HIVE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794522#comment-13794522 ]

Ankit Malhotra commented on HIVE-4175:
--------------------------------------

I ran into this while using Elephant Bird's protobuf deserializer. It only 
happens when the query covers partitions that don't contain any files.

My table:
{code}
CREATE  TABLE test_proto_v002(
  timestamp bigint COMMENT 'from deserializer', 
  auction_id bigint COMMENT 'from deserializer', 
  object_type int COMMENT 'from deserializer', 
  object_id int COMMENT 'from deserializer', 
  method int COMMENT 'from deserializer', 
  value double COMMENT 'from deserializer', 
  event_type int COMMENT 'from deserializer')
PARTITIONED BY ( 
  dy string, 
  dm string, 
  dd string, 
  dh string)
ROW FORMAT DELIMITED 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.SequenceFileInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'hdfs://localhost/logs/test_proto/v002'
TBLPROPERTIES (
  'transient_lastDdlTime'='1381346965')
{code}

Logs for the Hive job from the JT (JobTracker):
{code}
....
2013-10-14 16:24:23,555 INFO org.apache.hadoop.mapred.MapTask: Processing split: Paths:/Users/amalhotra/hadoop/appdoop/tmp/hive/hive_2013-10-14_16-24-18_372_8603315140627624341/-mr-10002/1/emptyFile:0+87,/logs/test_proto/v002/2013/10/14/11/test_proto_1381765303443:0+541InputFormatClass: org.apache.hadoop.mapred.SequenceFileInputFormat
....
2013-10-14 16:24:24,079 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://localhost/logs/test_proto/v002/2013/10/14/11/test_proto_1381765303443
2013-10-14 16:24:24,079 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias test_proto_v002 for file hdfs://localhost/logs/test_proto/v002/2013/10/14/11
2013-10-14 16:24:24,080 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1407)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
        at com.twitter.elephantbird.hive.serde.ProtobufDeserializer.deserialize(ProtobufDeserializer.java:56)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:525)
        ... 9 more

2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 finished. closing...
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:1
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 finished. closing...
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing...
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 1 finished. closing...
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 1 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 0 finished. closing...
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 0 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 1 Close done
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 Close done
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 Close done
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 Close done
2013-10-14 16:24:24,080 INFO ExecMapper: ExecMapper: processed 0 rows: used memory = 120821440
2013-10-14 16:24:24,084 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-10-14 16:24:24,086 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1407)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
        ... 8 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
        at com.twitter.elephantbird.hive.serde.ProtobufDeserializer.deserialize(ProtobufDeserializer.java:56)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:525)
        ... 9 more
2013-10-14 16:24:24,089 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
...
{code}
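
For illustration, the ClassCastException in the trace above comes from casting the incoming value to BytesWritable when the value for the emptyFile split is a Text. A minimal sketch of a defensive check follows. The types `WritableStub`, `TextStub` and `BytesWritableStub` are stand-ins invented here so the example compiles without Hadoop on the classpath, and `payloadOrNull` is a hypothetical helper, not Elephant Bird's code. Whether Hive tolerates a skipped record at this point is an assumption the logs do not confirm.

```java
import java.nio.charset.StandardCharsets;

// Stand-ins for org.apache.hadoop.io.Writable, Text and BytesWritable.
// They are NOT the real Hadoop classes; they only mirror the type hierarchy.
interface WritableStub {}

final class TextStub implements WritableStub {}

final class BytesWritableStub implements WritableStub {
    private final byte[] bytes;

    BytesWritableStub(byte[] bytes) {
        this.bytes = bytes;
    }

    byte[] getBytes() {
        return bytes;
    }
}

final class GuardedDeserializerSketch {
    // Hypothetical guard: return null (meaning "skip this record") when the
    // value is not the expected type, instead of casting unconditionally.
    static byte[] payloadOrNull(WritableStub value) {
        if (!(value instanceof BytesWritableStub)) {
            return null;
        }
        return ((BytesWritableStub) value).getBytes();
    }

    public static void main(String[] args) {
        byte[] real = "proto-bytes".getBytes(StandardCharsets.UTF_8);
        // A real record passes through; the placeholder value is skipped.
        System.out.println(payloadOrNull(new BytesWritableStub(real)) != null);
        System.out.println(payloadOrNull(new TextStub()) == null);
    }
}
```

The check only hides the symptom on the deserializer side; the placeholder split is still scheduled and read.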


> Injection of emptyFile into input splits for empty partitions causes 
> Deserializer to fail
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-4175
>                 URL: https://issues.apache.org/jira/browse/HIVE-4175
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>         Environment: CDH4.2, using MR1
>            Reporter: James Kebinger
>            Priority: Minor
>
> My deserializer is expecting to receive one of 2 different subclasses of 
> Writable, but in certain circumstances it receives an empty instance of 
> org.apache.hadoop.io.Text. This only happens for task attempts where I 
> observe the file called "emptyFile" in the list of input splits. 
> I'm running queries over an external year/month/day partitioned table for 
> which I have eagerly created partitions, so as of today, for example, I may 
> run a query where year = 2013 and month = 3, which includes empty partitions.
> In the course of the investigation I downloaded the sequence files to confirm 
> they were OK. Once I realized that processing of empty partitions was to 
> blame, I was able to work around the issue by bounding my queries to 
> populated partitions.
> Can the need for the emptyFile be eliminated in the case where there's 
> already a bunch of splits being processed? Failing that, can the mapper 
> detect that the current input is from emptyFile and not call the deserializer?
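
As a rough sketch of the second suggestion in the description (detecting that the current input is the placeholder), one option is to look at the last component of the split path, since the placeholder shows up as a file literally named "emptyFile". The class and method names below are hypothetical, and matching on the file name alone is an assumption: a real data file with the same name would also be skipped.

```java
final class EmptyFileSplitSketch {
    // Hypothetical helper: true when the last path component is "emptyFile".
    // The argument is a plain path, without the ":offset+length" suffix that
    // appears in the "Processing split" log line.
    static boolean isEmptyFilePlaceholder(String splitPath) {
        if (splitPath == null) {
            return false;
        }
        int slash = splitPath.lastIndexOf('/');
        // lastIndexOf returns -1 when there is no slash, so substring(0)
        // then yields the whole string.
        return "emptyFile".equals(splitPath.substring(slash + 1));
    }

    public static void main(String[] args) {
        System.out.println(isEmptyFilePlaceholder("/tmp/hive/-mr-10002/1/emptyFile"));
        System.out.println(isEmptyFilePlaceholder(
                "/logs/test_proto/v002/2013/10/14/11/test_proto_1381765303443"));
    }
}
```

A mapper-side check along these lines would let the deserializer be bypassed for the placeholder only, leaving records from populated partitions untouched.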



--
This message was sent by Atlassian JIRA
(v6.1#6144)
