[ 
https://issues.apache.org/jira/browse/HIVE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794727#comment-13794727
 ] 

Ankit Malhotra commented on HIVE-4175:
--------------------------------------

[~ashutoshc]
Assuming dh = '2013-10-14 11' has data and dh = '2013-10-14 12' does not:

{code}
select count(*), object_type
from test_proto
where dh IN ('2013-10-14 11', '2013-10-14 12')
group by object_type;
{code}

[ProtobufDeserializer.java|https://github.com/kevinweil/elephant-bird/blob/master/hive/src/main/java/com/twitter/elephantbird/hive/serde/ProtobufDeserializer.java?source=c#L56]
 throws a ClassCastException as seen above. On further digging, the blob of 
type Text turned out to be empty: blob.getLength() returned 0.
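
A minimal sketch of the kind of defensive check a deserializer could make before casting, assuming the empty record always arrives as a zero-length Text. The nested types below are hypothetical stand-ins for the Hadoop classes so that the example compiles on its own; it is not the elephant-bird code, and whether the caller tolerates a null row is an assumption as well.

```java
import java.nio.charset.StandardCharsets;

public class EmptyBlobGuard {
    /** Hypothetical stand-in for org.apache.hadoop.io.Writable. */
    public interface Writable {}

    /** Hypothetical stand-in for org.apache.hadoop.io.Text; only getLength() is modelled. */
    public static final class Text implements Writable {
        private final byte[] bytes;
        public Text(String s) { this.bytes = s.getBytes(StandardCharsets.UTF_8); }
        public int getLength() { return bytes.length; }
    }

    /** Hypothetical stand-in for the Writable subclass the deserializer really expects. */
    public static final class ProtobufWritable implements Writable {
        final byte[] payload;
        public ProtobufWritable(byte[] payload) { this.payload = payload; }
    }

    /**
     * Returns the record payload, or null when the input is the zero-length
     * Text that shows up for an empty partition (assumption: null is acceptable
     * to the caller as "no row").
     */
    public static byte[] deserialize(Writable blob) {
        if (blob instanceof Text && ((Text) blob).getLength() == 0) {
            return null; // placeholder record: skip it instead of casting
        }
        if (!(blob instanceof ProtobufWritable)) {
            throw new IllegalArgumentException(
                "Unexpected writable: " + blob.getClass().getName());
        }
        return ((ProtobufWritable) blob).payload;
    }

    public static void main(String[] args) {
        System.out.println(deserialize(new Text("")) == null);
        System.out.println(deserialize(new ProtobufWritable(new byte[] {1, 2, 3})).length);
    }
}
```

A non-empty Text still fails loudly here, so the check only hides the specific empty-record case described above.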

> Injection of emptyFile into input splits for empty partitions causes 
> Deserializer to fail
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-4175
>                 URL: https://issues.apache.org/jira/browse/HIVE-4175
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>         Environment: CDH4.2, using MR1
>            Reporter: James Kebinger
>            Priority: Minor
>
> My deserializer is expecting to receive one of 2 different subclasses of 
> Writable, but in certain circumstances it receives an empty instance of 
> org.apache.hadoop.io.Text. This only happens for task attempts where I 
> observe the file called "emptyFile" in the list of input splits. 
> I'm running queries over an external year/month/day partitioned table that I 
> have eagerly created partitions for, so as of today, for example, I may run a 
> query where year = 2013 and month = 3, which includes empty partitions.
> In the course of investigation I downloaded the sequence files to confirm 
> they were OK. Once I realized that processing of empty partitions was to 
> blame, I was able to work around the issue by bounding my queries to populated 
> partitions.
> Can the need for the emptyFile be eliminated in the case where there's 
> already a bunch of splits being processed? Failing that, can the mapper 
> detect that the current input is from emptyFile and not call the deserializer?



--
This message was sent by Atlassian JIRA
(v6.1#6144)
