[
https://issues.apache.org/jira/browse/HIVE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xiang Zhang updated HIVE-4175:
------------------------------
Description:
* My deserializer expects to receive one of two subclasses of Writable, but in
certain circumstances it receives an empty instance of
org.apache.hadoop.io.Text. This only happens for task attempts where I observe
a file called "emptyFile" in the list of input splits.
I'm running queries over an external year/month/day partitioned table for which
I have eagerly created partitions, so as of today, for example, a query
where year = 2013 and month = 3 includes empty partitions.
In the course of the investigation I downloaded the sequence files to confirm
they were OK. Once I realized that processing of empty partitions was to blame,
I was able to work around the issue by bounding my queries to populated
partitions.
Can the need for the emptyFile be eliminated in the case where there are already
a bunch of splits being processed? Failing that, can the mapper detect that the
current input is from emptyFile and not call the deserializer?
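
Until something like that exists, a deserializer-side guard is one possible
workaround. The following is only a sketch under stated assumptions: the
Writable and Text types are simplified stand-ins for the Hadoop classes so the
example compiles without Hadoop on the classpath, and EventRecord,
isEmptyPlaceholder and deserialize are hypothetical names, not part of Hive or
of the reporter's code. It treats a zero-length Text as the emptyFile
placeholder and skips it instead of failing.

```java
import java.util.Optional;

/** Sketch: skip the empty Text placeholder instead of failing on it. */
class EmptyRecordGuard {

    /** Simplified stand-in for org.apache.hadoop.io.Writable. */
    interface Writable {}

    /** Simplified stand-in for org.apache.hadoop.io.Text (length only). */
    static final class Text implements Writable {
        private final byte[] bytes;
        Text() { this(new byte[0]); }
        Text(byte[] bytes) { this.bytes = bytes.clone(); }
        int getLength() { return bytes.length; }
    }

    /** Hypothetical stand-in for one of the expected Writable subclasses. */
    static final class EventRecord implements Writable {
        final String payload;
        EventRecord(String payload) { this.payload = payload; }
    }

    /** True when the input looks like the placeholder: a Text with no bytes. */
    static boolean isEmptyPlaceholder(Writable w) {
        return w instanceof Text && ((Text) w).getLength() == 0;
    }

    /**
     * Returns the decoded payload, or Optional.empty() for the placeholder.
     * Any other unexpected type is still rejected loudly.
     */
    static Optional<String> deserialize(Writable w) {
        if (isEmptyPlaceholder(w)) {
            return Optional.empty();
        }
        if (w instanceof EventRecord) {
            return Optional.of(((EventRecord) w).payload);
        }
        throw new IllegalArgumentException(
            "Unexpected Writable: " + w.getClass().getName());
    }
}
```

What a real Hive deserializer should return for a skipped record is left open
here; the point is only that the empty Text can be recognized before the
type-specific decoding runs.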
> Injection of emptyFile into input splits for empty partitions causes
> Deserializer to fail
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-4175
> URL: https://issues.apache.org/jira/browse/HIVE-4175
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.10.0
> Environment: CDH4.2, using MR1
> Reporter: James Kebinger
> Priority: Minor