You didn't do anything wrong. You just didn't finish the job.
You need to override getRecordReader as well, so that it returns the contents of the file (or a lazy version of same) as a single record. A rough sketch follows the quoted thread below.

On 10/15/07 11:00 AM, "Ming Yang" <[EMAIL PROTECTED]> wrote:

> I just did a test by simply extending TextInputFormat
> and overriding isSplitable(FileSystem fs, Path file) to always
> return false. However, in my mapper, I still see the input
> file get split into lines. I did set the input format in
> the job configuration, and isSplitable(...) -> false did get
> called during job execution. Is there anything I did wrong,
> or is this the behavior I should be expecting?
>
> Thanks,
>
> Ming
>
> 2007/10/15, Ted Dunning <[EMAIL PROTECTED]>:
>>
>> That doesn't quite do what the poster requested. They wanted to pass the
>> entire file to the mapper.
>>
>> That requires a custom input format or an indirect input approach (a list
>> of file names as the input).
>>
>> On 10/15/07 9:57 AM, "Rick Cox" <[EMAIL PROTECTED]> wrote:
>>
>>> You can also gzip each input file. Hadoop will not split a compressed
>>> input file (but will automatically decompress it before feeding it to
>>> your mapper).
>>>
>>> rick
>>>
>>> On 10/15/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Use a list of file names as your map input. Then your mapper can read a
>>>> line and use it to open and read a file for processing.
>>>>
>>>> This is similar to web crawling, where the input is a list of URLs.
>>>>
>>>> On 10/15/07 6:57 AM, "Ming Yang" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> I was writing a test MapReduce program and noticed that the
>>>>> input file was always broken down into separate lines and fed
>>>>> to the mapper. However, in my case I need to process the whole
>>>>> file in the mapper, since there are dependencies between
>>>>> lines in the input file. Is there any way I can achieve this --
>>>>> process the whole input file, either text or binary, in the mapper?
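Here is a minimal, untested sketch of the kind of input format I mean. The class names WholeFileInputFormat and WholeFileRecordReader are just placeholders, and it's written against the generic org.apache.hadoop.mapred API (on older releases you'd drop the type parameters):

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  // Keep each file in a single split so one mapper sees the whole thing.
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }

  public RecordReader<NullWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, job);
  }
}

class WholeFileRecordReader
    implements RecordReader<NullWritable, BytesWritable> {

  private final FileSplit split;
  private final JobConf job;
  private boolean processed = false;

  WholeFileRecordReader(FileSplit split, JobConf job) {
    this.split = split;
    this.job = job;
  }

  // Emits exactly one record: the entire contents of the file.
  // Note the int cast caps this at ~2GB per file.
  public boolean next(NullWritable key, BytesWritable value) throws IOException {
    if (processed) {
      return false;
    }
    Path file = split.getPath();
    byte[] contents = new byte[(int) split.getLength()];
    FileSystem fs = file.getFileSystem(job);
    FSDataInputStream in = fs.open(file);
    try {
      in.readFully(0, contents);
    } finally {
      in.close();
    }
    value.set(contents, 0, contents.length);
    processed = true;
    return true;
  }

  public NullWritable createKey() { return NullWritable.get(); }
  public BytesWritable createValue() { return new BytesWritable(); }
  public long getPos() { return processed ? split.getLength() : 0; }
  public float getProgress() { return processed ? 1.0f : 0.0f; }
  public void close() throws IOException { }
}

Set this via JobConf.setInputFormat() and your mapper will see exactly one (key, value) pair per input file; the framework calls next() a second time only to learn there are no more records.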

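For completeness, the indirect approach quoted above (make the job input a text file listing one path per line, so the mapper receives file names rather than file contents) would look roughly like this. Again a hypothetical sketch with made-up class names, same API assumptions as before:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileNameMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private JobConf job;

  public void configure(JobConf job) {
    this.job = job;
  }

  // With the default TextInputFormat, each input line (one file name)
  // arrives as a single call to map().
  public void map(LongWritable offset, Text fileName,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    Path file = new Path(fileName.toString());
    FileSystem fs = file.getFileSystem(job);
    FSDataInputStream in = fs.open(file);
    try {
      // Process the whole file here, cross-line dependencies and all,
      // and emit whatever you need via output.collect(...).
    } finally {
      in.close();
    }
  }
}

The trade-off is that the file a mapper opens is generally not local to the task, so you give up data locality; the custom input format keeps it.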