Thank you! After tracing the code I realized that I should override getRecordReader(...) as well, so that it returns the whole content of the file, i.e., to finish the job. :)
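For anyone who finds this thread later, here is a rough sketch of what the finished input format can look like, written against the old org.apache.hadoop.mapred API. The class names (WholeFileInputFormat, WholeFileRecordReader) are my own, and exact signatures vary a bit between Hadoop releases, so treat this as a sketch rather than drop-in code:

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;

    // Delivers each input file to the mapper as a single record:
    // key = NullWritable, value = the file's raw bytes.
    public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

      protected boolean isSplitable(FileSystem fs, Path file) {
        return false;                      // one file == one split
      }

      public RecordReader<NullWritable, BytesWritable> getRecordReader(
          InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
      }

      // Reader that returns exactly one record: the entire file.
      static class WholeFileRecordReader
          implements RecordReader<NullWritable, BytesWritable> {

        private final FileSplit split;
        private final JobConf job;
        private boolean processed = false;

        WholeFileRecordReader(FileSplit split, JobConf job) {
          this.split = split;
          this.job = job;
        }

        public NullWritable createKey() { return NullWritable.get(); }
        public BytesWritable createValue() { return new BytesWritable(); }

        public boolean next(NullWritable key, BytesWritable value)
            throws IOException {
          if (processed) {
            return false;                  // the single record was already emitted
          }
          byte[] contents = new byte[(int) split.getLength()];
          Path file = split.getPath();
          FileSystem fs = file.getFileSystem(job);
          FSDataInputStream in = null;
          try {
            in = fs.open(file);
            IOUtils.readFully(in, contents, 0, contents.length);
            value.set(contents, 0, contents.length);
          } finally {
            IOUtils.closeStream(in);
          }
          processed = true;
          return true;
        }

        public long getPos() { return processed ? split.getLength() : 0; }
        public float getProgress() { return processed ? 1.0f : 0.0f; }
        public void close() { }
      }
    }

Set it on the job with conf.setInputFormat(WholeFileInputFormat.class); note that reading the whole file into one BytesWritable only makes sense for files that fit comfortably in memory.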
2007/10/15, Ted Dunning <[EMAIL PROTECTED]>:
>
> You didn't do anything wrong. You just didn't finish the job.
>
> You need to override getRecordReader as well, so that it returns the
> contents of the file (or a lazy version of same) as a single record.
>
> On 10/15/07 11:00 AM, "Ming Yang" <[EMAIL PROTECTED]> wrote:
>
> > I just did a test by simply extending TextInputFormat and
> > overriding isSplitable(FileSystem fs, Path file) to always
> > return false. However, in my mapper I still see the input
> > file get split into lines. I did set the input format in the
> > JobConf, and isSplitable(...) -> false did get called
> > during job execution. Is there anything I did wrong, or is
> > this the behavior I should be expecting?
> >
> > Thanks,
> >
> > Ming
> >
> > 2007/10/15, Ted Dunning <[EMAIL PROTECTED]>:
> >>
> >> That doesn't quite do what the poster requested. They wanted to
> >> pass the entire file to the mapper.
> >>
> >> That requires a custom input format or an indirect input approach
> >> (a list of file names as the input).
> >>
> >> On 10/15/07 9:57 AM, "Rick Cox" <[EMAIL PROTECTED]> wrote:
> >>
> >>> You can also gzip each input file. Hadoop will not split a compressed
> >>> input file (but will automatically decompress it before feeding it to
> >>> your mapper).
> >>>
> >>> rick
> >>>
> >>> On 10/15/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>> Use a list of file names as your map input. Then your mapper can
> >>>> read a line and use it to open and read a file for processing.
> >>>>
> >>>> This is similar to the problem of web crawling, where the input is
> >>>> a list of URLs.
> >>>>
> >>>> On 10/15/07 6:57 AM, "Ming Yang" <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> I was writing a test MapReduce program and noticed that the
> >>>>> input file was always broken down into separate lines and fed
> >>>>> to the mapper. However, in my case I need to process the whole
> >>>>> file in the mapper, since there are dependencies between
> >>>>> lines in the input file. Is there any way I can process the
> >>>>> whole input file, either text or binary, in the mapper?
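P.S. For completeness, the indirect approach Ted describes (a list of file names as the map input) needs no custom input format at all; the mapper opens each named file itself. A sketch against the same old mapred API, where the class name and the choice of output types are purely illustrative:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Mapper whose input records are file names, one per line.
    // It opens each named file itself, so the whole file is visible
    // in a single map() call and cross-line dependencies can be handled.
    public class FileNameMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private JobConf conf;

      public void configure(JobConf conf) {
        this.conf = conf;
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        Path file = new Path(value.toString().trim());  // the record is a file name
        FileSystem fs = file.getFileSystem(conf);
        StringBuilder contents = new StringBuilder();
        BufferedReader in =
            new BufferedReader(new InputStreamReader(fs.open(file)));
        try {
          String line;
          while ((line = in.readLine()) != null) {
            contents.append(line).append('\n');
          }
        } finally {
          in.close();
        }
        // Placeholder: emit (file name, whole contents); real processing
        // of the cross-line dependencies would go here instead.
        output.collect(value, new Text(contents.toString()));
      }
    }

One caveat with this approach: a single input file of names becomes a single split, so all files are processed by one map task unless the name list is spread across several small input files.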
