Thank you! After tracing the code I realized that I should override getRecordReader(...) as well, so that it returns the whole content of the file, i.e., to finish the job. :)
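For anyone who finds this thread later, here is a rough sketch of what the finished input format can look like, written against the old org.apache.hadoop.mapred API. The class names (WholeFileInputFormat, WholeFileRecordReader) are my own, and exact signatures vary a bit between Hadoop releases, so treat this as a sketch rather than drop-in code:

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;

    // Delivers each input file to the mapper as a single record:
    // key = NullWritable, value = the file's raw bytes.
    public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

      protected boolean isSplitable(FileSystem fs, Path file) {
        return false;                      // one file == one split
      }

      public RecordReader<NullWritable, BytesWritable> getRecordReader(
          InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
      }

      // Reader that returns exactly one record: the entire file.
      static class WholeFileRecordReader
          implements RecordReader<NullWritable, BytesWritable> {

        private final FileSplit split;
        private final JobConf job;
        private boolean processed = false;

        WholeFileRecordReader(FileSplit split, JobConf job) {
          this.split = split;
          this.job = job;
        }

        public NullWritable createKey() { return NullWritable.get(); }
        public BytesWritable createValue() { return new BytesWritable(); }

        public boolean next(NullWritable key, BytesWritable value)
            throws IOException {
          if (processed) {
            return false;                  // the single record was already emitted
          }
          byte[] contents = new byte[(int) split.getLength()];
          Path file = split.getPath();
          FileSystem fs = file.getFileSystem(job);
          FSDataInputStream in = null;
          try {
            in = fs.open(file);
            IOUtils.readFully(in, contents, 0, contents.length);
            value.set(contents, 0, contents.length);
          } finally {
            IOUtils.closeStream(in);
          }
          processed = true;
          return true;
        }

        public long getPos() { return processed ? split.getLength() : 0; }
        public float getProgress() { return processed ? 1.0f : 0.0f; }
        public void close() { }
      }
    }

Set it on the job with conf.setInputFormat(WholeFileInputFormat.class); note that reading the whole file into one BytesWritable only makes sense for files that fit comfortably in memory.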
2007/10/15, Ted Dunning <[EMAIL PROTECTED]>:
>
> You didn't do anything wrong. You just didn't finish the job.
>
> You need to override getRecordReader as well, so that it returns the
> contents of the file (or a lazy version of same) as a single record.
>
> On 10/15/07 11:00 AM, "Ming Yang" <[EMAIL PROTECTED]> wrote:
>
> > I just did a test by simply extending TextInputFormat and
> > overriding isSplitable(FileSystem fs, Path file) to always
> > return false. However, in my mapper I still see the input
> > file get split into lines. I did set the input format in the
> > JobConf, and isSplitable(...) -> false did get called
> > during job execution. Is there anything I did wrong, or is
> > this the behavior I should be expecting?
> >
> > Thanks,
> >
> > Ming
> >
> > 2007/10/15, Ted Dunning <[EMAIL PROTECTED]>:
> >>
> >> That doesn't quite do what the poster requested. They wanted to
> >> pass the entire file to the mapper.
> >>
> >> That requires a custom input format or an indirect input approach
> >> (a list of file names as the input).
> >>
> >> On 10/15/07 9:57 AM, "Rick Cox" <[EMAIL PROTECTED]> wrote:
> >>
> >>> You can also gzip each input file. Hadoop will not split a compressed
> >>> input file (but will automatically decompress it before feeding it to
> >>> your mapper).
> >>>
> >>> rick
> >>>
> >>> On 10/15/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>> Use a list of file names as your map input. Then your mapper can
> >>>> read a line and use it to open and read a file for processing.
> >>>>
> >>>> This is similar to the problem of web crawling, where the input is
> >>>> a list of URLs.
> >>>>
> >>>> On 10/15/07 6:57 AM, "Ming Yang" <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> I was writing a test MapReduce program and noticed that the
> >>>>> input file was always broken down into separate lines and fed
> >>>>> to the mapper. However, in my case I need to process the whole
> >>>>> file in the mapper, since there are dependencies between
> >>>>> lines in the input file. Is there any way I can process the
> >>>>> whole input file, either text or binary, in the mapper?
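P.S. For completeness, the indirect approach Ted describes (a list of file names as the map input) needs no custom input format at all; the mapper opens each named file itself. A sketch against the same old mapred API, where the class name and the choice of output types are purely illustrative:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Mapper whose input records are file names, one per line.
    // It opens each named file itself, so the whole file is visible
    // in a single map() call and cross-line dependencies can be handled.
    public class FileNameMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private JobConf conf;

      public void configure(JobConf conf) {
        this.conf = conf;
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        Path file = new Path(value.toString().trim());  // the record is a file name
        FileSystem fs = file.getFileSystem(conf);
        StringBuilder contents = new StringBuilder();
        BufferedReader in =
            new BufferedReader(new InputStreamReader(fs.open(file)));
        try {
          String line;
          while ((line = in.readLine()) != null) {
            contents.append(line).append('\n');
          }
        } finally {
          in.close();
        }
        // Placeholder: emit (file name, whole contents); real processing
        // of the cross-line dependencies would go here instead.
        output.collect(value, new Text(contents.toString()));
      }
    }

One caveat with this approach: a single input file of names becomes a single split, so all files are processed by one map task unless the name list is spread across several small input files.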
