I just did a test by simply extending TextInputFormat and overriding isSplitable(FileSystem fs, Path file) to always return false. However, in my mapper I still see the input file getting split into lines. I did set the input format in the job configuration, and isSplitable(...) -> false did get called during job execution. Is there anything I did wrong, or is this the behavior I should be expecting?
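In case it helps, a minimal sketch of the override described above, assuming the old org.apache.hadoop.mapred API (the class name is just illustrative, not code from this thread):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Returning false keeps each input file in a single split, so one map task
// sees the whole file. TextInputFormat's record reader, however, still hands
// that split to the mapper one line at a time, which matches the behavior
// described above.
public class NonSplittableTextInputFormat extends TextInputFormat {
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}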
Thanks,
Ming

2007/10/15, Ted Dunning <[EMAIL PROTECTED]>:
>
> That doesn't quite do what the poster requested. They wanted to pass the
> entire file to the mapper.
>
> That requires a custom input format or an indirect input approach (list of
> file names in input).
>
>
> On 10/15/07 9:57 AM, "Rick Cox" <[EMAIL PROTECTED]> wrote:
>
> > You can also gzip each input file. Hadoop will not split a compressed
> > input file (but will automatically decompress it before feeding it to
> > your mapper).
> >
> > rick
> >
> > On 10/15/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
> >>
> >>
> >> Use a list of file names as your map input. Then your mapper can read a
> >> line, use that to open and read a file for processing.
> >>
> >> This is similar to the problem of web-crawling where the input is a list of
> >> URL's.
> >>
> >> On 10/15/07 6:57 AM, "Ming Yang" <[EMAIL PROTECTED]> wrote:
> >>
> >>> I was writing a test mapreduce program and noticed that the
> >>> input file was always broken down into separate lines and fed
> >>> to the mapper. However, in my case I need to process the whole
> >>> file in the mapper since there are some dependency between
> >>> lines in the input file. Is there any way I can achieve this --
> >>> process the whole input file, either text or binary, in the mapper?
> >>
> >>
> >
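To make the "custom input format" route in the quoted reply concrete, here is a hedged sketch (old org.apache.hadoop.mapred API; class names and the key/value types are assumptions, not code from this thread) of an input format whose record reader returns each file as one record instead of one record per line:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

    // Never split a file: each file becomes exactly one split.
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }

    public RecordReader<NullWritable, BytesWritable> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
    }
}

// Emits a single record per split: the key is empty, the value is the
// complete file contents.
class WholeFileRecordReader implements RecordReader<NullWritable, BytesWritable> {

    private final FileSplit split;
    private final JobConf conf;
    private boolean processed = false;

    WholeFileRecordReader(FileSplit split, JobConf conf) {
        this.split = split;
        this.conf = conf;
    }

    public boolean next(NullWritable key, BytesWritable value) throws IOException {
        if (processed) {
            return false;
        }
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(conf);
        byte[] contents = new byte[(int) split.getLength()];
        FSDataInputStream in = fs.open(file);
        try {
            in.readFully(0, contents);
        } finally {
            in.close();
        }
        value.set(contents, 0, contents.length);
        processed = true;
        return true;
    }

    public NullWritable createKey() { return NullWritable.get(); }
    public BytesWritable createValue() { return new BytesWritable(); }
    public long getPos() { return processed ? split.getLength() : 0; }
    public float getProgress() { return processed ? 1.0f : 0.0f; }
    public void close() throws IOException { }
}

With something like this set as the job's input format, the mapper receives one <NullWritable, BytesWritable> pair per file (text or binary) and can handle the inter-line dependencies itself; the obvious trade-off is that each file has to fit in memory.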

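And for the "indirect input" alternative that Ted describes (the job's input is a small text file listing one path per line, and each map call opens and reads the named file itself), a sketch along the same hedged lines (old mapred API, illustrative names, text files assumed):

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileListMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private JobConf conf;

    public void configure(JobConf job) {
        this.conf = job;
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Each input record is one line of the listing file: a path to process.
        Path file = new Path(value.toString());
        FileSystem fs = file.getFileSystem(conf);

        // Read the whole file into memory so lines can be processed together.
        byte[] contents = new byte[(int) fs.getFileStatus(file).getLen()];
        FSDataInputStream in = fs.open(file);
        try {
            in.readFully(0, contents);
        } finally {
            in.close();
        }

        // Real processing of the whole file goes here; as a placeholder this
        // just emits the file name and its contents (assumes UTF-8 text).
        output.collect(new Text(file.getName()), new Text(contents));
    }
}

The listing file itself is still split and read line by line, but that is fine here: each line is only a file name, and the file it names is read whole. The cost is data locality, since a map task generally will not run on the node that holds the file it ends up reading.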