You can also gzip each input file. Hadoop will not split a compressed input file (but will automatically decompress it before feeding it to your mapper).
rick

On 10/15/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> Use a list of file names as your map input. Then your mapper can read a
> line, use that to open and read a file for processing.
>
> This is similar to the problem of web-crawling where the input is a list of
> URLs.
>
> On 10/15/07 6:57 AM, "Ming Yang" <[EMAIL PROTECTED]> wrote:
>
> > I was writing a test mapreduce program and noticed that the
> > input file was always broken down into separate lines and fed
> > to the mapper. However, in my case I need to process the whole
> > file in the mapper since there are some dependencies between
> > lines in the input file. Is there any way I can achieve this --
> > process the whole input file, either text or binary, in the mapper?
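Below is a minimal sketch of the file-list approach Ted describes, written against the classic org.apache.hadoop.mapred API. The class name FileContentsMapper and its pass-through output are illustrative, not anything from Hadoop itself: the job's input is a small text file listing one HDFS path per line, and each map() call opens that path and reads the entire file into memory before processing it.

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileContentsMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private JobConf conf;

  public void configure(JobConf conf) {
    this.conf = conf;  // keep the conf so map() can reach the FileSystem
  }

  // Each input record is one line of the file list: the path of a file
  // to be processed as a whole.
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    Path path = new Path(value.toString().trim());
    FileSystem fs = path.getFileSystem(conf);

    // Read the entire file into memory; this assumes each file fits in RAM.
    long len = fs.getFileStatus(path).getLen();
    byte[] contents = new byte[(int) len];
    FSDataInputStream in = fs.open(path);
    try {
      in.readFully(0, contents);
    } finally {
      in.close();
    }

    // Whole-file processing would happen here; emitting path -> contents
    // is just a placeholder.
    output.collect(value, new Text(contents));
  }
}

The file list itself can use plain TextInputFormat: it splits harmlessly across map tasks, since each line is an independent unit of work. One caveat on the gzip route above: a gzipped file does all go to a single map task, but map() is still invoked once per decompressed line, so the mapper would have to buffer lines across calls, whereas the file-list approach hands it the whole file in a single call.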
