Thanks, Harsh! This is very helpful.

Regards,
Ali

On Mon, Apr 23, 2012 at 2:08 PM, Harsh J <ha...@cloudera.com> wrote:
> Ali,
>
> MapFiles are explained at
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/MapFile.html
> - please give it a read and it should answer half your questions. In
> short, a MapFile is two files: a raw SequenceFile and an index file
> built on top of it.
>
> The reason MR does not provide a MapFileInputFormat is that you don't
> need the index file in MR jobs (input-driven jobs do no lookups).
> Hence SequenceFileInputFormat suffices to read the data (it ignores
> the index file and reads only the sequence file that carries the
> data).
>
> If you wish to make use of MapFile's index abilities for lookups etc.,
> use the MapFile.Reader class directly in your implementation.
>
> On Mon, Apr 23, 2012 at 4:23 PM, Ali Safdar Kureishy
> <safdar.kurei...@gmail.com> wrote:
>> Hi,
>>
>> If I use a *MapFileOutputFormat* to output some data, I see that each
>> reducer's output is a folder ("part-00000", for example), and inside
>> that folder are two files: "data" and "index".
>>
>> However, there is no corresponding MapFileInputFormat to read back
>> this folder ("part-00000"). Instead, *SequenceFileInputFormat* seems
>> to read the data. So, I have some questions:
>>
>> - Does SequenceFileInputFormat actually read *all* the data that was
>>   output by MapFileOutputFormat? Or is some relationship between the
>>   data and index files lost in this process that would have been
>>   better handled by another InputFormat class? In other words, is
>>   SequenceFileInputFormat the right InputFormat to read data written
>>   by MapFileOutputFormat?
>> - How is it that SequenceFileInputFormat can read outputs from *both*
>>   MapFileOutputFormat and SequenceFileOutputFormat? That would imply
>>   that MapFileOutputFormat and SequenceFileOutputFormat output the
>>   same data, OR that SequenceFileInputFormat internally handles the
>>   two differently. What is the reality?
>>
>> Thanks,
>> Safdar
>
> --
> Harsh J
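
To make the two-file layout discussed above concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not Hadoop's implementation: the class ToyMapFile, its methods scanAll and get, and the indexInterval parameter are all hypothetical names chosen for illustration, and two lists stand in for the "data" and "index" files. It only aims to show why a sequential reader can ignore the index entirely, while a keyed lookup benefits from it. In a real job the lookup side would go through MapFile.Reader, as Harsh suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the "data" + "index" pair; NOT Hadoop's actual code.
class ToyMapFile {
    // Stand-in for the "data" file: every key/value record, sorted by key.
    private final List<String[]> data = new ArrayList<>();
    // Stand-in for the "index" file: a sparse list of keys and their positions in data.
    private final List<String> indexKeys = new ArrayList<>();
    private final List<Integer> indexPositions = new ArrayList<>();
    // Hypothetical parameter: one index entry is written per this many records.
    private final int indexInterval;

    ToyMapFile(int indexInterval) {
        this.indexInterval = indexInterval;
    }

    // Records must arrive in sorted key order, otherwise the index is useless.
    void append(String key, String value) {
        if (!data.isEmpty() && data.get(data.size() - 1)[0].compareTo(key) > 0) {
            throw new IllegalArgumentException("key out of order: " + key);
        }
        if (data.size() % indexInterval == 0) {
            indexKeys.add(key);
            indexPositions.add(data.size());
        }
        data.add(new String[] {key, value});
    }

    // Sequential read, analogous to an input-driven job: every record is
    // returned and the index is never consulted.
    List<String[]> scanAll() {
        return new ArrayList<>(data);
    }

    // Keyed lookup: binary-search the sparse index for the last entry whose
    // key is <= the target, then scan the data forward from that position.
    String get(String key) {
        int lo = 0, hi = indexKeys.size() - 1, start = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexKeys.get(mid).compareTo(key) <= 0) {
                start = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (start < 0) {
            return null; // target sorts before the first record
        }
        for (int i = indexPositions.get(start); i < data.size(); i++) {
            int cmp = data.get(i)[0].compareTo(key);
            if (cmp == 0) return data.get(i)[1];
            if (cmp > 0) break; // passed the spot where the key would be
        }
        return null;
    }

    public static void main(String[] args) {
        ToyMapFile m = new ToyMapFile(2);
        String[] keys = {"a", "b", "c", "d", "e"};
        for (int i = 0; i < keys.length; i++) {
            m.append(keys[i], Integer.toString(i + 1));
        }
        System.out.println(m.scanAll().size()); // prints 5
        System.out.println(m.get("d"));         // prints 4
    }
}
```

In this model scanAll returns exactly the records that were appended, which mirrors the point that reading only the data file loses nothing when the job is input-driven; the index matters only to get.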