Hi Bertrand, I believe he is talking about MapFile's index files, explained here: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/MapFile.html
On Fri, Jul 27, 2012 at 11:24 AM, Bertrand Dechoux <decho...@gmail.com> wrote:
> Your use of 'index' is indeed not clear. Are you talking about Hive or
> HBase?
>
> I can confirm that you will have one result file per reducer. Of course,
> for efficiency reasons, you need to limit the number of files. But if you
> are using multiple reducers, that should mean a single reducer isn't fast
> enough, so it can be assumed that the output of each reducer is big
> enough. If that is not the case, you can limit the number of reducers to
> one.
>
> In general, the 'fragmentation' of the results is dealt with by the next
> job. You should provide more information about your real problem and its
> context.
>
> Bertrand
>
> On Fri, Jul 27, 2012 at 3:15 AM, syed kather <in.ab...@gmail.com> wrote:
>
>> Mike,
>> Can you please give more details? The context is not clear. Can you
>> share your use case if possible?
>> On Jul 24, 2012 1:40 AM, "Mike S" <mikesam...@gmail.com> wrote:
>>
>> > If I set my reducer output to the MapFile output format and the job
>> > has, say, 100 reducers, will the output generate 100 different index
>> > files (one for each reducer) or one index file for all the reducers
>> > (basically one index file per job)?
>> >
>> > If it is one index file per reducer, can I rely on HDFS append to
>> > change the index write behavior and build one index file from all the
>> > reducers, basically by making all the parallel reducers append to one
>> > index file? The data files do not matter.
>> >
>
> --
> Bertrand Dechoux

--
Harsh J
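For what it's worth, the reason one index file per reducer is usually not a problem is that a reader can re-apply the job's partitioner to pick the right part file, then binary-search only that file's index (this is roughly what Hadoop's MapFileOutputFormat.getReaders/getEntry helpers do). Below is a pure-Python sketch of that idea, not Hadoop code; the partition function, NUM_REDUCERS, and the in-memory "index" lists are all stand-ins for illustration:

```python
# Pure-Python sketch (NOT Hadoop code) of per-reducer MapFile-style
# indexes: each "reducer" owns a sorted part file plus its own index,
# and a lookup uses the partitioner to pick the right file first.
import bisect

NUM_REDUCERS = 4  # hypothetical job setting


def partition(key, num_partitions=NUM_REDUCERS):
    # Stand-in for Hadoop's default hash partitioner.
    return hash(key) % num_partitions


# Each "reducer" writes its own sorted (key, value) part,
# analogous to part-r-00000 .. part-r-00003.
parts = [[] for _ in range(NUM_REDUCERS)]
for key in ["apple", "banana", "cherry", "date", "elderberry"]:
    parts[partition(key)].append((key, key.upper()))
for p in parts:
    p.sort()

# One small "index file" per reducer output, never merged.
indexes = [[k for k, _ in p] for p in parts]


def lookup(key):
    # Pick the partition with the same partitioner the job used,
    # then binary-search only that partition's index.
    p = partition(key)
    i = bisect.bisect_left(indexes[p], key)
    if i < len(indexes[p]) and indexes[p][i] == key:
        return parts[p][i][1]
    return None


print(lookup("cherry"))   # CHERRY
print(lookup("missing"))  # None
```

Since a lookup only ever opens one partition's index, merging the 100 index files into one (e.g. via HDFS append) buys nothing for reads and would break the sorted-per-file invariant the index relies on.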