Much obliged, Harsh. Looks perfect.

2011/8/9 Harsh J <ha...@cloudera.com>
> Jonathan,
>
> 1. is correct with the compound key method, since you need document-ID
> and then work upon it. If you don't want it grouped/sorted by
> document, consider adding it as a value attribute instead, of course.
>
> 2. The record reader is the right place. The FileSplit object's path
> attribute specifically. I've detailed how to extract information from
> Mappers before (both old and new APIs of MR):
> http://search-hadoop.com/m/9Nqjm1aqu8a1 has the pointers.
>
> On Wed, Aug 10, 2011 at 2:41 AM, Jonathan Coveney <jcove...@gmail.com>
> wrote:
> > I want to calculate some statistics on a per document basis, and it seems
> > like the only way to do this would be to emit a compound key of
> > (key,documentname).
> > 1) Is this the case, or is there a better way to do this?
> > 2) If this is the only way to calculate a per input file basis, where is the
> > right place to grab this? A custom line reader? What object is exposed to
> > this?
> >
>
> --
> Harsh J
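
For reference, here is a minimal sketch of the compound-key idea from the thread, written in plain Java with no Hadoop dependency so it can be run on its own. DocKey and PerDocumentStats are hypothetical names, and the TreeMap is only a stand-in for the shuffle/sort that groups equal keys. In a real job on the new MapReduce API, and assuming a file-based input format, the document name would typically be read in the Mapper via ((FileSplit) context.getInputSplit()).getPath().getName().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical compound key: (document name, original key).
final class DocKey implements Comparable<DocKey> {
    final String document;
    final String key;

    DocKey(String document, String key) {
        this.document = document;
        this.key = key;
    }

    // Order by document first, then by key, so that all records
    // belonging to one document end up adjacent after sorting.
    @Override
    public int compareTo(DocKey other) {
        int byDocument = document.compareTo(other.document);
        return byDocument != 0 ? byDocument : key.compareTo(other.key);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DocKey)) {
            return false;
        }
        DocKey other = (DocKey) o;
        return document.equals(other.document) && key.equals(other.key);
    }

    @Override
    public int hashCode() {
        return Objects.hash(document, key);
    }

    @Override
    public String toString() {
        return document + "\t" + key;
    }
}

final class PerDocumentStats {
    // Stand-in for map + shuffle + reduce: "emit" (DocKey, 1) for every
    // token and sum the ones per compound key. The input map (document
    // name -> tokens) is an assumption made purely for illustration.
    static Map<DocKey, Integer> count(Map<String, List<String>> documents) {
        Map<DocKey, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<String>> doc : documents.entrySet()) {
            for (String token : doc.getValue()) {
                counts.merge(new DocKey(doc.getKey(), token), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new TreeMap<>();
        docs.put("a.txt", Arrays.asList("x", "y", "x"));
        docs.put("b.txt", Arrays.asList("x"));
        // Print one line per (document, key) pair with its count.
        count(docs).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

In Hadoop itself the key class would implement WritableComparable rather than Comparable. As Harsh notes, if grouping and sorting by document is not wanted, the document name can travel in the value instead.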