Re: Can you see the name of the document being loaded?

Jonathan Coveney Tue, 09 Aug 2011 17:25:57 -0700

Much obliged, Harsh. Looks perfect.

2011/8/9 Harsh J <ha...@cloudera.com>


> Jonathan,
>
> 1. Yes, the compound key method is correct, since you need the
> document ID in the key in order to work on each document. If you don't
> want the output grouped/sorted by document, consider adding it as a
> value attribute instead, of course.
>
> 2. The record reader is the right place, specifically the FileSplit
> object's path attribute. I've previously detailed how to extract such
> information from within Mappers (for both the old and new MR APIs);
> http://search-hadoop.com/m/9Nqjm1aqu8a1 has the pointers.
>
> On Wed, Aug 10, 2011 at 2:41 AM, Jonathan Coveney <jcove...@gmail.com>
> wrote:
> > I want to calculate some statistics on a per document basis, and it seems
> > like the only way to do this would be to emit a compound key of
> > (key,documentname).
> > 1) Is this the case, or is there a better way to do this?
> > 2) If this is the only way to calculate on a per-input-file basis, where
> > is the right place to grab this? A custom line reader? What object is
> > exposed to this?
>
>
>
> --
> Harsh J
>
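
For anyone finding this thread later, here is a minimal sketch of the
compound key idea discussed above. It is plain Java with no Hadoop
dependency, so the class name, the tab-separated key format, and the
in-memory "documents" map are all assumptions made for illustration only.
In a real new-API Mapper reading files through a FileSplit-based input
format, the document name would typically come from something like
((FileSplit) context.getInputSplit()).getPath().getName(); here it is
simply passed in.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of keying statistics by
// (documentName, key). Not taken from the thread or from Hadoop itself.
class CompoundKeySketch {

    // Builds the compound key. The "name<TAB>word" layout is an assumption;
    // in a real job a custom WritableComparable might be used instead.
    static String compoundKey(String documentName, String word) {
        return documentName + "\t" + word;
    }

    // Stand-in for map + shuffle + reduce: counts each word per document.
    // The map argument goes from document name to document text.
    static Map<String, Integer> countPerDocument(Map<String, String> documents) {
        // TreeMap keeps keys sorted, loosely mimicking the sorted shuffle.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            for (String word : doc.getValue().trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip the empty token produced by blank text
                }
                counts.merge(compoundKey(doc.getKey(), word), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> documents = new TreeMap<>();
        documents.put("a.txt", "x y x");
        documents.put("b.txt", "x");
        // Prints one "document<TAB>word<TAB>count" line per compound key.
        countPerDocument(documents)
                .forEach((key, count) -> System.out.println(key + "\t" + count));
    }
}
```

Because the document name is part of the key, counts for the same word in
different files stay separate. Moving the name into the value instead, as
suggested in point 1, would merge them under one key.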
