Just FYI, here are the reasons we tend to need the input filename in a mapper:

- metadata pulled from the filename with a regex (for example date, data
source, or anything else that exists per file rather than in every record)
- error messages ("illegal widget name", "file XXX"), usually emitted in an
output record which is later reduced
- etc...

Each of these is handled just fine by having the filename in the JobConf.
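
As a rough sketch of the first two cases (not our actual code): assuming the mapper has already read the input path out of the JobConf as a plain string (I believe the property is something like "map.input.file", but treat that name as an assumption), the rest is ordinary string handling. The class name, the "<source>-<yyyy>-<mm>-<dd>.log" naming scheme, and the error record layout below are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop. It only works on the filename
// string, which a mapper would obtain from its job configuration.
class FilenameMetadata {
    // Assumed naming scheme: <source>-<yyyy>-<mm>-<dd>.log
    private static final Pattern NAME =
        Pattern.compile("([A-Za-z0-9]+)-(\\d{4}-\\d{2}-\\d{2})\\.log$");

    // Returns per-file metadata, or an empty map if the name does not match.
    static Map<String, String> parse(String path) {
        Map<String, String> meta = new LinkedHashMap<>();
        Matcher m = NAME.matcher(path);
        if (m.find()) {
            meta.put("source", m.group(1));
            meta.put("date", m.group(2));
        }
        return meta;
    }

    // Builds an error value that carries the originating file, so a later
    // reduce step can group or count errors per file.
    static String errorRecord(String path, String message) {
        return message + "\tfile " + path;
    }

    public static void main(String[] args) {
        String path = "/logs/widgets-2006-08-09.log";
        System.out.println(parse(path));
        System.out.println(errorRecord(path, "illegal widget name"));
    }
}
```

Nothing here needs the RecordReader; a serializable string in the job configuration is enough.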

I'm sure there are other purposes, but these are what we run into.


On 8/9/06, Owen O'Malley <[EMAIL PROTECTED]> wrote:


On Aug 9, 2006, at 12:21 PM, Eric Baldeschwieler wrote:

> Why not provide a pointer to the real record reader?  Seems like a
> valid OO way to get access to all kinds of things.

Those attributes were put into the JobConf so that Hadoop could
re-run an isolated task, so they had to be serializable. Putting real
objects into the JobConf breaks that property.

Ben hasn't explained why he wants the RecordReader, so I was trying
to guess. The problem with giving out references to the RecordReader
is that you are exposing the framework's implementation details. In
particular, all you can really do to a record reader is advance it.
That really isn't something that the Mapper should be doing.

-- Owen
