Advancing the reader sounds like "a bad idea".
But an exotic reader might have all kinds of context it could
publish: maybe the current line number, rowID, the SQL statement used...
Who knows. There could be lots of stuff.
It would be nice to have an interface that lets you get to any
methods your subclassed reader has decided to publish.
Pushing this through the config doesn't seem right. Having an
available method a mapper can invoke does.
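
Roughly what I have in mind, as a sketch only. Every name here
(ReaderContext, LineContext, LineReader, TaggingMapper, ContextDemo) is
made up for illustration and is not existing Hadoop API. The reader
hands out a read-only view of whatever it wants to publish, so the
mapper never gets a handle it could use to advance the reader:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of this is existing Hadoop API.

/** Read-only context a reader chooses to publish to the mapper. */
interface ReaderContext {
    String describe();
}

/** Extra context an "exotic" line-oriented reader might publish. */
interface LineContext extends ReaderContext {
    long currentLine();
}

/** Toy reader: the framework advances it, the mapper only sees its view. */
class LineReader {
    private final String[] lines;
    private int pos = 0; // number of lines consumed so far

    LineReader(String... lines) {
        this.lines = lines;
    }

    /** Called by the framework only; returns null when input is exhausted. */
    String next() {
        return pos < lines.length ? lines[pos++] : null;
    }

    /** Read-only view; it exposes no way to advance the reader. */
    LineContext context() {
        return new LineContext() {
            @Override public long currentLine() { return pos; }
            @Override public String describe() { return "line " + pos; }
        };
    }
}

/** Mapper that is handed the published context, not the reader itself. */
class TaggingMapper {
    final List<String> out = new ArrayList<String>();

    void map(String value, ReaderContext ctx) {
        // Use the richer interface only if this reader published it.
        if (ctx instanceof LineContext) {
            out.add(((LineContext) ctx).currentLine() + ":" + value);
        } else {
            out.add(value);
        }
    }
}

class ContextDemo {
    /** Stand-in for the framework's read/map loop. */
    static List<String> run(String... input) {
        LineReader reader = new LineReader(input);
        TaggingMapper mapper = new TaggingMapper();
        ReaderContext ctx = reader.context();
        String value;
        while ((value = reader.next()) != null) {
            mapper.map(value, ctx);
        }
        return mapper.out;
    }

    public static void main(String[] args) {
        System.out.println(run("a", "b")); // prints [1:a, 2:b]
    }
}
```

Nothing goes through the config, so nothing has to be serializable, and
a mapper that doesn't know about LineContext still works with the plain
ReaderContext.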
On Aug 9, 2006, at 2:46 PM, Owen O'Malley wrote:
On Aug 9, 2006, at 12:21 PM, Eric Baldeschwieler wrote:
Why not provide a pointer to the real record reader? Seems like a
valid OO way to get access to all kinds of things.
Those attributes were put into the JobConf so that Hadoop could
re-run an isolated task, so they had to be serializable. Putting real
objects into the JobConf breaks that property.
Ben hasn't explained why he wants the RecordReader, so I was trying
to guess. The problem with giving out references to the
RecordReader is that you are exposing the framework's
implementation details. In particular, all you can really do to a
record reader is advance it. That really isn't something that the
Mapper should be doing.
-- Owen