In Hadoop 0.18 and beyond, the key and value do not have to implement Writable. As a general rule, the key and value objects passed to the map task are the same object instances on every call, with fresh contents filled in by the record reader. The output.collect method serializes the value during the call (unless you are using ChainMapper from 0.19+), so you are free to reset the contents of the key and value objects passed to output.collect once the call returns.

It is common practice to keep a class field holding an instance of the output key or value type and reuse it for transformations, instead of allocating a new key or value instance on each call to map or reduce.
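For example, a mapper along these lines reuses one output key and one output value instance across all calls (a minimal sketch against the old org.apache.hadoop.mapred API; the class and field names are just illustrative):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class UpperCaseMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  // Reused across every call to map(). output.collect() serializes
  // the pair during the call, so overwriting these afterwards is safe.
  private final Text outKey = new Text();
  private final LongWritable outValue = new LongWritable();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, LongWritable> output,
                  Reporter reporter) throws IOException {
    // Overwrite the reused instances instead of allocating new ones.
    outKey.set(value.toString().toUpperCase());
    outValue.set(key.get());
    output.collect(outKey, outValue);
  }
}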
On Tue, Jul 28, 2009 at 11:29 AM, Devajyoti Sarkar <[email protected]> wrote:

> Thanks.
>
> Dev
>
> On Wed, Jul 29, 2009 at 2:27 AM, Todd Lipcon <[email protected]> wrote:
>
> > On Tue, Jul 28, 2009 at 11:24 AM, Devajyoti Sarkar <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > In the Hadoop documentation it says that all key-value classes need to
> > > implement Writable to allow serialization and de-serialization of outputs
> > > between mappers and reducers. Is this also necessary for key/value pairs
> > > sent between the RecordReader and the Mapper (as well as the Reducer and
> > > the RecordWriter)? I assume that in each of these two cases, classes are
> > > instantiated in the same VM. So is it safe to assume that key/value pairs
> > > are sent by reference instead of serialization/deserialization? If so, my
> > > specific application may get a performance boost. Please do let me know
> > > if this is so.
> >
> > Yes, this is correct. The values that come out of RecordReaders and go into
> > RecordWriters do not need to implement Writable.
> >
> > -Todd

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
