AFAIK you don't really need serialization if your job is a map-only one; the OutputFormat/RecordWriter (if any) should take care of writing the output.
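
To make that concrete, here is a minimal sketch of a record that implements only DBWritable and not Writable. The class name UserRecord and its two columns are hypothetical. The small DBWritable interface declared at the top is only a stand-in that mirrors the two methods of Hadoop's interface, so the snippet compiles without Hadoop on the classpath; in a real job you would import org.apache.hadoop.mapreduce.lib.db.DBWritable (or the org.apache.hadoop.mapred.lib.db one for the old API) instead. It also assumes the Mapper never emits the record itself as an output key or value.

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Stand-in mirroring the two methods of Hadoop's DBWritable interface.
// Assumption for illustration only: in a real job, delete this and import
// the Hadoop interface instead.
interface DBWritable {
    void write(PreparedStatement statement) throws SQLException;
    void readFields(ResultSet resultSet) throws SQLException;
}

// Hypothetical record with two columns (id, name). It implements DBWritable
// only, so there is no write(DataOutput)/readFields(DataInput) pair to maintain.
class UserRecord implements DBWritable {
    long id;
    String name;

    // Public no-arg constructor so the framework can instantiate the class.
    public UserRecord() {}

    // Populate the fields from the current row of the query result.
    // Column positions 1 and 2 are assumptions about the input query.
    @Override
    public void readFields(ResultSet resultSet) throws SQLException {
        id = resultSet.getLong(1);
        name = resultSet.getString(2);
    }

    // Only relevant when writing back to a database; bound here for completeness.
    @Override
    public void write(PreparedStatement statement) throws SQLException {
        statement.setLong(1, id);
        statement.setString(2, name);
    }
}
```

The public no-arg constructor is there because, as far as I know, DBInputFormat creates the record instances reflectively.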

On Thu, Aug 5, 2010 at 7:07 AM, David Rosenstrauch <[email protected]> wrote:
> I'm working on a M/R job which uses DBInputFormat. So I have to create my
> own DBWritable for this. I'm a little bit confused about how to implement
> this though.
>
> In the sample code in the Javadoc for the DBWritable class, the MyWritable
> implements both DBWritable and Writable - thereby forcing the author of the
> MyWritable class to implement the methods to serialize/deserialize it
> to/from DataInput & DataOutput. Without getting into too much detail,
> having to implement this serialization would add a good bit of complexity to
> my code.
>
> However, the DBWritable that I'm writing really doesn't need to exist beyond
> the Mapper. I.e., it'll be input to the Mapper, but the Mapper won't emit
> it out to the sort/reduce steps. And after doing some reading/digging
> through the code, it looks to me like the InputFormat and the Mapper always
> get run on the same host & JVM. If that's in fact the case, then there'd be
> no need for me to make my DBWritable implement Writable also and so I could
> avoid the whole serialization/deserialization issue.
>
> So my question is basically: have I got this correct? Do the InputFormat
> and the Mapper always run in the same VM? (In which case I can do what I'm
> planning and code the DBWritable without the serialization headaches from
> the Writable class.)
>
> TIA,
>
> DR
>

--
Harsh J
www.harshj.com
