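See it this way: readFields(...) is handed a DataInput that reads bytes from a binary stream, and write(...) is handed a DataOutput that writes bytes to a binary stream. The bytes readFields(...) sees are the ones your own write(...) produced earlier, so the "format" of the DataInput is whatever you decided to write.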
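
To see that write(...) and readFields(...) are two ends of the same byte stream, here is a minimal sketch of a record with several fields. It is only an illustration: the class name LogEntry, its fields, and the toBytes/fromBytes helpers are hypothetical, and it uses only java.io so it runs without Hadoop on the classpath. In a real job the class would additionally declare "implements Writable" (org.apache.hadoop.io.Writable).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record for one parsed log line (field names are assumptions).
// In a Hadoop job this would also declare: implements org.apache.hadoop.io.Writable
class LogEntry {
    String user;
    String url;
    int status;
    long responseTimeMs;

    // Writables are normally created empty and then filled by readFields.
    LogEntry() {}

    LogEntry(String user, String url, int status, long responseTimeMs) {
        this.user = user;
        this.url = url;
        this.status = status;
        this.responseTimeMs = responseTimeMs;
    }

    // Fields go out in a fixed order: user, url, status, responseTimeMs.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(user);
        out.writeUTF(url);
        out.writeInt(status);
        out.writeLong(responseTimeMs);
    }

    // Fields come back in exactly the same order they were written.
    public void readFields(DataInput in) throws IOException {
        user = in.readUTF();
        url = in.readUTF();
        status = in.readInt();
        responseTimeMs = in.readLong();
    }

    // Helper for this sketch only: serialize into an in-memory byte array.
    static byte[] toBytes(LogEntry e) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            e.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Helper for this sketch only: rebuild an entry from those bytes.
    static LogEntry fromBytes(byte[] bytes) {
        try {
            LogEntry e = new LogEntry();
            e.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return e;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        LogEntry copy = fromBytes(toBytes(new LogEntry("alice", "/index.html", 200, 153L)));
        System.out.println(copy.user + " " + copy.url + " " + copy.status + " " + copy.responseTimeMs);
    }
}
```

The mapper would do the regex parsing, fill in the fields, and emit the object; nothing in the byte stream labels which value is which, so the matching order in write(...) and readFields(...) is what ties each value to its field.
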
Now your data structure may be a complex one: perhaps an array of items, a mapping of some kind, or just a set of objects of different types. All you need to do is think about how you would _serialize_ your data structure into a binary stream, so that you can _de-serialize_ it back from the same stream when required.

As for what goes where, I think looking up the definition of 'serialization' will help. It is all in the ordering. If you wrote A before B, you read A before B. It is as simple as that.

Alternatively, you could use a neat serialization library like Apache Avro (http://avro.apache.org) and solve it in a simpler way with a schema. I'd recommend learning/using Avro for all serialization/de-serialization needs, especially for Hadoop use cases.

On Wed, Feb 2, 2011 at 10:51 PM, Adeel Qureshi <adeelmahm...@gmail.com> wrote:
> I have been trying to understand how to write a simple custom writable class
> and I find the documentation available very vague and unclear about certain
> things. okay so here is the sample writable implementation in javadoc of
> Writable interface
>
> public class MyWritable implements Writable {
>   // Some data
>   private int counter;
>   private long timestamp;
>
>   public void write(DataOutput out) throws IOException {
>     out.writeInt(counter);
>     out.writeLong(timestamp);
>   }
>
>   public void readFields(DataInput in) throws IOException {
>     counter = in.readInt();
>     timestamp = in.readLong();
>   }
>
>   public static MyWritable read(DataInput in) throws IOException {
>     MyWritable w = new MyWritable();
>     w.readFields(in);
>     return w;
>   }
> }
>
> so in readFields function we are simply saying read an int from the
> datainput and put that in counter .. and then read a long and put that in
> timestamp variable .. what doesnt makes sense to me is what is the format of
> DataInput here .. what if there are multiple ints and multiple longs .. how
> is the correct int gonna go in counter .. what if the data I am reading in
> my mapper is a string line .. and I am using regular expression to parse the
> tokens .. how do I specify which field goes where .. simply saying readInt
> or readText .. how does that gets connected to the right stuff ..
>
> so in my case like I said I am reading from iis log files where my mapper
> input is a log line which contains usual log information like data, time,
> user, server, url, qry, responseTme etc .. I want to parse these into an
> object that can be passed to reducer instead of dumping all that information
> as text ..
>
> I would appreciate any help.
> Thanks
> Adeel
>

--
Harsh J
www.harshj.com