See it this way:

readFields(...) gives you a DataInput that reads bytes from a binary
stream, and write(...) gives you a DataOutput that writes bytes to a
binary stream.
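
To make "bytes on a binary stream" concrete, here is a small sketch using only java.io. It assumes nothing from Hadoop: DataOutputStream and DataInputStream are the standard implementations of the DataOutput/DataInput interfaces. It writes the same two fields as the MyWritable example quoted below and shows that the result is just 12 raw bytes. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch: DataOutputStream/DataInputStream are the standard java.io
// implementations of the DataOutput/DataInput interfaces.
class RawStreamDemo {
    // Writes the same two fields as MyWritable.write(...) and returns the raw bytes.
    static byte[] writeCounterAndTimestamp(int counter, long timestamp) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(counter);     // 4 bytes, big-endian
            out.writeLong(timestamp);  // 8 bytes, big-endian
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeCounterAndTimestamp(7, 1296666660000L);
        // 12 bytes in total: no field names, no type tags, just the values.
        System.out.println(data.length + " bytes");

        // Reading mirrors MyWritable.readFields(...): int first, then long.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int counter = in.readInt();
        long timestamp = in.readLong();
        System.out.println(counter + " " + timestamp);
    }
}
```

Running main should print "12 bytes" followed by "7 1296666660000".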

Now, your data structure may be a complex one: perhaps an array of
items, a map of keys to values, or just a set of objects of different
types. All you need to do is think about how you would _serialize_
your data structure into a binary stream, so that you can
_de-serialize_ it back from the same stream when required.
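
For the "array of items or a map" case, one common approach (a sketch, not the only option) is to write the element count first and then each element, so the reader knows how many to read back. To keep the sketch runnable without Hadoop, SimpleWritable below is a hypothetical stand-in with the same two methods as org.apache.hadoop.io.Writable, and PageHits is a made-up record.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in with the same two methods as org.apache.hadoop.io.Writable,
// so the sketch compiles and runs without Hadoop on the classpath.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record holding a list and a map.
class PageHits implements SimpleWritable {
    final List<String> urls = new ArrayList<>();
    final Map<String, Integer> hitsByUser = new TreeMap<>();

    @Override
    public void write(DataOutput out) throws IOException {
        // Variable-length data: write the element count first, then each element.
        out.writeInt(urls.size());
        for (String url : urls) {
            out.writeUTF(url);
        }
        out.writeInt(hitsByUser.size());
        for (Map.Entry<String, Integer> e : hitsByUser.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue());
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Clear old contents first: the same instance may be reused for many records.
        urls.clear();
        int urlCount = in.readInt();
        for (int i = 0; i < urlCount; i++) {
            urls.add(in.readUTF());
        }
        hitsByUser.clear();
        int userCount = in.readInt();
        for (int i = 0; i < userCount; i++) {
            String user = in.readUTF();
            int hits = in.readInt();
            hitsByUser.put(user, hits);
        }
    }

    // Serializes src to bytes and reads it back into a fresh instance.
    static PageHits roundTrip(PageHits src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            src.write(out);
            out.flush();
            PageHits copy = new PageHits();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PageHits p = new PageHits();
        p.urls.add("/default.aspx");
        p.urls.add("/login.aspx");
        p.hitsByUser.put("alice", 3);
        PageHits copy = roundTrip(p);
        System.out.println(copy.urls + " " + copy.hitsByUser);
    }
}
```

The count prefix is what makes variable-length data readable: without it, readFields would have no way to know where the list ends and the map begins.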

As for what goes where, I think looking up the definition of
'serialization' will help. It is all in the ordering: the stream
carries no field names or type markers, only bytes. If you wrote A
before B, you read A before B. Simple as that.
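
To answer "what if there are multiple ints and multiple longs" directly, this sketch (plain java.io, hypothetical class name) writes an int, a long and another int, then reads them back twice: once in the same order and once in the wrong order. The wrong-order read raises no error, because the byte count still matches. It simply produces wrong values.

```java
import java.io.*;

// Sketch: the stream only holds bytes, so correctness depends entirely on
// reading fields in the same order they were written.
class OrderingDemo {
    static byte[] write(int a, long b, int c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(a);   // A: 4 bytes
            out.writeLong(b);  // B: 8 bytes
            out.writeInt(c);   // C: 4 bytes
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same order as write(): int, long, int.
    static String readSameOrder(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int a = in.readInt();
            long b = in.readLong();
            int c = in.readInt();
            return a + " " + b + " " + c;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Wrong order: long, int, int. Still 16 bytes, so nothing fails;
    // the long just swallows A plus the first half of B.
    static String readWrongOrder(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            long x = in.readLong();
            int y = in.readInt();
            int z = in.readInt();
            return x + " " + y + " " + z;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(1, 2L, 3);
        System.out.println(readSameOrder(data));  // prints 1 2 3
        System.out.println(readWrongOrder(data)); // prints 4294967296 2 3
    }
}
```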

Alternatively, you could use a neat serialization library like Apache
Avro (http://avro.apache.org) and solve this more simply with a schema.
I'd recommend learning and using Avro for all your
serialization/de-serialization needs, especially for Hadoop use cases.
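
If you stay with a hand-written Writable instead, the IIS log case from the quoted question could look roughly like the sketch below. Several things here are assumptions. SimpleWritable is a stand-in for org.apache.hadoop.io.Writable so the code runs without Hadoop. LogEntry and parse(...) are hypothetical. The seven space-separated columns follow the field list in the question, but real IIS W3C logs declare their own column order in a "#Fields:" header. The key point is that regex or split parsing happens once, in the mapper, while write/readFields only deal with the already-parsed fields.

```java
import java.io.*;

// Stand-in with the same two methods as org.apache.hadoop.io.Writable.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record for one log line. The column layout assumed by parse()
// is for illustration only.
class LogEntry implements SimpleWritable {
    String date = "", time = "", user = "", server = "", url = "", query = "";
    int responseTime;

    // Text parsing belongs in the mapper; write()/readFields() never see the raw line.
    static LogEntry parse(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length != 7) {
            throw new IllegalArgumentException("expected 7 fields, got " + t.length + ": " + line);
        }
        LogEntry e = new LogEntry();
        e.date = t[0];
        e.time = t[1];
        e.user = t[2];
        e.server = t[3];
        e.url = t[4];
        e.query = t[5];
        e.responseTime = Integer.parseInt(t[6]);
        return e;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(date);
        out.writeUTF(time);
        out.writeUTF(user);
        out.writeUTF(server);
        out.writeUTF(url);
        out.writeUTF(query);
        out.writeInt(responseTime);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Exactly the order used in write().
        date = in.readUTF();
        time = in.readUTF();
        user = in.readUTF();
        server = in.readUTF();
        url = in.readUTF();
        query = in.readUTF();
        responseTime = in.readInt();
    }

    // Serializes src to bytes and reads it back into a fresh instance.
    static LogEntry roundTrip(LogEntry src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            src.write(out);
            out.flush();
            LogEntry copy = new LogEntry();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LogEntry e = parse("2011-02-02 22:51:00 alice web01 /default.aspx id=5 120");
        LogEntry copy = roundTrip(e);
        System.out.println(copy.user + " " + copy.url + " " + copy.responseTime);
    }
}
```

In a real job the class would implement org.apache.hadoop.io.Writable and keep a no-argument constructor, since Hadoop creates instances reflectively. The mapper would call parse(...) on each line and emit the object as its output value. Note that writeUTF limits each string to 65535 encoded bytes.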

On Wed, Feb 2, 2011 at 10:51 PM, Adeel Qureshi <adeelmahm...@gmail.com> wrote:
> I have been trying to understand how to write a simple custom Writable class,
> and I find the available documentation very vague and unclear about certain
> things. Okay, so here is the sample Writable implementation from the javadoc
> of the Writable interface:
>
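> import java.io.DataInput;
> import java.io.DataOutput;
> import java.io.IOException;
> import org.apache.hadoop.io.Writable;
>
> public class MyWritable implements Writable {
>     // Some data
>     private int counter;
>     private long timestamp;
>
>     // Serialize the fields in a fixed order.
>     public void write(DataOutput out) throws IOException {
>         out.writeInt(counter);
>         out.writeLong(timestamp);
>     }
>
>     // Deserialize in exactly the same order as write().
>     public void readFields(DataInput in) throws IOException {
>         counter = in.readInt();
>         timestamp = in.readLong();
>     }
>
>     // Convenience factory: create an instance and fill it from the stream.
>     public static MyWritable read(DataInput in) throws IOException {
>         MyWritable w = new MyWritable();
>         w.readFields(in);
>         return w;
>     }
> }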
>
> So in readFields we are simply saying: read an int from the
> DataInput and put it in counter, then read a long and put it in the
> timestamp variable. What doesn't make sense to me is the format of the
> DataInput here. What if there are multiple ints and multiple longs? How
> is the correct int going to go into counter? What if the data I am reading in
> my mapper is a string line, and I am using a regular expression to parse the
> tokens? How do I specify which field goes where? Simply saying readInt
> or readText, how does that get connected to the right stuff?
>
> So in my case, like I said, I am reading from IIS log files where my mapper
> input is a log line containing the usual log information like date, time,
> user, server, url, qry, responseTime etc. I want to parse these into an
> object that can be passed to the reducer instead of dumping all that
> information as text.
>
> I would appreciate any help.
> Thanks
> Adeel
>



-- 
Harsh J
www.harshj.com
