Hadoop isn't going to magically parse your Text line into anything. You have to tokenize it yourself and use the tokens to populate your custom Writable inside your map() call (via a constructor or a set of setter methods). The Writable interface handles serialization and de-serialization of the object itself, nothing more.
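As a rough illustration of that tokenize-then-populate step, here is a minimal sketch. The class name, the sample log line, and the commented-out LogEntryWritable setters are hypothetical stand-ins (the Hadoop Mapper/Text plumbing is omitted so the snippet runs on its own); only the splitting itself is shown:

```java
// Sketch of what the body of map() would do with each Text line:
// split it into tokens, then hand the tokens to a custom Writable.
public class LogLineParser {

    // Fields are whitespace-separated: DATE TIME SERVER USER URL QUERY PORT ...
    public static String[] tokenize(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical sample line matching the format above.
        String line = "2011-02-02 23:20:01 web01 adeel /index.aspx q=1 80";
        String[] t = tokenize(line);

        // In a real Mapper you would then do roughly:
        //   LogEntryWritable entry = new LogEntryWritable();   // hypothetical class
        //   entry.setServer(t[2]);
        //   entry.setUser(t[3]);
        //   ...
        //   context.write(someKey, entry);
        System.out.println(t.length);
        System.out.println(t[2]);
    }
}
```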
On Wed, Feb 2, 2011 at 11:20 PM, Adeel Qureshi <adeelmahm...@gmail.com> wrote:
> thanks for your reply .. so let's say my input files are formatted like this
>
> each line looks like this
> DATE TIME SERVER USER URL QUERY PORT ...
>
> so to read this I would create a writable mapper
>
> public class MyMapper implements Writable {
>     Date date;
>     long time;
>     String server;
>     String user;
>     String url;
>     String query;
>     int port;
>
>     public void readFields(DataInput in) throws IOException {
>         date = readDate(in); // not concerned with the actual date-reading function
>         time = readLong(in);
>         server = readText(in);
>         .....
>     }
> }
>
> but I still don't understand how Hadoop is going to know to parse my line into
> these tokens, instead of the map using the whole line as one token
>
>
> On Wed, Feb 2, 2011 at 11:42 AM, Harsh J <qwertyman...@gmail.com> wrote:
>
>> See it this way:
>>
>> readFields(...) provides a DataInput stream that reads bytes from a
>> binary stream, and write(...) provides a DataOutput stream that writes
>> bytes to a binary stream.
>>
>> Now your data structure may be a complex one, perhaps an array of
>> items or a mapping of some, or just a set of different types of
>> objects. All you need to do is to think about how you would
>> _serialize_ your data structure into a binary stream, so that you may
>> _de-serialize_ it back from the same stream when required.
>>
>> About what goes where, I think looking up the definition of
>> 'serialization' will help. It is all in the ordering. If you wrote A
>> before B, you read A before B - simple as that.
>>
>> This, or you could use a neat serialization library like Apache Avro
>> (http://avro.apache.org) and solve it in a simpler way with a schema.
>> I'd recommend learning/using Avro for all
>> serialization/de-serialization needs. Especially for Hadoop use-cases.
>>
>> On Wed, Feb 2, 2011 at 10:51 PM, Adeel Qureshi <adeelmahm...@gmail.com>
>> wrote:
>> > I have been trying to understand how to write a simple custom Writable
>> > class, and I find the available documentation very vague and unclear
>> > about certain things. Okay, so here is the sample Writable implementation
>> > in the javadoc of the Writable interface:
>> >
>> > public class MyWritable implements Writable {
>> >     // Some data
>> >     private int counter;
>> >     private long timestamp;
>> >
>> >     public void write(DataOutput out) throws IOException {
>> >         out.writeInt(counter);
>> >         out.writeLong(timestamp);
>> >     }
>> >
>> >     public void readFields(DataInput in) throws IOException {
>> >         counter = in.readInt();
>> >         timestamp = in.readLong();
>> >     }
>> >
>> >     public static MyWritable read(DataInput in) throws IOException {
>> >         MyWritable w = new MyWritable();
>> >         w.readFields(in);
>> >         return w;
>> >     }
>> > }
>> >
>> > So in the readFields function we are simply saying: read an int from the
>> > DataInput and put it in counter, then read a long and put it in the
>> > timestamp variable. What doesn't make sense to me is: what is the format
>> > of the DataInput here? What if there are multiple ints and multiple
>> > longs? How is the correct int going to go into counter? What if the data
>> > I am reading in my mapper is a string line, and I am using a regular
>> > expression to parse the tokens? How do I specify which field goes where?
>> > Simply saying readInt or readText, how does that get connected to the
>> > right stuff?
>> >
>> > So in my case, like I said, I am reading from IIS log files where my
>> > mapper input is a log line which contains the usual log information like
>> > date, time, user, server, url, qry, responseTime etc. I want to parse
>> > these into an object that can be passed to the reducer instead of dumping
>> > all that information as text.
>> >
>> > I would appreciate any help.
>> > Thanks
>> > Adeel
>>
>> --
>> Harsh J
>> www.harshj.com

--
Harsh J
www.harshj.com
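To make the ordering rule above concrete ("if you wrote A before B, you read A before B"), here is a minimal self-contained sketch. It uses plain java.io streams instead of Hadoop's types so it runs standalone; the class name and sample values are made up, but the write()/readFields() bodies are shaped exactly like the MyWritable example from the javadoc:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Demonstrates the serialization ordering rule: fields come back in
// exactly the order they were written, because the byte stream itself
// carries no field names, only values.
public class OrderingDemo {
    static int counter;
    static long timestamp;

    // Mirrors Writable.write(DataOutput): write A, then B.
    static byte[] write(int c, long t) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(c);    // A first...
        out.writeLong(t);   // ...then B
        return buf.toByteArray();
    }

    // Mirrors Writable.readFields(DataInput): read A, then B, same order.
    static void readFields(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        counter = in.readInt();     // A first...
        timestamp = in.readLong();  // ...then B
    }

    public static void main(String[] args) throws IOException {
        readFields(write(42, 1296688801000L));
        System.out.println(counter + " " + timestamp);
    }
}
```

Swapping the two read calls would silently misinterpret the bytes, which is why the read order must match the write order field for field.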