Re: Writable readFields and write functions

Chris Douglas Mon, 14 Jul 2008 18:35:15 -0700

-- Presently, my RecordReader converts XML strings from a file to MyWritable
object
-- When readFields is called, RecordReader should provide the next
MyWritable object, if there is one
-- When write is called, MyWriter should write the objects out

Not quite. Your RecordReader may produce MyWritable records, but readFields may not be involved. For your MyWritable records to get to your reduce, they should implement the Writable interface so the framework may regard them as streams of bytes. Your OutputFormat (which may use your MyWriter) may take the MyWritable objects you emit from your reduce and make them conform to whatever format your spec requires.

* Your InputFormat takes XML and provides MyWritable objects to your
  mapper
* The framework calls MyWritable::write(byte_stream) and
  MyWritable::readFields(byte_stream) to push records you emit from your
  mapper across the network, between abstractions, etc.
* Your OutputFormat takes MyWritable objects you emit from your reducer
  and stores them according to the format you specify

With many exceptions, most RecordReaders calling readFields are reading from structured, generic formats (like SequenceFile). -C

The RecordReader is record-oriented, but both the readFields and write
functions are byte-oriented... in order for Hadoop to be happy, I need to
coordinate my record-oriented to byte-oriented.
Is this correct? I just want to make sure before I tinker more with the
code, to have the design properly down.

Thanks!
Kylie


On Mon, Jul 14, 2008 at 3:43 PM, Chris Douglas <[EMAIL PROTECTED]>
wrote:
It's easiest to consider write as a function that converts your record to
bytes and readFields as a function restoring your record from bytes. So it
should be the case that:

MyWritable i = new MyWritable();
i.initWithData(some_data);
i.write(byte_stream);
...
MyWritable j = new MyWritable();
j.initWithData(some_other_data); // (1)
j.readFields(byte_stream);
assert i.equals(j);

Note that the assert should be true whether or not (1) is present, i.e. a
call to readFields should be deterministic and without hysteresis (it should
make no difference whether the Writable is newly created or if it formerly
held some other state). readFields must also consume the entire record, so
for example, if write outputs three integers, readFields must consume three
integers. Variable-sized Writables are common, but any optional/variably
sized fields must be encoded to satisfy the preceding.

So if your MyBigWritable record held two ints (integerA, integerB) and a
MyWritable (my_writable), its write method might look like:

out.writeInt(integerA);
out.writeInt(integerB);
my_writable.write(out);

and readFields would restore:

integerA = in.readInt();
integerB = in.readInt();
my_writable.readFields(in);
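
To make the round-trip property concrete, here is a minimal sketch of the MyBigWritable example with an in-memory byte stream. It makes a few assumptions that are not in the thread: the Writable interface below is a local stand-in with the same two methods as Hadoop's, so the code compiles without Hadoop on the classpath; MyWritable is assumed to hold a single string; and toBytes, fromBytes and sameFields are hypothetical helpers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class RoundTripSketch {
    // Local stand-in for org.apache.hadoop.io.Writable (same two methods),
    // used only so this sketch compiles without Hadoop.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical inner record: assumed to hold a single string.
    static class MyWritable implements Writable {
        String payload = "";
        public void write(DataOutput out) throws IOException { out.writeUTF(payload); }
        public void readFields(DataInput in) throws IOException { payload = in.readUTF(); }
    }

    static class MyBigWritable implements Writable {
        int integerA;
        int integerB;
        final MyWritable myWritable = new MyWritable();

        public void write(DataOutput out) throws IOException {
            out.writeInt(integerA);
            out.writeInt(integerB);
            myWritable.write(out);
        }

        // Reads the fields back in exactly the order write emitted them,
        // overwriting whatever state the object held before.
        public void readFields(DataInput in) throws IOException {
            integerA = in.readInt();
            integerB = in.readInt();
            myWritable.readFields(in);
        }

        boolean sameFields(MyBigWritable o) {
            return integerA == o.integerA && integerB == o.integerB
                    && myWritable.payload.equals(o.myWritable.payload);
        }
    }

    // Serializes a record into a byte array.
    static byte[] toBytes(Writable w) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            w.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restores a record from bytes; returns how many bytes were left unread.
    static int fromBytes(Writable w, byte[] bytes) {
        try {
            ByteArrayInputStream src = new ByteArrayInputStream(bytes);
            w.readFields(new DataInputStream(src));
            return src.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MyBigWritable i = new MyBigWritable();
        i.integerA = 7;
        i.integerB = -3;
        i.myWritable.payload = "abc";
        byte[] bytes = toBytes(i);

        MyBigWritable j = new MyBigWritable();
        j.integerA = 99;                  // (1) stale state, must not matter
        j.myWritable.payload = "stale";
        int unread = fromBytes(j, bytes);

        System.out.println("equal=" + i.sameFields(j) + " unread=" + unread);
    }
}
```

Checking that no bytes are left unread is a cheap way to catch a readFields that consumes less than write produced.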

There are many examples in the source of simple, compound, and
variably-sized Writables.
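
For the variably-sized case, one common encoding is a count prefix followed by the elements. The sketch below is only an illustration under assumptions: IntListWritable and the two helper methods are hypothetical names, and the class mirrors the write/readFields method pair without importing Hadoop. Note that readFields clears the old contents first, so a reused object carries nothing over.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class VariableSizeSketch {
    // Hypothetical variably-sized record: a list of ints.
    static class IntListWritable {
        final List<Integer> values = new ArrayList<>();

        public void write(DataOutput out) throws IOException {
            out.writeInt(values.size());      // count prefix
            for (int v : values) {
                out.writeInt(v);
            }
        }

        public void readFields(DataInput in) throws IOException {
            values.clear();                   // drop any previous state
            int count = in.readInt();
            for (int k = 0; k < count; k++) {
                values.add(in.readInt());
            }
        }
    }

    // Serializes the record into a byte array.
    static byte[] serialize(IntListWritable w) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            w.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restores the record; returns how many bytes were left unread.
    static int deserialize(IntListWritable w, byte[] bytes) {
        try {
            ByteArrayInputStream src = new ByteArrayInputStream(bytes);
            w.readFields(new DataInputStream(src));
            return src.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        IntListWritable a = new IntListWritable();
        a.values.add(4);
        a.values.add(5);
        a.values.add(6);

        IntListWritable b = new IntListWritable();
        b.values.add(1);                      // stale element from earlier use
        int unread = deserialize(b, serialize(a));

        System.out.println(b.values + " unread=" + unread);
    }
}
```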

Your RecordReader is responsible for providing a key and value to your map.
Most generic formats rely on Writables or another mode of serialization to
write and restore objects to/from structured byte sequences, but less
generic InputFormats will create Writables from byte streams.
TextInputFormat, for example, will create Text objects from CR-delimited
files, though Text objects are not, themselves, encoded in the file. In
contrast, a SequenceFile storing the same data will encode the Text object
(using its write method) and will restore that object as encoded.

The critical difference is that the framework needs to convert your record
to a byte stream at various points (hence the Writable interface), while you
may be more particular about the format from which you consume and the
format to which you need your output to conform. Note that you can elect to
use a different serialization framework if you prefer.

If your data structure will be used as a key (implementing
WritableComparable), it's strongly recommended that you implement a
RawComparator, which can compare the serialized bytes directly without
deserializing both arguments. -C
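
As a rough illustration of comparing serialized bytes directly, the sketch below assumes a hypothetical key that serializes as a single writeInt(id). The compareRaw method takes the same argument shape as RawComparator's compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2), but it is a plain static method here so the example runs without Hadoop; serializeKey and readInt are hypothetical helpers.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class RawCompareSketch {
    // Hypothetical key layout: one int, written with DataOutput.writeInt.
    static byte[] serializeKey(int id) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new DataOutputStream(buf).writeInt(id);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes the big-endian int that writeInt produced, starting at offset start.
    static int readInt(byte[] b, int start) {
        return ((b[start] & 0xff) << 24)
                | ((b[start + 1] & 0xff) << 16)
                | ((b[start + 2] & 0xff) << 8)
                | (b[start + 3] & 0xff);
    }

    // Orders two serialized keys without constructing key objects.
    // s1/s2 are offsets into the buffers, l1/l2 the record lengths.
    static int compareRaw(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        byte[] low = serializeKey(-5);
        byte[] high = serializeKey(12);
        System.out.println(compareRaw(low, 0, low.length, high, 0, high.length));
    }
}
```

Decoding the int rather than comparing bytes lexicographically keeps negative ids in the right order, since the sign bit would otherwise sort them after positive ones.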


On Jul 14, 2008, at 3:39 PM, Kylie McCormick wrote:

Hi There!
I'm currently working on code for my own Writable object (called
ServiceWritable) and I've been working off LongWritable for this one. I was
wondering, however, about the following two functions:

public void readFields(java.io.DataInput in)
and
public void write(java.io.DataOutput out)

I have my own RecordReader object to read in the complex type Service, and I
also have my own Writer object to write my complex type ResultSet for
output. In LongWritable, the code is very simple:

value = in.readLong();
and
out.writeLong(value);

Since I am dealing with more complex objects, the ObjectWritable won't help
me. I'm a little confused with the interaction here between my RecordReader
and Writer objects, because there does not seem to be any directly. Can
someone help me out here?

Thanks,
Kylie
--
The Circle of the Dragon -- unlock the mystery that is the dragon.
http://www.blackdrago.com/index.html

"Light, seeking light, doth the light of light beguile!"
-- William Shakespeare's Love's Labor's Lost
