It's easiest to consider write as a function that converts your
record to
bytes and readFields as a function restoring your record from
bytes. So it
should be the case that:
MyWritable i = new MyWritable();
i.initWithData(some_data);
i.write(byte_stream);
...
MyWritable j = new MyWritable();
j.initWithData(some_other_data); // (1)
j.readFields(byte_stream);
assert i.equals(j);
Note that the assert should be true whether or not (1) is present,
i.e. a
call to readFields should be deterministic and without hysteresis
(it should
make no difference whether the Writable is newly created or if it
formally
held some other state). readFields must also consume the entire
record, so
for example, if write outputs three integers, readFields must
consume three
integers. Variable-sized Writables are common, but any optional/
variably
sized fields must be encoded to satisfy the preceding.
So if your MyBigWritable record held two ints (integerA, integerB)
and a
MyWritable (my_writable), its write method might look like:
out.writeInt(integerA);
out.writeInt(integerB);
my_writable.write(out);
and readFields would restore:
integerA = in.readInt(in);
integerB = in.readInt(in);
my_writable.readFields(in);
There are many examples in the source of simple, compound, and
variably-sized Writables.
Your RecordReader is responsible for providing a key and value to
your map.
Most generic formats rely on Writables or another mode of
serialization to
write and restore objects to/from structured byte sequences, but less
generic InputFormats will create Writables from byte streams.
TextInputFormat, for example, will create Text objects from CR-
delimited
files, though Text objects are not, themselves, encoded in the
file. In
constrast, a SequenceFile storing the same data will encode the
Text object
(using its write method) and will restore that object as encoded.
The critical difference is that the framework needs to convert your
record
to a byte stream at various points- hence the Writable interface-
while you
may be more particular about the format from which you consume and
the
format to which you need your output to conform. Note that you can
elect to
use a different serialization framework if you prefer.
If your data structure will be used as a key (implementing
WritableComparable), it's strongly recommended that you implement a
RawComparator, which can compare the serialized bytes directly
without
deserializing both arguments. -C
On Jul 14, 2008, at 3:39 PM, Kylie McCormick wrote:
Hi There!
I'm currently working on code for my own Writable object (called
ServiceWritable) and I've been working off LongWritable for this
one. I
was
wondering, however, about the following two functions:
public void readFields(java.io.DataInput in)
and
public void write(java.io.DataOutput out)
I have my own RecordReader object to read in the complex type
Service, and
I
also have my own Writer object to write my complex type ResultSet
for
output. In LongWritable, the code is very simple:
value = in.readLong()
and
out.writeLong(value);
Since I am dealing with more complex objects, the ObjectWritable
won't
help
me. I'm a little confused with the interaction here between my
RecordReader,
and Writer objects--because there does not seem to be any
directly. Can
someone help me out here?
Thanks,
Kylie