Hello Chris: Thanks for the prompt reply! So, to conclude from your note: -- Presently, my RecordReader converts XML strings from a file to MyWritable object -- When readFields is called, RecordReader should provide the next MyWritable object, if there is one -- When write is called, MyWriter should write the objects out
The RecordReader is record-oriented, but both the readFields and write functions are byte-oriented... in order for Hadoop to be happy, I need to coordinate my record-oriented to byte-oriented. Is this correct? I just want to make sure before I tinker more with the code, to have the design properly down. Thanks! Kylie On Mon, Jul 14, 2008 at 3:43 PM, Chris Douglas <[EMAIL PROTECTED]> wrote: > It's easiest to consider write as a function that converts your record to > bytes and readFields as a function restoring your record from bytes. So it > should be the case that: > > MyWritable i = new MyWritable(); > i.initWithData(some_data); > i.write(byte_stream); > ... > MyWritable j = new MyWritable(); > j.initWithData(some_other_data); // (1) > j.readFields(byte_stream); > assert i.equals(j); > > Note that the assert should be true whether or not (1) is present, i.e. a > call to readFields should be deterministic and without hysteresis (it should > make no difference whether the Writable is newly created or if it formally > held some other state). readFields must also consume the entire record, so > for example, if write outputs three integers, readFields must consume three > integers. Variable-sized Writables are common, but any optional/variably > sized fields must be encoded to satisfy the preceding. > > So if your MyBigWritable record held two ints (integerA, integerB) and a > MyWritable (my_writable), its write method might look like: > > out.writeInt(integerA); > out.writeInt(integerB); > my_writable.write(out); > > and readFields would restore: > > integerA = in.readInt(in); > integerB = in.readInt(in); > my_writable.readFields(in); > > There are many examples in the source of simple, compound, and > variably-sized Writables. > > Your RecordReader is responsible for providing a key and value to your map. > Most generic formats rely on Writables or another mode of serialization to > write and restore objects to/from structured byte sequences, but less > generic InputFormats will create Writables from byte streams. > TextInputFormat, for example, will create Text objects from CR-delimited > files, though Text objects are not, themselves, encoded in the file. In > constrast, a SequenceFile storing the same data will encode the Text object > (using its write method) and will restore that object as encoded. > > The critical difference is that the framework needs to convert your record > to a byte stream at various points- hence the Writable interface- while you > may be more particular about the format from which you consume and the > format to which you need your output to conform. Note that you can elect to > use a different serialization framework if you prefer. > > If your data structure will be used as a key (implementing > WritableComparable), it's strongly recommended that you implement a > RawComparator, which can compare the serialized bytes directly without > deserializing both arguments. -C > > > On Jul 14, 2008, at 3:39 PM, Kylie McCormick wrote: > > Hi There! >> I'm currently working on code for my own Writable object (called >> ServiceWritable) and I've been working off LongWritable for this one. I >> was >> wondering, however, about the following two functions: >> >> public void readFields(java.io.DataInput in) >> and >> public void write(java.io.DataOutput out) >> >> I have my own RecordReader object to read in the complex type Service, and >> I >> also have my own Writer object to write my complex type ResultSet for >> output. In LongWritable, the code is very simple: >> >> value = in.readLong() >> and >> out.writeLong(value); >> >> Since I am dealing with more complex objects, the ObjectWritable won't >> help >> me. I'm a little confused with the interaction here between my >> RecordReader, >> and Writer objects--because there does not seem to be any directly. Can >> someone help me out here? >> >> Thanks, >> Kylie >> > > -- The Circle of the Dragon -- unlock the mystery that is the dragon. http://www.blackdrago.com/index.html "Light, seeking light, doth the light of light beguile!" -- William Shakespeare's Love's Labor's Lost
