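Please get the Hadoop source code and read the comment at the beginning of SequenceFile.java:

 * <p>Essentially there are 3 different formats for <code>SequenceFile</code>s ...

A SequenceFile is not a raw stream of records: it has its own header and record framing, so a plain binary dump of your structs will not be accepted as one.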
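The "not a SequenceFile" error in the quoted message is consistent with a failed header check: a SequenceFile begins with the three bytes 'S', 'E', 'Q' followed by a version byte. Below is a minimal JDK-only sketch for checking whether a file starts that way. The class name SeqHeaderCheck and the method looksLikeSequenceFile are made up for illustration and are not part of Hadoop.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper, not a Hadoop class: checks for the SequenceFile magic bytes.
public class SeqHeaderCheck {
    // First three bytes of a SequenceFile; the fourth byte is the format version.
    private static final byte[] MAGIC = {'S', 'E', 'Q'};

    // Returns true if the given leading bytes start with the SEQ magic.
    public static boolean looksLikeSequenceFile(byte[] head) {
        return head.length >= MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, MAGIC.length), MAGIC);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SeqHeaderCheck <local-file>");
            return;
        }
        byte[] head = new byte[4];
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            // Throws EOFException if the file is shorter than 4 bytes.
            in.readFully(head);
        }
        System.out.println(args[0]
            + (looksLikeSequenceFile(head) ? " starts" : " does not start")
            + " with a SequenceFile header");
    }
}
```

A file written by a C program as back-to-back 40-byte structs will normally fail this check, so it has to be converted (or read through a custom InputFormat that does not extend the SequenceFile ones) before the example Sort can consume it.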
On Tue, Sep 7, 2010 at 8:13 PM, Matthew John <tmatthewjohn1...@gmail.com> wrote:

> Hey,
> I'm pretty new to Hadoop.
>
> I need to sort a metafile (TBs) and thought of using Hadoop Sort (in
> examples) for it.
> My input metafile looks like this --> binary stream (only 1's and 0's). It
> basically contains records of 40 bytes.
> Every record goes like this:
>
> long a;   <key> --> 8 bytes. The rest of the structure will be the <value> --> 32 bytes
> long b;
> int c;
> int d;
> int e;
> int unprocessed;
> int compress_attempted;
> int gatherer;
>
> I have created a *FpMetaId.java (extends BytesWritable)* corresponding to
> the <value> and *FpMetadata.java (extends BytesWritable)* corresponding to
> the <key>.
>
> My sole aim is to get these records (40 bytes) sorted with the fp (double)
> as the key. And I need to write these sorted records back into a metafile
> (exactly my old metafile but with sorted records ----> binaries only).
> I also implemented:
>
> *MetafileInputFormat.java (extends SequenceFileAsBinaryInputFormat)* --->
> file making an input file format compatible with my record.
> *MetafileOutputFormat<K, V> extends SequenceFileOutputFormat* ---> file
> making the output file format compatible with my record.
> *MetafileRecordReader.java (extends
> SequenceFileAsBinaryInputFormat.SequenceFileAsBinaryRecordReader)* --->
> file implementing the record reader compatible with my record.
>
> The MetafileRecordWriter class has been implemented within my
> MetafileOutputFormat.java file.
>
> Let me take you through the sequence of events that followed:
>
> 1) I resolved all the errors in the writable classes (FpMetaId, FpMetadata),
> the in/out formats (MetafileInputFormat, MetafileOutputFormat) and the
> RecordReaders I implemented.
>
> 2) I copied the Writables to the /io folder. The other new files were copied
> to the /mapred folder. I successfully built it.
>
> 3) I modified the Sort file (the function I want to run) to use FpMetaId as
> key and FpMetadata as value, and imported these new classes in the file. I
> changed the default conf settings to these required Writables and
> RecordReaders. I built Hadoop using the ant command after this. It
> got built successfully.
>
> *Q) Does this ensure all the new changes are reflected in the jar? (Am I
> ready to go execute the sort function?)*
>
> 4) As I had already mentioned before, I am working with a sequential file
> format (binary) with a data structure (key, value) repeating. So I wrote a C
> program which generates random values for my data structure and populated a
> file, sequentially writing (binary) my (key, value) data structure. I gave
> this as my input for the sort, which should sort my (key, value) pairs with
> respect to keys. I got the error: fp_input not a SequenceFile (fp_input is
> my input file). I thought SequenceFiles would just be a stream of binaries.
> Do they have any specific format?
>
> *Command used: bin/hadoop jar hadoop-0.20.2-examples.jar sort fp_input
> fp_output*
>
> *Q) What does this imply? I have no clue how to proceed further. Again, is
> it because the jar file used to execute doesn't have the latest libraries? I
> could not find any good tutorials on this.*
>
> It would be great if someone could offer a helping hand to this noob.
>
> Thanks,
> Matthew John
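The 40-byte layout described in the quoted message (an 8-byte key followed by a 32-byte value made of one long and six ints) can be split with plain JDK code before any Hadoop type gets involved. The sketch below is only an illustration: MetaRecordSplitter, split and keyAsLong are hypothetical names, and the little-endian byte order in main is an assumption about how the C generator wrote the file on a typical x86 machine.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper: splits raw 40-byte records into an 8-byte key and a 32-byte value.
public class MetaRecordSplitter {
    public static final int KEY_LEN = 8;                        // long a
    public static final int VALUE_LEN = 32;                     // long b + six ints
    public static final int RECORD_LEN = KEY_LEN + VALUE_LEN;   // 40 bytes

    // Returns {key, value} for the record at recordIndex in a raw buffer.
    public static byte[][] split(byte[] raw, int recordIndex) {
        int off = recordIndex * RECORD_LEN;
        if (recordIndex < 0 || off + RECORD_LEN > raw.length) {
            throw new IllegalArgumentException("no complete record at index " + recordIndex);
        }
        byte[] key = Arrays.copyOfRange(raw, off, off + KEY_LEN);
        byte[] value = Arrays.copyOfRange(raw, off + KEY_LEN, off + RECORD_LEN);
        return new byte[][] {key, value};
    }

    // Decodes the key bytes as a long using the byte order the file was written in.
    public static long keyAsLong(byte[] key, ByteOrder order) {
        return ByteBuffer.wrap(key).order(order).getLong();
    }

    public static void main(String[] args) {
        // Two synthetic records with keys 42 and 7, written little-endian (assumed).
        byte[] raw = new byte[2 * RECORD_LEN];
        ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN)
                  .putLong(0, 42L)
                  .putLong(RECORD_LEN, 7L);
        for (int i = 0; i < 2; i++) {
            byte[][] kv = split(raw, i);
            System.out.println("record " + i + " key = "
                + keyAsLong(kv[0], ByteOrder.LITTLE_ENDIAN)
                + ", value bytes = " + kv[1].length);
        }
    }
}
```

The byte order matters for the sort itself: as far as I know, BytesWritable keys are compared byte by byte, so a little-endian long stored as raw bytes would not sort in numeric order unless the key is re-encoded or a custom comparator is used.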