On Fri, Dec 18, 2009 at 5:20 PM, Cao Kang <[email protected]> wrote:
> Is there any example how a sequence file can be read and split in hadoop?
> Many thanks!
That should be fairly easy. The following code reads all entries in a
sequence file:
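    // Assumes path (org.apache.hadoop.fs.Path) and config (Configuration) already exist.
    SequenceFile.Reader reader =
        new SequenceFile.Reader(path.getFileSystem(config), path, config);
    try {
        // The file header records the key and value classes; create one
        // instance of each (org.apache.hadoop.util.ReflectionUtils) and reuse it.
        Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), config);
        Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), config);
        // next() fills key and value in place and returns false at end of file.
        while (reader.next(key, value)) {
            System.out.println(key + "\t" + value);
        }
    } finally {
        reader.close();
    }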
To split the file, add some logic that assigns each entry to a partition
and write the entries of each partition out using its own
SequenceFile.Writer.
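
As a rough sketch of what that partitioning logic could look like, here is
one option: pick the output split from the key's hash. SplitPlanner,
splitFor and partition are made-up names for illustration, not Hadoop
classes, and the sketch uses plain Java collections so it runs without a
cluster. Other schemes (round-robin, key ranges) would work just as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): decides which output split an entry goes to.
final class SplitPlanner {
    private SplitPlanner() {}

    // Maps a key to a split index in [0, numSplits). Masking with
    // Integer.MAX_VALUE clears the sign bit, so a negative hashCode()
    // cannot yield a negative index.
    static int splitFor(Object key, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numSplits;
    }

    // Groups keys by split index, keeping input order within each split.
    // In a real job each inner list would be replaced by an open writer.
    static <K> List<List<K>> partition(Iterable<K> keys, int numSplits) {
        List<List<K>> splits = new ArrayList<List<K>>();
        for (int i = 0; i < numSplits; i++) {
            splits.add(new ArrayList<K>());
        }
        for (K key : keys) {
            splits.get(splitFor(key, numSplits)).add(key);
        }
        return splits;
    }
}
```

In the read loop you would then keep one SequenceFile.Writer per split
and, for each entry, call append(key, value) on the writer at index
SplitPlanner.splitFor(key, numSplits), closing all writers at the end.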
--
Cheers,
-Ives