It's not used anywhere else right now, but yes, I can also imagine it
being useful. The trouble is that it's hard to write correctly
(unless I'm missing something), on account of a few related wrinkles:

1. SequenceFile.Reader doesn't have a "hasNext()"-like method. The
only way to know if it has more is to read the next value and see if
it's there. That's OK -- you could just read ahead one entry every
time, storing the last entry as the real current 'next' value.

2. But, SequenceFile.Reader reads into a Writable container object of
some kind. There is no generic "get()" method on Writable that pops
out the underlying value to be saved off.

3. So you have to save off the whole Writable as the last value read,
and make a new Writable via reflection for every value read. This is
possible, but it sort of defeats the purpose of how this is supposed to
work in Hadoop, where one Writable instance is reused across reads to
avoid allocating an object per record.

4. (And as a corollary, you can't define the class to use generics in
a way that lets you operate in terms of the underlying key-value
classes, only Writables.)
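
To make the wrinkles above concrete, here is a minimal sketch of the
read-ahead approach. It deliberately does not use Hadoop: IntHolder and
FakeReader are hypothetical stand-ins for a Writable and for
SequenceFile.Reader, and the Supplier stands in for creating a new
Writable via reflection. All names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

public class ReadAheadSketch {

    /** Hypothetical stand-in for a Writable: a mutable holder filled in place. */
    static final class IntHolder {
        int value;
    }

    /** Hypothetical stand-in for the reader: no hasNext(), only next(holder). */
    static final class FakeReader {
        private final int[] data;
        private int pos;

        FakeReader(int... data) {
            this.data = data;
        }

        /** Fills the holder and returns true, or returns false at end of input. */
        boolean next(IntHolder into) {
            if (pos >= data.length) {
                return false;
            }
            into.value = data[pos++];
            return true;
        }
    }

    /** Iterator that always stays one entry ahead of the caller. */
    static final class ReadAheadIterator implements Iterator<IntHolder> {
        private final FakeReader reader;
        private final Supplier<IntHolder> factory; // stands in for reflection
        private IntHolder lookahead;               // null once input is exhausted

        ReadAheadIterator(FakeReader reader, Supplier<IntHolder> factory) {
            this.reader = reader;
            this.factory = factory;
            advance();
        }

        private void advance() {
            // A fresh holder per entry: reusing one would overwrite the
            // entry that was just handed back to the caller.
            IntHolder fresh = factory.get();
            lookahead = reader.next(fresh) ? fresh : null;
        }

        @Override
        public boolean hasNext() {
            return lookahead != null;
        }

        @Override
        public IntHolder next() {
            if (lookahead == null) {
                throw new NoSuchElementException();
            }
            IntHolder result = lookahead;
            advance();
            return result;
        }
    }

    /** Drains a reader over the given values through the iterator. */
    static List<Integer> readAll(int... data) {
        List<Integer> out = new ArrayList<>();
        Iterator<IntHolder> it =
            new ReadAheadIterator(new FakeReader(data), IntHolder::new);
        while (it.hasNext()) {
            out.add(it.next().value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readAll(3, 1, 4)); // prints [3, 1, 4]
    }
}
```

Note that the iterator can only hand back the holder, not the int
inside it, which is the generics problem from point 4, and it allocates
one holder per entry, which is the cost from point 3.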
