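I'm not sure whether anything like that already exists, but you can easily
implement your own RecordReader that opens an FSDataInputStream on the
FileSplit's file through its FileSystem, seeks to the split's start, and then
reads records from it as you would from any other InputStream (with offset,
length, byte[], etc.). Return that reader from a FileInputFormat subclass and
use it as your job's input format.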
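Here is a rough sketch of the reading loop. It is written in plain Java so it runs without Hadoop on the classpath. It assumes a headerless file in which every record is exactly recordLength bytes (for example, raw arrays of a single dtype). That assumption, the class name FixedLengthRecordReader, and its methods are hypothetical, not an existing Hadoop class. In a real RecordReader, initialize() would open the stream with split.getPath().getFileSystem(conf).open(split.getPath()) and call seek(split.getStart()). nextKeyValue(), getCurrentKey() and getCurrentValue() would then delegate to the methods below, wrapping the results in LongWritable and BytesWritable.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Hypothetical sketch: reads fixed-length binary records from one split and
 * emits (byte offset, record bytes), like TextInputFormat's (offset, line).
 * Assumes a headerless file where every record is exactly recordLength bytes.
 */
class FixedLengthRecordReader {
    private final DataInputStream in;
    private final int recordLength;
    private final long end;        // first byte offset past this split
    private long pos;              // absolute file offset of the next record
    private boolean done = false;
    private long currentKey = -1;
    private byte[] currentValue;

    /** {@code in} must already be positioned at byte {@code start} of the file (as after seek(start)). */
    FixedLengthRecordReader(InputStream in, long start, long length, int recordLength) throws IOException {
        this.in = new DataInputStream(in);
        this.recordLength = recordLength;
        this.end = start + length;
        // Round up to the next record boundary; a record that began in the
        // previous split is read by that split's reader.
        this.pos = ((start + recordLength - 1) / recordLength) * recordLength;
        int toSkip = (int) (pos - start);
        try {
            this.in.readFully(new byte[toSkip]);
        } catch (EOFException e) {
            done = true;           // split starts inside a truncated trailing record
        }
    }

    /** Reads the next record whose first byte lies inside this split. */
    boolean nextKeyValue() throws IOException {
        if (done || pos >= end) {
            return false;
        }
        byte[] buf = new byte[recordLength];
        try {
            in.readFully(buf);     // the last record may extend past `end`; read it whole
        } catch (EOFException e) {
            done = true;           // incomplete trailing record: ignored in this sketch
            return false;
        }
        currentKey = pos;
        currentValue = buf;
        pos += recordLength;
        return true;
    }

    long getCurrentKey() { return currentKey; }                    // byte offset
    long getCurrentIndex() { return currentKey / recordLength; }   // record number
    byte[] getCurrentValue() { return currentValue; }

    public static void main(String[] args) throws IOException {
        byte[] file = new byte[40];                 // ten 4-byte records
        for (int i = 0; i < file.length; i++) file[i] = (byte) i;
        // Simulate the split covering bytes [10, 30), already seeked to offset 10.
        InputStream in = new ByteArrayInputStream(file, 10, file.length - 10);
        FixedLengthRecordReader r = new FixedLengthRecordReader(in, 10, 20, 4);
        while (r.nextKeyValue()) {
            System.out.println(r.getCurrentKey() + " -> " + Arrays.toString(r.getCurrentValue()));
        }
    }
}
```

Running main() prints the records at offsets 12, 16, 20, 24 and 28. The one at 28 extends two bytes past the split end but is still read whole. This mirrors how TextInputFormat's LineRecordReader treats lines that cross split boundaries. Each reader rounds its start up to the next record boundary and finishes the record it is in at the end, so every record is read exactly once. The simpler alternative is to override isSplitable() to return false, at the cost of one mapper per file.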


On Thu, Apr 29, 2010 at 5:36 AM, Pete Hunt wrote:

> Hello all -
>
> I am currently trying to integrate the numpy Python fast-arrays package with
> Hadoop. I am basically looking for a way to read binary data similar to a
> SequenceFile, except without keys. That is, similar to how the
> TextInputFormat emits the position in the file as the key, I would like a
> SequenceFileInputFormat that simply emits the position or index in the file
> and the value.
>
> Does such a facility exist? If not, how would I go about implementing this?
>
> Thanks,
>
> Pete
>
