I'm not sure whether anything like that already exists, but you can easily implement your own RecordReader: get an FSDataInputStream from the FileSystem for the FileSplit, then read records from it as you would from any other InputStream (with offset, length, byte[], etc.).
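To make that concrete, here is a rough sketch of the read loop such a RecordReader could wrap. It uses only plain java.io so it runs without a cluster. The class name FixedLengthRecordReader, the fixed record length, and the assumption that splits begin and end on record boundaries are all mine, not anything Hadoop ships. In a real RecordReader the stream would be the FSDataInputStream returned by FileSystem.open() for the split's path, already seek()'d to the split's start offset.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not a Hadoop class: reads fixed-length binary records
// and exposes the byte offset of each record as its key.
class FixedLengthRecordReader {
    private final DataInputStream in;
    private final long end;          // first byte offset past this split
    private final int recordLength;  // assumed fixed record size in bytes
    private long pos;                // offset of the next record to read
    private long key = -1;
    private byte[] value;

    // The stream is assumed to be positioned at 'start' already. With Hadoop
    // that would mean opening the split's path and seeking to its start.
    FixedLengthRecordReader(InputStream stream, long start, long splitLength,
                            int recordLength) {
        this.in = new DataInputStream(stream);
        this.pos = start;
        this.end = start + splitLength;
        this.recordLength = recordLength;
    }

    // Advances to the next record; returns false once the split is exhausted.
    boolean next() throws IOException {
        if (pos + recordLength > end) {
            return false;
        }
        byte[] buf = new byte[recordLength];
        in.readFully(buf);  // throws EOFException if the data is truncated
        key = pos;
        value = buf;
        pos += recordLength;
        return true;
    }

    long getKey() {
        return key;
    }

    byte[] getValue() {
        return value;
    }
}
```

Each call to next() yields the file position as the key and the raw bytes as the value, which is the same shape TextInputFormat gives you for lines. If your records are variable-length you would need a length prefix or some other framing instead of the fixed size assumed here.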
On Thu, Apr 29, 2010 at 5:36 AM, Pete Hunt <[email protected]> wrote:
> Hello all -
>
> I am currently trying to integrate the numpy Python fast-arrays package with
> Hadoop. I am basically looking for a way to read binary data similar to a
> SequenceFile, except without keys. That is, similar to how the
> TextInputFormat emits the position in the file as the key, I would like a
> SequenceFileInputFormat that simply emits the position or index in the file
> and the value.
>
> Does such a facility exist? If not, how would I go about implementing this?
>
> Thanks,
>
> Pete
