Hassen,

I have lots of binary data that I parse using Python streaming.

The way I do this is stream the binary data into sequence files (the binary
data object I save in the key and (null) as the value).

Each key then gets written back to me line by line, key by key for an entire
block when streaming.

To have this work in streaming on the command line you need to use
*-inputformat
SequenceFileAsTextInputFormat*

To create the sequence files I have a jar file that goes from BufferedReader
and writes to org.apache.hadoop.io.SequenceFile.Writer

I am not sure if you can do this for your data but if not then make your own
InputFormat.

good luck!

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop
*/

On Mon, Jun 20, 2011 at 4:13 PM, Hassen Riahi <hassen.ri...@cern.ch> wrote:

> Dear all,
>
> Is it possible to have a binary input to a map code written in python?
>
> Thank you
> Hassen
>

Reply via email to