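I don't think binary input works with streaming, because streaming assumes
one record per line. A binary float can itself contain the newline byte
(0x0A), so appending '\n' to each binary "line" would not give you reliable
record boundaries.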
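
To make the problem concrete, here is a minimal Python sketch. It is not
Hadoop code: pack_record, split_lines, encode_record and decode_record are
names made up for this illustration, and splitting on b"\n" is only a
stand-in for what a line-oriented reader does. It shows a newline-terminated
binary record being cut in the middle, and one possible workaround, which is
to base64-encode each record so that it stays on a single text line. Whether
base64 decoding is actually cheaper than parsing text floats in your C mapper
is something you would have to measure.

```python
import base64
import struct

def pack_record(values):
    """Pack floats as little-endian 32-bit binary and append a newline."""
    return struct.pack("<%df" % len(values), *values) + b"\n"

def split_lines(stream):
    """Split a byte stream on newlines, as a line-oriented reader would."""
    return stream.split(b"\n")[:-1]  # drop the empty piece after the last newline

def encode_record(values):
    """Base64-encode the packed floats so the line never contains a raw 0x0A."""
    raw = struct.pack("<%df" % len(values), *values)
    return base64.b64encode(raw) + b"\n"

def decode_record(line):
    """Inverse of encode_record for a single line without its newline."""
    raw = base64.b64decode(line)
    return struct.unpack("<%df" % (len(raw) // 4), raw)

# 8.625 is 0x410A0000 as a 32-bit float, so its bytes include 0x0A.
records = [[1.0, 2.0], [8.625, 3.0]]

# Raw binary plus newline: the second record is split in two.
stream = b"".join(pack_record(r) for r in records)
lines = split_lines(stream)  # 3 pieces for 2 records

# Base64 per line: one line per record, and the floats round-trip.
safe_stream = b"".join(encode_record(r) for r in records)
decoded = [decode_record(line) for line in split_lines(safe_stream)]
```

The base64 alphabet contains neither newline nor tab, so an encoded record
also cannot be mistaken for a key/value pair, which streaming splits at the
first tab by default.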

If you want to script map-reduce programs, would you be open to a Groovy
implementation that avoids these problems?


On 4/7/08 6:42 AM, "John Menzer" <[EMAIL PROTECTED]> wrote:

> 
> hi,
> 
> i would like to use binary input and output data in combination with hadoop
> streaming.
> 
> the reason why i want to use binary data is, that parsing text to float
> seems to consume a big lot of time compared to directly reading the binary
> floats.
> 
> i am using a C-coded mapper (getting streaming data from stdin and writing
> to stdout) and no reducer.
> 
> so my question is: how do i implement binary input output in this context?
> as far as i understand i need to put an '\n' char at the end of each
> binary-'line'. so hadoop knows how to split/distribute the input data among
> the nodes and how to collect it for output(??)
> 
> is this approach reasonable?
> 
> thanks,
> john
