On Thu, Jul 24, 2008 at 10:48 PM, Venkat Seeth <[EMAIL PROTECTED]> wrote:
> Why dont you use hadoop streaming?

I think that's really a broader question - why doesn't everyone use
streaming?

There's no real difference between doing Hadoop in
Ruby/Scala/Java/Jython/whatever - these days, Java is just one of many
languages that run on the JVM.  There's no language-specific reason
to pick streaming over a native implementation if you're working in a
language that has a JVM implementation.  I'm working on a Ruby
interface just because I think there's a space for a nice DSL for
setting up Hadoop and running tasks that's more pleasant for people
used to writing Ruby than the current idioms.

Streaming is great for things that don't run on a JVM - Erlang,
Haskell, Smalltalk, etc.

If you're streaming, though, you lose all the flexibility of Hadoop.
You get line-oriented text in and out, and that's about it.  But if
you want all the Hadoop features, you're going to want to go native,
be it in Ruby, Scala, Java, or whatever your language of choice is.
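
To make the "line-oriented text in and out" point concrete, here's a
rough sketch of a word count written for streaming.  It's not code from
any real project; Python is used purely for illustration, and the
function names and the "map"/"reduce" argument are made up for this
example.  It assumes streaming's default conventions: one record per
line, key and value separated by a tab, and reducer input already
sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one "word<TAB>1" line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per key. Input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: pass "map" or "reduce" as the first argument.
if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    sys.stdout.writelines(out + "\n" for out in step(sys.stdin))
```

Everything the script sees is a line of text on stdin, and everything
it produces is a line of text on stdout.  Anything richer than that
has to be encoded into those lines by hand.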

Streaming is powerful, and huge numbers of solutions of the form
"my_code < data > output" have solved many, many problems over the
years.  If your problem fits in the streaming space, then you should
consider it.  And I think that's a language-neutral statement - just
because your solution is in Java doesn't mean you should bother
hooking it up into a native Hadoop app.

-- 
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com
