On Thu, Jul 24, 2008 at 10:48 PM, Venkat Seeth <[EMAIL PROTECTED]> wrote:
> Why dont you use hadoop streaming?

I think that's really a broader question - why doesn't everyone use streaming?

There's no real difference between doing Hadoop in Ruby/Scala/Java/Jython/whatever - these days, Java is just one of many languages that run on the JVM. There's no language-specific reason to pick streaming over a native implementation if you're working in a language that has a JVM implementation. I'm working on a Ruby interface just because I think there's a space for a nice DSL for setting up Hadoop and running tasks that's more pleasant for people used to writing Ruby than the current idioms.

Streaming is great for things that don't run on a JVM - Erlang, Haskell, Smalltalk, etc. If you're streaming, though, you lose all the flexibility of Hadoop. You get line-oriented text in and out, and that's about it. If you want all the Hadoop features, you're going to want to go native, be it in Ruby, Scala, Java, or whatever your language of choice is.

Streaming is powerful, and huge numbers of solutions of the form "my_code < data > output" have solved many, many problems over the years. If your problem fits in the streaming space, then you should consider it. And I think that's a language-neutral statement - just because your solution is in Java doesn't mean you should bother hooking it up into a native Hadoop app.

--
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com
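
P.S. To make the "line-oriented text in and out" point concrete, here is a minimal sketch of what a streaming task tends to look like. It is a word-count mapper in Python, chosen only for illustration; the function name map_lines and the word-count job itself are assumptions, not anything from this thread. It relies on the usual streaming convention that a task reads lines on stdin and writes tab-separated key/value lines on stdout.

```python
#!/usr/bin/env python3
"""Hypothetical word-count mapper in the shape a streaming job expects."""
import sys


def map_lines(lines):
    """Yield one 'word<TAB>1' record per whitespace-separated word.

    map_lines is an illustrative name; the only contract assumed here is
    lines of text in, tab-separated key/value lines out.
    """
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


if __name__ == "__main__":
    # The task sees its input split as plain lines on stdin and hands
    # its results back as plain lines on stdout.
    for record in map_lines(sys.stdin):
        print(record)
```

Saved under a hypothetical name like mapper.py, this runs locally as "python3 mapper.py < data > output", which is exactly the shape of problem described above. Note what is missing: everything beyond lines of text is outside the script's view, which is the flexibility trade-off in question.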
