Ding, Hui wrote:
Thanks for this suggestion on the shell, I will take a look into that.
But I still don't understand why streaming won't work very well. It is
able to run m/r jobs using the supplied exec, right? So do all the
map/reduce programs take input/output from their own local filesystem
or from HDFS?
Streaming only works with Text (IIRC -- check for yourself to be sure). HBase keys and cell content are byte arrays. Aggregations of cells use types like RowResult. Hooking up HBase with streaming would require an adaptation layer that converts between the two.
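
To make the mismatch concrete, here is a rough sketch of one possible adaptation: Base64-encode the binary key and cell value so that each record fits on a single tab-separated text line, which is the shape a streaming script reads on stdin. This is only an illustration. The function names encode_record and decode_record are made up for this example and are not part of HBase or Hadoop, and Base64 is just one encoding that could be used.

```python
import base64
from typing import Tuple


def encode_record(row_key: bytes, cell_value: bytes) -> str:
    """Turn a binary key/value pair into one tab-separated text line.

    Hypothetical helper: Base64 output never contains tabs or newlines,
    so the line stays parseable whatever bytes the key or value holds.
    """
    key_text = base64.b64encode(row_key).decode("ascii")
    value_text = base64.b64encode(cell_value).decode("ascii")
    return key_text + "\t" + value_text


def decode_record(line: str) -> Tuple[bytes, bytes]:
    """Reverse encode_record, as a streaming script would on each input line."""
    key_text, value_text = line.rstrip("\n").split("\t", 1)
    return base64.b64decode(key_text), base64.b64decode(value_text)
```

Something on the Java side would still have to do the encoding before records reach the script and the decoding when results come back, and that is the adaptation work referred to above.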

Sweeter would be the work that J-D hints at, where the invocation of the MR child task starts up a Jython/JRuby interpreter and MR passes the task script -- map or reduce -- for the child interpreter to run. Such a system runs 'python' -- or 'ruby' -- scripts 'natively', where 'native' in this case is relative to the JVM that is hosting the child task.
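
The general shape of that idea can be sketched in plain Python: a host receives the task script as source text, evaluates it in a fresh namespace, and calls the function the script defines once per record. This is only an in-process analogy under stated assumptions. It does not use any Jython, JRuby, or Hadoop API, and run_task_script, EXAMPLE_SCRIPT, and the convention that the script defines map(key, value) are all invented for the illustration.

```python
def run_task_script(script_source, records):
    """Evaluate a task script and feed it records (hypothetical host side).

    Assumes the script defines map(key, value) returning a list of
    (key, value) pairs; that convention is made up for this sketch.
    """
    namespace = {}
    exec(script_source, namespace)  # the script runs inside the host process
    map_fn = namespace["map"]
    output = []
    for key, value in records:
        output.extend(map_fn(key, value))
    return output


# Example task script, passed to the host as text.
EXAMPLE_SCRIPT = '''
def map(key, value):
    # emit one (word, 1) pair per word in the value
    return [(word, 1) for word in value.split()]
'''
```

Because the script runs inside the host process, it receives whatever objects the host hands it and nothing has to be flattened to text first, which is the attraction compared with streaming.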

St.Ack
