Ding, Hui wrote:
Thanks for this suggestion on the shell, I will take a look into that.
But I still don't understand why streaming won't work very well. It is
able to do m/r jobs using the supplied exec, right? So do all the
map/reduce programs take input/output from their own local filesystem
or from HDFS?
Streaming only works with Text (IIRC -- check for yourself to be
sure). HBase keys and cell content are byte arrays. Aggregations of
cells use types like RowResult. Hooking up HBase with streaming would
require an adaptation layer that converts between the two.
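
To make the mismatch concrete, here is a minimal sketch of what the
script side of such an adaptation might look like. It assumes a
hypothetical adapter on the Java side that writes each key and cell as
Base64 text, one "key<TAB>value" line per record, which is the line
format streaming hands to a script on stdin. The Base64 convention and
the function names are made up for illustration; nothing like this
ships with HBase as far as this thread says.

```python
import base64


def decode_line(line):
    """Split one 'key<TAB>value' line and Base64-decode both halves to bytes."""
    key_b64, _, value_b64 = line.rstrip("\n").partition("\t")
    return base64.b64decode(key_b64), base64.b64decode(value_b64)


def encode_line(key, value):
    """Inverse of decode_line: turn raw bytes back into one text line."""
    return "%s\t%s" % (base64.b64encode(key).decode("ascii"),
                       base64.b64encode(value).decode("ascii"))


def map_lines(lines):
    """Example mapper: for each record emit (row key, cell length in bytes)."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, value = decode_line(line)
        # The length is written as ASCII digits, then Base64-encoded like any value.
        yield encode_line(key, str(len(value)).encode("ascii"))
```

Used as a streaming mapper, the script would finish by looping over
map_lines(sys.stdin) and printing each result. The encoding step is
there because raw byte arrays can contain tabs and newlines, which
would break the line-oriented text format.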
Sweeter would be the work that J-D hints at, where the invocation of the
MR child task starts up a Jython/JRuby interpreter and MR passes the
task script -- map or reduce -- for the child interpreter to run. Such
a system runs 'python' -- or 'ruby' -- scripts 'natively', where native
in this case is relative to the JVM that is hosting the child task.
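
A rough sketch of that idea, with plain Python standing in for the
JVM-hosted interpreter: the host receives the task script as source
text, evaluates it in a fresh namespace, and calls its map function once
per record. The names run_task and TASK_SCRIPT and the map(key, value)
convention are invented here for illustration; they are not an existing
Hadoop or HBase API.

```python
# Hypothetical task script, as the framework might ship it to the child task.
TASK_SCRIPT = '''
def map(key, value):
    # key and value arrive as raw bytes; no text encoding step is involved
    yield key, len(value)
'''


def run_task(script_source, records, entry_point="map"):
    """Load a task script into a fresh namespace and run it over the records."""
    namespace = {}
    exec(script_source, namespace)   # the interpreter evaluates the shipped script
    task = namespace[entry_point]    # look up the map (or reduce) function
    for key, value in records:
        for out in task(key, value):
            yield out


records = [(b"row1", b"\x00\x01\x02"), (b"row2", b"")]
results = list(run_task(TASK_SCRIPT, records))
```

The point of the design is that keys and values reach the script as
in-process objects, so byte arrays never have to be squeezed through a
text pipe the way they would with streaming.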
St.Ack