Amen. Running shell commands within Hadoop by invoking bash is not what I would consider a good thing. I had to write a patch some time back because the df command produced different output on AIX, which caused Hadoop to think it didn't have any disk space. I heartily second the notion of an operating system abstraction layer.

Bill

-----Original Message-----
From: Allen Wittenauer [mailto:[email protected]]
Sent: Thursday, December 17, 2009 5:48 PM
To: [email protected]
Subject: Re: Why DrWho

On 12/17/09 1:36 PM, "Edward Capriolo" <[email protected]> wrote:
> In a nutshell, this is the same problem you face with shell scripting,
> assuming external binary files exist. assuming they take a set of
> arguments, assuming they produce a result code, assuming the output is
> formatted in a specific way.

Yup. There was a JIRA posted the other day about a shell command break on Mac OS X (the stat command). I suspect the same break happens on other BSD environments. Ironically, Solaris has GNU stat, so that particular shell out worked just fine.

Every time we issue a fork(), we risk breaking an OS. I really wish we'd give more weight to building some sort of compatibility layer.
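
For the df case discussed in this thread, here is a rough sketch of what one small piece of such an abstraction layer might look like. It is only an illustration, not Hadoop's actual code: the names DiskSpace, FreeSpaceProbe and JvmFreeSpaceProbe are made up for this example. The only real API it relies on is java.io.File.getUsableSpace(), available since Java 6, which asks the JVM for free space instead of forking df and parsing its output.

```java
import java.io.File;

public class DiskSpace {

    /** Hypothetical abstraction: callers ask for free bytes, never for df output. */
    interface FreeSpaceProbe {
        long usableBytes(File path);
    }

    /** Pure-JVM implementation: no fork(), no bash, no platform-specific parsing. */
    static final class JvmFreeSpaceProbe implements FreeSpaceProbe {
        @Override
        public long usableBytes(File path) {
            // getUsableSpace() returns 0 when the path does not name a partition,
            // so callers may want to check path.exists() before trusting a zero.
            return path.getUsableSpace();
        }
    }

    public static void main(String[] args) {
        FreeSpaceProbe probe = new JvmFreeSpaceProbe();
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(dir.getAbsolutePath()
                + " usable bytes: " + probe.usableBytes(dir));
    }
}
```

The point of the interface is that a platform needing special handling could supply its own FreeSpaceProbe without the callers changing, which is roughly the compatibility-layer idea both messages argue for.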
