http://issues.apache.org/jira/browse/HADOOP-4998 was opened for the
purpose of replacing bash calls with library calls. It has been open
for 8 months now and looks like it could use some help from Hadoop
contributors. :)
-Hong
On Sep 17, 2009, at 7:29 PM, Harish Mallipeddi wrote:
MySpace recently released their map-reduce implementation as open
source (it's .NET based). MySpace, as you might know, is one of the few
big websites that runs on Windows.
http://code.google.com/p/qizmt/
On Thu, Sep 17, 2009 at 10:42 PM, Steve Loughran <ste...@apache.org> wrote:
Bill Habermaas wrote:
It's interesting that Hadoop, being written entirely in Java, has such a
spotty reputation running on different platforms. I had to patch it to
run on AIX and need cygwin (gack!) so it will run on Windows. I'm
surprised nobody has thought about removing its use of bash to run
system commands (which is NOT especially portable). Now that Hadoop
comes only in a Java 1.6 flavor, why can't it figure out disk space
using the native Java runtime instead of executing the df command under
bash? Of course it runs other system commands as well, which in my
opinion isn't too cool.
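For reference, a minimal sketch of what the disk space question above could look like without bash. Since Java 1.6, java.io.File provides getTotalSpace(), getFreeSpace() and getUsableSpace(). The class name DiskSpace and the query() helper are made up for illustration; this is not Hadoop code and not necessarily what HADOOP-4998 proposes.

```java
import java.io.File;

// Hypothetical helper, for illustration only (not part of Hadoop).
public class DiskSpace {

    /**
     * Returns {total, free, usable} bytes for the partition that holds
     * the given path, using only java.io.File (available since Java 1.6).
     * Each value is 0 if the path does not name a partition.
     */
    public static long[] query(File path) {
        return new long[] {
            path.getTotalSpace(),   // size of the partition
            path.getFreeSpace(),    // unallocated bytes
            path.getUsableSpace()   // bytes available to this JVM (a hint)
        };
    }

    public static void main(String[] args) {
        // Default to the current directory when no path is given.
        File dir = new File(args.length > 0 ? args[0] : ".");
        long[] s = query(dir);
        System.out.println("total=" + s[0] + " free=" + s[1] + " usable=" + s[2]);
    }
}
```

The numbers may not match df exactly (for example, because of blocks reserved for root), so anything that parses df output today would probably need retesting before switching over.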
It is run at scale on big Linux systems, and they are the ones that
encounter problems with 16GB heaps and exec(), and various other JVM
quirks that lead the developers to say Linux + Sun JVM only. You are
free to use other operating systems and even JVMs (I've used JRockit
with some minor logging problems in test runs), but you get to encounter
the problems. You can and should submit patches back, but if you diverge
from the approved standard, you get to retest at scale, because nobody
else is going to do it for you.
Supporting different Unix versions is much easier than supporting
Windows+Linux/Unix, especially if you are trying to do high availability
stuff, integrate with management tools, etc. I think it would be nice if
Hadoop would build and run standalone on Windows without cygwin, but for
all other actions, a more ruthless "Unix-ish only" would be harsh but
make it easier to manage problems.
Even in a Linux-only world, you are left with the "which distro"
question: were there to be official Apache Hadoop RPMs and .deb files,
there'd be discussions about which platforms to support. RHEL+CentOS 5.X
would be the obvious choice, but what else?
-steve
--
Harish Mallipeddi
http://blog.poundbang.in