Erik Forsberg wrote:
Hi!
I'm currently evaluating different Hadoop versions for a new project.
I'm tempted by the Cloudera distribution, since it's neatly packaged
into .deb files, and is the stable distribution but with some patches
applied, for example the bzip2 support.
I understand that I can get a support agreement from Cloudera to match
this distribution, but if that's not an option, will running the
Cloudera distribution put me in a position where I won't get any help
from the community because I'm not running an official Apache Hadoop
release?
-there is no official Apache deb, so you will end up using someone else's
deb if that is how you build your cluster up.
-everyone welcomes bug reports, especially ones with stack traces.
-regardless of whether you use an official or an external release, a
common answer to any bug report will be "does it go away on the latest
release?" And then "does it go away on trunk?".
-only you are going to be able to track down problems on your cluster,
because your machines and network are different from everybody else's.
-the act of checking out and building a release locally sets you up to
add diagnostics and fixes to the source, fixes you can turn into
patches to get pushed in.
What Cloudera is selling, then, is not so much the packaging as the
service of taking over the work of fixing bugs for you. You are still
free to track down and fix your own problems, on their releases and on
the Apache ones, because nobody else's network or cluster matches yours.
I am in favour of adding lots more diagnostics to Hadoop; most of my
patches that have gone in help with exactly this kind of debugging:
working out which machines are playing up, and why. Anything we can do
to help debug Hadoop, or validate an installation, is a welcome
improvement.
-steve