Hi,
I am trying to set up the CloudSuite components so that I can run a
detailed cycle-decomposition performance analysis on them (see
https://code.google.com/p/gooda and the documents in gooda-analyzer/docs).
I have run into a few issues.
1) If you start by trying to install the web search benchmark, Hadoop is
not listed as a prerequisite (nor are Mahout, Maven, or Ant). In reality,
the components seem to have to be installed in order, starting with
"data analytics".
2) The Hadoop package mentioned (0.20.2) is more than 3 years old, and
there have been a very large number of more recent releases. The current
supported release seems to be 1.1.2.
To illustrate the point, the instructions link to a single-node
installation guide:
"For more information about Hadoop installation, this link
<http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html> provides
a step-by-step guide to install Hadoop on a single node."
but the link is stale, as the docs directory for the old release no longer
exists.
3) Hadoop appears to need a passwordless login for a group.account of
hadoop.hadoop. This violates almost anybody's security regulations. In
fact, the instructions cannot even be executed if you have a decent IT
group :-) The instructions in question are:
Installing Hadoop on a single node:
1. It is recommended to create a Hadoop user (Note: This requires root
privileges and the commands can be different in various Linux distributions
such as useradd vs adduser):
- sudo groupadd hadoop
- sudo useradd -g hadoop hadoop
- sudo passwd hadoop (to setup the password)
2. Preparing SSH:
- su - hadoop
- ssh-keygen -t rsa -P "" (press enter for any prompts)
- cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- ssh localhost (answer yes to the prompt)
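
For reference, the quoted "Preparing SSH" step boils down to roughly the
following. This is only a minimal sketch, assuming OpenSSH's ssh-keygen is
installed. The helper name setup_passwordless_key and its optional
directory argument are made up for illustration and are not part of the
CloudSuite or Hadoop instructions. It sets up the key for whichever account
runs it, so it needs neither root nor a dedicated hadoop user.

```shell
#!/bin/sh
# Hypothetical helper (not from the CloudSuite docs): create a
# passphrase-less RSA key and authorize it for logins to the same account.
# Optional argument: the SSH directory to use (defaults to ~/.ssh).
setup_passwordless_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1

    # Generate the key pair only if one does not already exist.
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa" || return 1
    fi

    # Append the public key once, so repeated runs do not duplicate it.
    touch "$ssh_dir/authorized_keys"
    if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys"; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi

    # Keep the file private to the owning account.
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Calling setup_passwordless_key with no argument targets ~/.ssh. After
that, "ssh localhost" should log in without a password prompt, assuming
sshd is running locally and permits public-key authentication.
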
Are there plans to update the instructions so that they are more complete
about environment requirements and describe an installation that actually
has a chance of not violating security policies?
:-)
Thanks,
Dave Levinthal
Google Corp