Hi,
I am trying to get the CloudSuite components set up so that I can run a
detailed performance analysis with cycle decomposition on them (see
https://code.google.com/p/gooda and the documents in gooda-analyzer/docs). I
have run into a few issues.
1) If you start by trying to install the web search benchmark, Hadoop is not
listed as a prerequisite (nor are Mahout, Maven, or Ant). In reality, the
components seem to need to be installed in order, starting with "data
analytics".
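
A quick way to see which of these prerequisites are missing before starting
is sketched below. It assumes the tools put executables named hadoop, mahout,
mvn, and ant on PATH; those names are my assumption and do not come from the
CloudSuite instructions, so adjust them to your installation.

```shell
#!/bin/sh
# Print each named command that is NOT found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed executable names for the prerequisites mentioned above.
missing=$(missing_tools hadoop mahout mvn ant)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing prerequisites:" $missing
else
    echo "All prerequisites found on PATH."
fi
```

This only checks that the commands exist; it says nothing about versions or
about the order in which the components have to be installed.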

2) The Hadoop package mentioned (0.20.2) is more than 3 years old, and there
have been a very large number of more recent releases. The currently supported
release seems to be 1.1.2. To illustrate the point, the instructions link to a
single-node installation guide:
"For more information about Hadoop installation, this link
<http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html> provides
a step-by-step guide to install Hadoop on a single node."
But the link is stale, as the docs directory for the old release no longer
exists.

3) Hadoop appears to need a passwordless login for a group.account
hadoop.hadoop. This violates almost anybody's security regulations. In fact,
the instructions cannot even be executed if you have a decent IT group :-)

Installing Hadoop on a single node:

   1. It is recommended to create a Hadoop user (Note: This requires root
   privileges and the commands can be different in various Linux distributions
   such as useradd vs adduser):
      - sudo groupadd hadoop
      - sudo useradd -g hadoop hadoop
      - sudo passwd hadoop (to setup the password)
   2. Preparing SSH:
      - su - hadoop
      - ssh-keygen -t rsa -P "" (press enter for any prompts)
      - cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
      - ssh localhost (answer yes to the prompt)

Are there plans to upgrade the instructions to be a bit more complete about
environment requirements and installations that actually have a chance of
not violating security policies?
:-)
thanks
Dave Levinthal
Google Corp
