Unless or until we put authentication in place in the web API, I think we need to be very careful about adding admin functions to it.

We have clusters with an increasing number of users who are not cluster admins on them. We don't want to enable them to reconfigure the system. We do want to allow them to use the web UIs to diagnose the status of their jobs and files.

In our other large clustered applications, we've found that segregating a web interrogation UI from a file-based (or other control-port-based) admin UI avoids a lot of pain and allows administrative flexibility.

On Aug 11, 2006, at 1:14 PM, Doug Cutting (JIRA) wrote:

[ http://issues.apache.org/jira/browse/HADOOP-442?page=comments#action_12427609 ]

Doug Cutting commented on HADOOP-442:
-------------------------------------

The slaves file is currently only used by the start/stop scripts, so it won't help here.

Perhaps the jobtracker and namenode should have a public API that permits particular hosts to be banned. The web UI could then use this to let administrators ban hosts. We could initialize the list from a config file, in the case of persistently bad hosts.
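The banning API suggested above could be sketched roughly as follows. This is only an illustration of the idea, not Hadoop's actual API; the class and method names (HostExclusionList, ban, isBanned) are assumptions:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical host-exclusion list shared by the jobtracker/namenode.
// Initialized from a config file; updated at runtime via the web UI.
public class HostExclusionList {
    private final Set<String> banned =
        Collections.synchronizedSet(new HashSet<String>());

    // Seed the list with persistently bad hosts from configuration.
    public HostExclusionList(Iterable<String> initiallyBanned) {
        for (String host : initiallyBanned) {
            banned.add(host);
        }
    }

    // Called by an administrator (e.g. through the web UI) to ban a host.
    public void ban(String host) {
        banned.add(host);
    }

    // Checked by the master when a slave attempts to register or reconnect;
    // banned hosts are simply ignored.
    public boolean isBanned(String host) {
        return banned.contains(host);
    }
}
```

The master would consult isBanned() on every registration attempt, so a rogue node that reconnects after a restart would be rejected rather than accepted back into the cluster.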

slaves file should include an 'exclude' section, to prevent "bad" datanodes and tasktrackers from disrupting a cluster
-------------------------------------------------------------------------------------------------------------------

                Key: HADOOP-442
                URL: http://issues.apache.org/jira/browse/HADOOP-442
            Project: Hadoop
         Issue Type: Bug
           Reporter: Yoram Arnon

I recently had a few nodes go bad, such that they were inaccessible via ssh, but were still running their java processes.
Tasks that executed on them were failing, causing jobs to fail.
I couldn't stop the java processes because of the ssh issue, so I was helpless until I could actually power down these nodes. Restarting the cluster doesn't help, even when removing the bad nodes from the slaves file - they just reconnect and are accepted. While we plan to prevent tasks from launching on the same nodes over and over, what I'd like is to be able to prevent rogue processes from connecting to the masters.

Ideally, the slaves file would contain an 'exclude' section, which would list nodes that shouldn't be accessed and should be ignored if they try to connect. That would also help in configuring the slaves file for a large cluster - I'd list the full range of machines in the cluster, then list the ones that are down in the 'exclude' section.
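As a rough illustration, the 'exclude' section proposed above might look something like this in the slaves file (the syntax and node names are a sketch, not an implemented format):

```
# slaves file: full range of machines in the cluster
node001
node002
node003
node004

# hypothetical 'exclude' section: nodes that are down and should be
# ignored if they try to connect
[exclude]
node002
```

The masters would refuse registration from any host listed under the exclude section, so a bad node's still-running java processes couldn't rejoin the cluster after a restart.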

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira


