On Tue, Sep 29, 2009 at 1:03 AM, Wendy Smoak <[email protected]> wrote:
> I've been working with Distributed Builds lately, and I've found that
> it works if everything is perfect, but if something goes wrong it has
> a hard time coping with the problem, and it doesn't recover.
>
> For example, it's a given that at some point, an agent is going to die
> without being properly removed first.
>
> Currently if this happens, the Queues page breaks (error/stack trace)
> and you can't edit or delete the offending agent to disable or get rid
> of it.
>
> The agent is also still shown as 'enabled' on the Distributed Agents
> page even though it's not responding.
>
> What should happen in this case?
>
> I'm all for having the system automatically disable any agent that is
> not behaving properly. At first, the admin may have to manually
> re-enable it. In the future we might come up with a way for it to
> auto-recover.
>
> Thoughts?
>
> --
> Wendy
>

+1
