On Tue, Sep 29, 2009 at 1:03 AM, Wendy Smoak <[email protected]> wrote:

> I've been working with Distributed Builds lately, and I've found that
> it works if everything is perfect, but if something goes wrong it has
> a hard time coping with the problem, and it doesn't recover.
>
> For example, it's a given that at some point, an agent is going to die
> without being properly removed first.
>
> Currently if this happens, the Queues page breaks (error/stack trace)
> and you can't edit or delete the offending agent to disable or get rid
> of it.
>
> The agent is also still shown as 'enabled' on the Distributed Agents
> page even though it's not responding.
>
> What should happen in this case?
>
> I'm all for having the system automatically disable any agent that is
> not behaving properly.  At first, the admin may have to manually
> re-enable it.  In the future we might come up with a way for it to
> auto-recover.
>
> Thoughts?
>
> --
> Wendy
>

+1
