On Thu, Feb 14, 2013 at 11:20 AM, Ryan Lane <[email protected]> wrote:
> The glusterd process went into a death spiral at around midnight UTC last
> night. The glusterfs/glusterfsd processes continued to work fine, which
> allowed the filesystem to continue to work properly, but all four servers
> were approaching swap death.
>
> This version of gluster also has issues with the upstart scripts. It won't
> properly start/stop the gluster services. I'm having to reboot the hosts.
> I'm going to track down this issue today in a labs instance. For the next
> few hours some projects will have issues accessing project and/or home
> directories.
>
> This will not affect services using instance storage (/mnt).
>
> Volumes are being force restarted right now.

Login should work on all nodes, and project storage should currently work
perfectly fine for most projects. All volumes should be completely up in
about an hour.

The glusterfs folks can't reproduce the upstart issue we're seeing in our
cluster. As a workaround for now, I've replaced the upstart jobs with init
scripts, which behave exactly as expected.

In the future, it should be possible to work around the outage condition we
had today without a prolonged volume force-start state.

- Ryan
_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
