On 01/25/2013 07:47 AM, jayunit...@gmail.com wrote:
Hi guys: I just saw an issue on the HDFS mailing list that might be a
potential problem in Gluster clusters. It kind of reminds me of
Jeff's idea of bricks as first-class objects in the API.
What happens if a Gluster brick is on a machine which, although still
alive, performs poorly?
Would such a scenario be detected, and if so, can the brick be
decommissioned/ignored/moved? If not, it would be a cool feature to
have, because I'm sure this happens from time to time.
There's nothing currently in place to detect such a condition, and of
course if we can't detect it we can't do anything about it. There are
also several cases where we might actually manage to make things worse
if we try to do this ourselves. For example, consider the case where
the slowness is caused by some short-lived contending activity. We
might well react just as that activity subsides, suspending that brick
just as another brick is "going bad" due to similar transient activity
there. Similarly, if the system overall is truly overloaded, suspending
bricks is a bit like squeezing a water balloon - the "bulge" just
reappears elsewhere and all we've done is diminish total resources
available.
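
To make the timing problem concrete, here's a toy simulation (plain
Python, nothing GlusterFS-specific; the threshold, smoothing factor, and
latency numbers are all invented for illustration). Whatever smoothing we
add to avoid flapping also delays the decision, so the policy ends up
suspending the brick one sample after the contention has already gone away:

THRESHOLD_MS = 40.0   # arbitrary cutoff, invented for this example
ALPHA = 0.5           # EWMA smoothing factor, also invented

# Simulated latencies (ms) for one brick: a brief spike caused by some
# contending job, then back to normal.
samples = [10, 10, 90, 95, 12, 11, 10]

ewma = samples[0]
for i, raw in enumerate(samples):
    ewma = ALPHA * raw + (1 - ALPHA) * ewma
    if ewma > THRESHOLD_MS:
        note = " (contention already gone)" if raw < THRESHOLD_MS else ""
        print("sample %d: ewma=%.1f ms, raw=%d ms -> policy suspends brick%s"
              % (i, ewma, raw, note))

The final suspension fires at a sample where the raw latency is already
back down to 12 ms - exactly the "react just as the activity subsides"
failure.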
I've seen problems like this with other parallel filesystems, and I'm
pretty sure I've read papers about them too. IMO the right place to
deal with such issues is at the job-scheduler or similar level, where
more of the total system state is known. What we can do is provide more
information about our part of the system state, plus levers that the
scheduler can pull when it decides that preparation or correction for a
higher-level event (that we probably don't even know about) is appropriate.
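
To sketch that division of labour in rough Python terms (every name
below is hypothetical - none of it is an existing GlusterFS or scheduler
interface, just an illustration of who would know what):

def scheduler_decision(brick_stats, cluster_utilization,
                       job_ending_soon_on, set_brick_weight):
    """Hypothetical scheduler-side policy.

    brick_stats is the kind of per-brick information the filesystem
    could export: {brick: {"host": ..., "latency_ms": ...,
    "baseline_ms": ...}}.  cluster_utilization and job_ending_soon_on
    are things only the scheduler knows (overall load, job start/stop
    events).  set_brick_weight stands in for a lever we could provide,
    e.g. biasing new allocations away from a brick without taking it
    offline.
    """
    if cluster_utilization > 0.9:
        # System-wide overload: suspending bricks just moves the bulge.
        return
    for brick, s in brick_stats.items():
        persistently_slow = s["latency_ms"] > 5 * s["baseline_ms"]
        if persistently_slow and not job_ending_soon_on(s["host"]):
            set_brick_weight(brick, 0.1)

The point is just that the conditionals live where the information
lives; all the filesystem contributes is the stats and the
weight-setting lever.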
_______________________________________________
Gluster-devel mailing list
Gluster-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel