On Mon, Jun 25, 2018 at 10:01 AM Anh Vo <[email protected]> wrote:

> Anyone able to help us troubleshoot this issue? This is getting worse. We
> are back to our 3-replica setup but the issue is still happening. What we
> have found is that the problem goes away if I bring just one set of bricks
> offline. For example, if I have (0 1 2) (3 4 5) (6 7 8) (9 10 11) and I
> take bricks 0 3 6 9, or bricks 1 4 7 10, offline, then performance is
> super fast. The moment all bricks are online, things become very slow. It
> seems like gluster is having some sort of lock contention between its
> members. During the period of slowness, gluster profile shows excessive
> time spent in LOOKUP and FINODELK.
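[Archive editor's note: the per-fop latency figures quoted below come from
Gluster's built-in io-stats profiler. A minimal sketch of how such numbers
are collected, assuming a volume named gv0 (the volume name here is a
placeholder, not the poster's actual volume):]

```shell
# Enable profiling on the volume (gv0 is a placeholder name)
gluster volume profile gv0 start

# After letting the workload run, dump per-brick fop statistics:
# %-latency, avg/min/max latency, and call counts for each fop
gluster volume profile gv0 info

# Stop profiling when done, to avoid the measurement overhead
gluster volume profile gv0 stop
```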
Have you checked if a self-heal is in progress to resync data after the
bricks are all online? Healing can impact the performance of user
applications owing to contention, and once the system reaches a steady
state, performance should improve.

> %-latency   Avg-latency     Min-latency   Max-latency        No. of calls   Fop
> 11.60          752.64 us    10.00 us        2647757.00 us       272476323   LOOKUP
> 15.83         6884.12 us    29.00 us        2190470.00 us        40626259   WRITE
> 27.84        80480.22 us    40.00 us       11731910.00 us         6114072   FXATTROP
> 37.83       105125.18 us    12.00 us      276088722.00 us         6359515   FINODELK
>
> We have about one or two months before we need to make a decision to keep
> Gluster, and so far it has been a lot of headache.

Detailed bug reports, RFEs on GitHub, and/or patches that can help Gluster
work better for your use case are welcome!

Thanks,
Vijay

> On Thu, Jun 14, 2018 at 10:18 AM, Anh Vo <[email protected]> wrote:
>
>> Our gluster keeps getting into a state where it becomes painfully slow
>> and many of our applications time out on read/write calls. When this
>> happens, a simple ls at the top-level directory from the mount takes
>> somewhere between 8-25s (normally it is very fast, at most 1-2s). The
>> top-level directory only has about 10 folders.
>>
>> The two methods to mitigate this problem have been to 1) restart all GFS
>> servers or 2) stop/start the volume. Method 2) takes somewhere between
>> half an hour and an hour for gluster to get back to its desired
>> performance.
>>
>> So far the logs don't show anything unusual, but perhaps I don't know
>> what I should be looking for in them. Even when gluster is fully
>> functional we see lots of log entries, and it is hard to tell which
>> errors are harmless and which are not.
>>
>> This issue does not seem to happen with our 3-replica glusters, only
>> with 2-replica-1-arbiter and 2-replica. However, our 3-replica glusters
>> are only 30% full while the 2-replica ones are about 80% full.
>> We're running 3.12.9 for the servers. The clients are 3.8.15, but we
>> notice the slowness of operations on 3.12.9 clients as well.
>>
>> Configuration: 12 GFS servers, one brick per server, replica 2, 80T each
>> brick. We used to have arbiters but thought the arbiters were causing
>> the slowdown, so we took them out. Apparently it's not the case.
>
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://lists.gluster.org/mailman/listinfo/gluster-users
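[Archive editor's note: a sketch of how the self-heal state Vijay asks
about can be checked, again assuming a placeholder volume name gv0:]

```shell
# List the files/gfids still pending heal on each brick; a long or
# growing list while all bricks are online suggests heal traffic is
# competing with client I/O
gluster volume heal gv0 info

# Per-brick count of entries in the heal backlog, without listing them
gluster volume heal gv0 statistics heal-count
```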
