Re: [Gluster-devel] gluster volume stop and the regressions

Milind Changire Tue, 13 Feb 2018 23:01:40 -0800

The volume stop in brick-mux mode reveals a race with my patch [1].
Although this behavior is 100% reproducible with my patch, that by no
means implies that my patch is buggy.


In brick-mux mode, during volume stop, when glusterd sends a brick-detach
message to the brick process for the last brick, the brick process responds
to glusterd with an acknowledgment and then kills itself with a SIGTERM
signal. All this sounds fine. However, the response somehow doesn't reach
glusterd; instead, a socket disconnect notification reaches glusterd before
the response. This causes glusterd to presume that something has gone wrong
during volume stop, and glusterd then fails the volume stop operation,
causing the test to fail.
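
To make the ordering problem concrete, here is a toy model in Python. It is
only a sketch of the behavior described above, not GlusterFS code: the names
Event and handle_detach_events are hypothetical, and the real logic lives in
glusterd's RPC handling.

```python
# Toy model of the ordering problem; not GlusterFS code.
# Event and handle_detach_events are hypothetical names for illustration.
from enum import Enum


class Event(Enum):
    DETACH_ACK = "detach-ack"
    DISCONNECT = "disconnect"


def handle_detach_events(events):
    """Return True if volume stop would succeed for this event order.

    Mirrors the behavior described above: a disconnect seen before the
    acknowledgment is treated as a failure of the volume stop.
    """
    for event in events:
        if event is Event.DETACH_ACK:
            return True
        if event is Event.DISCONNECT:
            return False
    return False  # neither event arrived


# Order the brick process intends: acknowledge first, then exit.
expected_order = [Event.DETACH_ACK, Event.DISCONNECT]
# Order seen in the race: the acknowledgment never arrives before the disconnect.
racy_order = [Event.DISCONNECT]

print(handle_detach_events(expected_order))  # True
print(handle_detach_events(racy_order))      # False
```

The brick does the same thing in both cases; only the order in which
glusterd observes the events differs.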

This race is reproducible by running the test
tests/basic/distribute/rebal-all-nodes-migrate.t in brick-mux mode with my
patch [1].

[1] https://review.gluster.org/19308
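
For context on the quoted thread, the "retry until a timeout, then treat it
as fatal" idea behind EXPECT_WITHIN can be sketched in Python as below. This
is an illustration of the polling pattern only, not the test framework's
implementation; expect_within, try_volume_stop and the status strings are
hypothetical.

```python
import time


def expect_within(timeout_s, expected, probe, interval_s=1.0):
    """Poll probe() until it returns expected or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Hypothetical stand-in for "try to stop the volume": it fails twice while a
# rebalance session is still active, then succeeds.
attempts = {"count": 0}


def try_volume_stop():
    attempts["count"] += 1
    return "stopped" if attempts["count"] >= 3 else "rebalance in progress"


print(expect_within(30, "stopped", try_volume_stop, interval_s=0.01))
```

A failure is reported only when the probe has still not returned the
expected value by the time the deadline passes.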


On Thu, Feb 1, 2018 at 9:54 AM, Atin Mukherjee <[email protected]> wrote:

> I don't think that's the right way. Ideally the test shouldn't be
> attempting to stop a volume if a rebalance session is in progress. If we
> check the rebalance status, wait up to 30 secs for it to finish, and the
> volume stop still fails with a rebalance session in progress error, that
> means either (a) the rebalance session took longer than the timeout passed
> to EXPECT_WITHIN or (b) there's a bug in the code.
>
> On Thu, Feb 1, 2018 at 9:46 AM, Milind Changire <[email protected]>
> wrote:
>
>> If a *volume stop* fails at a user's production site with a reason like
>> *rebalance session is active* then the admin will wait for the session to
>> complete and then reissue a *volume stop*.
>>
>> So, in essence, the failed volume stop is not fatal. For the regression
>> tests, I would like to propose changing a single volume stop to
>> *EXPECT_WITHIN 30*, so that if a volume cannot be stopped even after 30
>> seconds, it can be termed fatal in the regression scenario.
>>
>> Any comments about the proposal?
>>
>> --
>> Milind
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> [email protected]
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>


-- 
Milind

