On 03/10/2017 02:45 AM, Xavier Hernandez wrote:
Hi,

I've posted a patch [1] to fix a memory leak in the locks xlator. The fix
seems quite straightforward; however, I've seen a deadlock in the centos
regression twice [2] [3] on the locks_revocation.t test, causing the
test to time out and be aborted.

At first sight I haven't seen other failures of this kind for other
patches, so it seems that the spurious failure has been introduced by my
patch.

This test has caused a few aborted runs in the past as well, and it looks like it continues to do so; see [4].

I do not think this is due to your patch, as the same failure has occurred a few times before.

Unfortunately, fstat.gluster.org does not report this test as the cause of an aborted run (I mean to file a bug on this); otherwise I am sure it would have bubbled up higher in that report.


Can anyone with deeper knowledge of the locks xlator help me identify the
cause? I'm unable to see how the change can interfere with lock
revocation.

I've tried to reproduce it locally, but the test passed successfully every
time.

@Nigel, is it possible to get the logs generated by an aborted job from
somewhere? I have looked in the place where failed jobs store their
logs, but they aren't there. It seems that the slave node is restarted after
an abort, but the logs are not saved.

Thanks,

Xavi

[1] https://review.gluster.org/16838/
[2] https://build.gluster.org/job/centos6-regression/3563/console
[3] https://build.gluster.org/job/centos6-regression/3579/console
[4] Older failures of lock-revocation.t: http://lists.gluster.org/pipermail/gluster-devel/2017-February/052158.html