TL;DR: We need to come up with a fix for AFR data self-heal from clients (mounts).

The test data-self-heal.t creates a 1x2 (replica 2) volume, sets AFR changelog xattrs directly on the files on the backend bricks, and then runs a full heal to heal the files; a rough sketch of those steps follows.
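
This is only an illustration, not the actual contents of data-self-heal.t; the volume name, host, paths and xattr value are made up:

gluster volume create testvol replica 2 host1:/bricks/b0 host1:/bricks/b1 force
gluster volume start testvol
mount -t glusterfs host1:/testvol /mnt/testvol
echo "some data" > /mnt/testvol/file

# Mark brick b1's copy of the file as stale by bumping the pending
# data counter in the AFR changelog xattr on brick b0's copy. The
# value packs three 32-bit counters: data/metadata/entry pending ops.
setfattr -n trusted.afr.testvol-client-1 \
         -v 0x000000010000000000000000 /bricks/b0/file

# Run a full self-heal crawl so the stale copy gets repaired.
gluster volume heal testvol full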

The test fails intermittently when run in a loop: data self-heal attempts non-blocking locks before healing, and the two heal threads (one per brick) can try to acquire the lock at the same time, so that both fail. In afr-v1, only one thread gets spawned if both bricks are on the same node. We cannot do the same in afr-v2 because, unlike v1, it has no conservative merge in afr_opendir_cbk(). We are not sure that adding a conservative merge to v2 is a good idea, because it involves (multiple) readdirs on both bricks and computing checksums on the entries to detect mismatches, which can be costly when done from clients. Making the locks blocking is not ideal either: if one heal thread holds the lock, the other would block on that file instead of moving on to heal other files.

One approach is to do what ec does: use a virtual xattr, handled in the getxattr FOP, to trigger data heals from clients, as sketched below. More thought needs to be given to this.
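
If I remember right, ec intercepts a getfattr on its heal xattr inside ec_getxattr() and performs the heal inline before unwinding. From a mount it looks roughly like this (the AFR xattr name below is purely hypothetical, just to show the analogue):

getfattr -n trusted.ec.heal /mnt/testvol/file    # existing ec trigger
getfattr -n trusted.afr.heal /mnt/testvol/file   # hypothetical AFR analogue

AFR would similarly recognise the virtual name in its getxattr fop, run the data self-heal for that inode from the client, and unwind with the result.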

Regards,
Ravi

