On 02/02/2015 10:29 PM, Krishnan Parthasarathi wrote:
IOW, given a d_off and a common routine, pass the d_off with this (i.e.
current xlator) to get the subvol that the d_off belongs to. This routine
would decode the d_off for the leaf ID as encoded in the client/protocol
layer, and match its subvol relative to this and send that for further
processing. (it may consult the graph or store the range of IDs that any
subvol has w.r.t client/protocol and deliver the result appropriately).

What happens to this scheme when bricks are repeatedly added/removed?

The result should be no different from what the current scheme in the code does, i.e. it encodes the subvol ID based on the children of DHT, which depends on dht_subvol_cnt and therefore, indirectly, on the order of children seen in the graph.

I would further state that this change does not remove that limitation; it only moves the encoding to a single point.
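
To make the common routine concrete, here is a rough sketch in C. It is not the code in the tree: the bit layout (LEAF_ID_BITS low-order bits for the leaf ID), the struct child_range bookkeeping and all the function names are assumptions made up for illustration. It only shows the idea of decoding the leaf ID once and letting each xlator map it to one of its own children by ID range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low LEAF_ID_BITS bits of d_off carry the leaf ID. */
#define LEAF_ID_BITS 8u
#define LEAF_ID_MASK ((UINT64_C(1) << LEAF_ID_BITS) - 1)

/* Pack a brick-local offset and a leaf ID into a single 64-bit d_off. */
static uint64_t
doff_encode (uint64_t brick_off, unsigned leaf_id)
{
        assert (leaf_id <= LEAF_ID_MASK);
        assert (brick_off <= (UINT64_MAX >> LEAF_ID_BITS));
        return (brick_off << LEAF_ID_BITS) | leaf_id;
}

/* Recover the leaf ID that was encoded at the leaf (client) layer. */
static unsigned
doff_leaf_id (uint64_t d_off)
{
        return (unsigned) (d_off & LEAF_ID_MASK);
}

/* Recover the brick-local offset. */
static uint64_t
doff_brick_off (uint64_t d_off)
{
        return d_off >> LEAF_ID_BITS;
}

/* Hypothetical per-xlator view: each child covers a contiguous range of
 * leaf IDs, computed once when the graph is initialized. */
struct child_range {
        unsigned first_leaf;
        unsigned last_leaf;
};

/* Return the index of the child whose range holds the d_off's leaf ID,
 * or -1 if no child of this xlator owns it. */
static int
doff_to_child (const struct child_range *children, int child_cnt,
               uint64_t d_off)
{
        unsigned leaf = doff_leaf_id (d_off);

        for (int i = 0; i < child_cnt; i++) {
                if (leaf >= children[i].first_leaf &&
                    leaf <= children[i].last_leaf)
                        return i;
        }
        return -1;
}
```

With two children covering leaf IDs 0-3 and 4-7, a d_off built as doff_encode (1234, 5) resolves to the second child, and any xlator in the stack can make that decision without re-encoding the offset.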


IIUC, the leaf xlator encoding proposed should be performed during graph
initialization and would remain static for the lifetime of the graph.
When bricks are added or removed, it would trigger a graph change, and
the new encoding would be computed. Further, it is guaranteed that
ongoing (readdir) FOPs would complete in the same (old) graph and therefore
the encoding should be unaffected by bricks being added/removed.


I would differ on the reasoning here. NFS clients store the d_off values returned by directory scans, so they can come back with those values after a graph switch. In that case the request is handled as a fresh opendir followed by a seek to the supplied d_off (with all the subvol ID decoding etc.).

So in short, we are not immune to this.
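
A small sketch of the failure mode, again in C and again with made-up names (struct graph, graph_resolve, doff_survives_switch) and an assumed layout where the leaf ID is simply the position of the brick in the graph, stored in the low LEAF_ID_BITS bits. It is only meant to show why a d_off cached by a client can point at a different brick, or at none, once the graph has changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: the low LEAF_ID_BITS bits of d_off carry the leaf ID. */
#define LEAF_ID_BITS 8u
#define LEAF_ID_MASK ((UINT64_C(1) << LEAF_ID_BITS) - 1)
#define MAX_LEAVES   8

/* Hypothetical graph: leaf IDs are just positions in leaves[], i.e. they
 * depend on the order in which bricks appear in the graph. */
struct graph {
        const char *leaves[MAX_LEAVES];
        size_t      leaf_cnt;
};

/* Map a d_off to the brick it names in the given graph, or NULL when the
 * encoded leaf ID does not exist in that graph. */
static const char *
graph_resolve (const struct graph *g, uint64_t d_off)
{
        size_t leaf = (size_t) (d_off & LEAF_ID_MASK);

        return (leaf < g->leaf_cnt) ? g->leaves[leaf] : NULL;
}

/* Return 1 if a d_off handed out under old_g still names the same brick
 * under new_g, 0 otherwise. */
static int
doff_survives_switch (const struct graph *old_g, const struct graph *new_g,
                      uint64_t d_off)
{
        const char *before = graph_resolve (old_g, d_off);
        const char *after  = graph_resolve (new_g, d_off);

        return before != NULL && after != NULL &&
               strcmp (before, after) == 0;
}
```

For example, if the old graph has bricks a, b, c and the new graph has a, c (b removed), a d_off issued for brick b now decodes to brick c, and one issued for brick c decodes to nothing. Only offsets on brick a survive, and only because its position did not move.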

Shyam
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
