Hi,

Some parts of this topic have been discussed in the recent past here [1].

The current mechanism, where each xlator encodes the subvol in the lower or higher bits of the d_off, has its pitfalls, as discussed in those threads and in this review [2].

Here is a solution design from one of the comments Avati posted on this [3]:

"One example approach (not necessarily the best): Make every xlator knows the total number of leaf xlators (protocol/clients), and also the number of all leaf xlators from each of its subvolumes. This way, the protocol/client xlators (alone) do the encoding, by knowing its global brick# and total #of bricks. The cluster xlators blindly forward the readdir_cbk without any further transformations of the d_offs, and also route the next readdir(old_doff) request to the appropriate subvolume based on the weighted graph (of counts of protocol/clients in the subtrees) till it reaches the right protocol/client to resume the enumeration."

So the scheme currently being worked on is as follows,
- encode the d_off with the protocol/client ID, which is generated from its leaf position/number
- no further encoding in any other xlator
- on receiving further readdir requests with the d_off, consult the graph (or the immediate children) for the ID encoded in the d_off, and send the request down that subvol path
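
To make the first two points concrete, here is a rough sketch (in Python for brevity, not the actual xlator code) of how the leaf might pack its ID into the d_off. The 64-bit width, the choice of the low bits for the ID, and the names id_bits, encode_d_off and decode_d_off are all assumptions for illustration; the real layout is still open.

```python
D_OFF_BITS = 64  # assumed width of a d_off


def id_bits(total_leaves):
    """Number of bits needed to hold any leaf ID in range(total_leaves)."""
    return max(1, (total_leaves - 1).bit_length())


def encode_d_off(backend_off, leaf_id, total_leaves):
    """Done once, at the leaf: pack the leaf ID into the low bits."""
    bits = id_bits(total_leaves)
    if not 0 <= leaf_id < total_leaves:
        raise ValueError("leaf_id out of range")
    # The backend offset must fit in the bits left over after the ID.
    if backend_off < 0 or backend_off >> (D_OFF_BITS - bits):
        raise OverflowError("backend d_off does not fit next to the leaf ID")
    return (backend_off << bits) | leaf_id


def decode_d_off(d_off, total_leaves):
    """Split an encoded d_off back into (backend_off, leaf_id)."""
    bits = id_bits(total_leaves)
    return d_off >> bits, d_off & ((1 << bits) - 1)
```

The sketch simply refuses a backend offset that does not fit in the remaining bits; how to handle that case for real (e.g. with ext4's large d_offs) is not addressed here.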

IOW, there would be a common routine that, given a d_off and the current xlator (this), returns the subvol that the d_off belongs to. This routine would decode the d_off to get the leaf ID as encoded in the protocol/client layer, match that ID to one of this xlator's subvols, and return that subvol for further processing. (It may consult the graph, or store the range of leaf IDs under each subvol, to deliver the result.)
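
A rough sketch of such a routine, again in Python rather than real xlator code, using the "store the range of IDs" variant. The Xlator class, assign_leaf_ids, leaf_id_of and subvol_for_d_off are hypothetical names, and the low-bits ID layout is an assumption carried only for illustration.

```python
class Xlator:
    """Hypothetical stand-in for a node in the xlator graph."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.first_leaf = None  # lowest leaf ID in this subtree
        self.last_leaf = None   # highest leaf ID in this subtree


def assign_leaf_ids(root):
    """Number the leaves left to right and record each xlator's ID range.

    Returns the total number of leaves.
    """
    def walk(x, next_id):
        x.first_leaf = next_id
        if not x.children:
            next_id += 1  # a leaf (protocol/client) consumes one ID
        for child in x.children:
            next_id = walk(child, next_id)
        x.last_leaf = next_id - 1
        return next_id

    return walk(root, 0)


def leaf_id_of(d_off, total_leaves):
    """Extract the leaf ID, assuming it sits in the low bits of the d_off."""
    bits = max(1, (total_leaves - 1).bit_length())
    return d_off & ((1 << bits) - 1)


def subvol_for_d_off(this, d_off, total_leaves):
    """Return the child of `this` whose subtree holds the d_off's leaf."""
    leaf = leaf_id_of(d_off, total_leaves)
    for child in this.children:
        if child.first_leaf <= leaf <= child.last_leaf:
            return child
    return None  # the d_off does not belong under this xlator
```

With a DHT over two AFR pairs (leaf IDs 0 to 3), a d_off carrying leaf ID 2 would be routed by DHT to the second AFR, and by that AFR to its first client, with neither of them touching the d_off itself.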

Given the current situation with ext4 and xfs, and continuing with the ID encoding scheme, this seems to be the best way of preventing multiple subvol encodings from stomping on each other, and also of limiting (in a sense) further loss of bits. This scheme would also let AFR/EC load balance readdir requests across their subvols better, rather than sending them to one static subvol for a long duration.

Thoughts/comments?

Shyam

[1] https://www.mail-archive.com/gluster-devel@gluster.org/msg02834.html
[2] review.gluster.org/#/c/8201/4/xlators/cluster/afr/src/afr-dir-read.c
[3] https://www.mail-archive.com/gluster-devel@gluster.org/msg02847.html