----- Original Message -----
> From: "Keiviw" <kei...@163.com>
> To: gluster-de...@gluster.org
> Sent: Tuesday, November 1, 2016 12:41:02 PM
> Subject: [Gluster-devel] A question of GlusterFS dentries!
> 
> Hi,
> In GlusterFS distributed volumes, listing a non-empty directory was slow.
> Then I read the dht code and found the reason. But I was confused that
> GlusterFS dht traversed all the bricks (in the volume) sequentially. Why not
> use multiple threads to read dentries from multiple bricks simultaneously?
> That's a question that has always puzzled me. Could you please tell me
> something about this?

readdir across subvols is sequential mostly because we have to support 
rewinddir(3) (and, more generally, going back to an earlier offset, as with 
seekdir(3)). We need to maintain the mapping between offsets and dentries 
across multiple invocations of readdir. In other words, if someone did a 
rewinddir to an offset corresponding to an earlier dentry, subsequent readdirs 
should return the same set of dentries that the earlier invocation of readdir 
returned. For example, in a hypothetical scenario, readdir returned the 
following dentries:

1. a, off=10
2. b, off=2
3. c, off=5
4. d, off=15
5. e, off=17
6. f, off=13

Now if we did a rewinddir to off 5 and issued readdir again, we should get the 
following dentries:
(c, off=5), (d, off=15), (e, off=17), (f, off=13)
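
As a rough illustration of that guarantee (a Python sketch, not GlusterFS code; 
LISTING and readdir_from are made-up names, and resuming at the dentry that 
carries the offset simply follows the example above):

```python
# Hypothetical single-subvol listing from the example: (name, offset) pairs in
# the order the backend returned them. Offsets are opaque, not sorted.
LISTING = [("a", 10), ("b", 2), ("c", 5), ("d", 15), ("e", 17), ("f", 13)]


def readdir_from(listing, off=None):
    """Return the entries starting at the dentry whose offset is `off`.

    off=None means "from the beginning". As in the example above, the dentry
    that carries `off` is itself part of the result.
    """
    if off is None:
        return list(listing)
    for i, (_, entry_off) in enumerate(listing):
        if entry_off == off:
            return listing[i:]
    raise ValueError(f"unknown offset {off}")


first = readdir_from(LISTING)      # complete listing
again = readdir_from(LISTING, 5)   # "rewinddir to off 5", then readdir
print(again)
```

The second call has to return the same tail of the listing no matter how many 
times it is repeated; that repeatability is the whole requirement.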

Within a subvol, the backend filesystem provides the rewinddir guarantee for 
the dentries present on that subvol. However, across subvols it is the 
responsibility of DHT to provide the above guarantee. This means we need a 
well-defined order in which we send readdir calls (note that the order is not 
well defined if we do a parallel readdir across all subvols). So DHT does a 
sequential readdir, which gives a well-defined order of reading dentries.
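
A minimal sketch of that idea, in Python rather than the actual DHT code. The 
function name and the toy data are hypothetical, and the sketch assumes that an 
offset identifies exactly one dentry across all subvols:

```python
def sequential_readdir(subvols, off=None):
    """Walk the subvols in one fixed order and return (name, offset) pairs.

    With off=None the whole directory is listed. Otherwise the listing
    resumes at the dentry carrying `off` and continues through the
    remaining subvols, always in the same order.
    """
    started = off is None
    result = []
    for listing in subvols:              # fixed subvol order
        for name, entry_off in listing:  # backend order within a subvol
            if not started and entry_off == off:
                started = True
            if started:
                result.append((name, entry_off))
    if not started:
        raise ValueError(f"unknown offset {off}")
    return result


# Toy data, not taken from any real volume.
toy_subvols = [[("x", 7), ("y", 1)], [("z", 4)]]
print(sequential_readdir(toy_subvols))
print(sequential_readdir(toy_subvols, 1))
```

Because the subvol order never changes, resuming from the same offset always 
yields the same remainder of the listing.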

To give an example, suppose we have another subvol, subvol2 (in addition to the 
subvol above, say subvol1), with the following listing:
1. g, off=16
2. h, off=20
3. i, off=3
4. j, off=19

With parallel readdir we can get many orderings, such as (a, b, g, h, i, c, d, 
e, f, j), (g, h, a, b, c, i, j, d, e, f), etc. Now if we do (with readdir done 
in parallel):

1. A complete listing of the directory (which can come back in any one of many 
orders: if each subvol's own order is preserved, there are 10C4 = 210 possible 
interleavings of the 6 and 4 dentries).
2. A rewinddir (20).

We cannot predict which set of dentries comes _after_ offset 20. 
However, if we do readdir sequentially across subvols, there is only one 
directory listing, i.e., (a, b, c, d, e, f, g, h, i, j). So, it's easier to 
support rewinddir.
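
To make the ambiguity concrete, here is a small Python sketch (illustration 
only; the helper names are made up) that enumerates every merge of the two 
example listings that keeps each subvol's own order, and collects the sets of 
dentries that follow h (off=20) in each of them:

```python
from itertools import combinations
from math import comb

SUBVOL1 = ["a", "b", "c", "d", "e", "f"]
SUBVOL2 = ["g", "h", "i", "j"]  # h carries off=20 in the example


def interleavings(first, second):
    """Yield every merge of two lists that keeps each list's own order."""
    total = len(first) + len(second)
    for positions in combinations(range(total), len(second)):
        slots = set(positions)  # positions taken by entries of `second`
        it1, it2 = iter(first), iter(second)
        yield [next(it2) if i in slots else next(it1) for i in range(total)]


def names_after(listing, name):
    """Set of names that come strictly after `name` in `listing`."""
    return frozenset(listing[listing.index(name) + 1:])


orders = list(interleavings(SUBVOL1, SUBVOL2))
parallel_tails = {names_after(order, "h") for order in orders}
sequential_tail = names_after(SUBVOL1 + SUBVOL2, "h")

print(len(orders), comb(10, 4))  # number of order-preserving merges
print(len(parallel_tails))       # distinct "after offset 20" sets
print(sorted(sequential_tail))   # the single sequential answer
```

Under these assumptions there are 210 possible merges and 7 different sets of 
dentries after offset 20, whereas the sequential order always leaves exactly 
i and j.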

If there were no POSIX requirement for rewinddir support, I think a parallel 
readdir could easily be implemented (which would improve performance too). But 
unfortunately rewinddir is still a POSIX requirement. This also opens up the 
possibility of a "no-rewinddir-support" option in DHT which, if enabled, 
results in parallel readdirs across subvols. What I am not sure about is how 
many users still use rewinddir. If there is a critical mass that wants 
performance at the cost of rewinddir support, this could be a good feature.
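
Purely as a thought experiment, the parallel variant could look roughly like 
the Python sketch below. Nothing here is existing GlusterFS code or an existing 
option: read_subvol and parallel_readdir are hypothetical names, and threads 
merely stand in for whatever concurrency mechanism DHT would actually use.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_subvol(listing):
    """Stand-in for a readdir call to one brick; returns its dentries."""
    return list(listing)


def parallel_readdir(subvols):
    """Read all subvols concurrently and merge results as they arrive.

    The merged order depends on which subvol answers first, so only the
    *set* of dentries is stable, not their order or any resume offset.
    """
    merged = []
    with ThreadPoolExecutor(max_workers=len(subvols)) as pool:
        futures = [pool.submit(read_subvol, s) for s in subvols]
        for future in as_completed(futures):
            merged.extend(future.result())
    return merged


subvol1 = ["a", "b", "c", "d", "e", "f"]
subvol2 = ["g", "h", "i", "j"]
listing = parallel_readdir([subvol1, subvol2])
print(sorted(listing))
```

The caller gets every dentry exactly once, but two runs may return them in 
different orders, which is why such a mode could not honour rewinddir.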

+gluster-users to get an opinion on this.

regards,
Raghavendra

> _______________________________________________
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
