If GlusterFS does not support POSIX seekdir, what problems will users or
GlusterFS have?


Sent from NetEase Mail Master
On 11/03/2016 12:52, Raghavendra G wrote:




On Wed, Nov 2, 2016 at 9:38 AM, Raghavendra Gowdappa <rgowd...@redhat.com> 
wrote:



----- Original Message -----
> From: "Keiviw" <kei...@163.com>
> To: gluster-devel@gluster.org
> Sent: Tuesday, November 1, 2016 12:41:02 PM
> Subject: [Gluster-devel] A question of GlusterFS dentries!
>
> Hi,
> In GlusterFS distributed volumes, listing a non-empty directory was slow.
> Then I read the DHT code and found the reason. But I was confused that
> GlusterFS DHT traversed all the bricks (in the volume) sequentially. Why not
> use multiple threads to read dentries from multiple bricks simultaneously?
> That question has always puzzled me. Could you please tell me
> something about this?


readdir across subvols is sequential mostly because we have to support 
rewinddir(3).


Sorry, seekdir(3) is the more relevant function here. Since rewinddir resets
the dir stream to the beginning, it's not much of a difficulty to support
rewinddir with parallel readdirs across subvols.
 

We need to maintain the mapping between offsets and dentries across multiple
invocations of readdir. In other words, if someone did a rewinddir to an offset
corresponding to an earlier dentry, subsequent readdirs should return the same
set of dentries that the earlier invocation of readdir returned. For example,
in a hypothetical scenario, readdir returned the following dentries:

1. a, off=10
2. b, off=2
3. c, off=5
4. d, off=15
5. e, off=17
6. f, off=13

Now if we did a rewinddir to off 5 and issued readdir again, we should get the
following dentries:
(c, off=5), (d, off=15), (e, off=17), (f, off=13)
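
To make the expected behaviour concrete, here is a minimal Python sketch of a
single directory stream. It follows the toy model of this example, where
seeking back to an entry's offset (seekdir(3), per the correction above) makes
the next readdir return that entry. DirStream, read_all and the method names
are hypothetical, chosen for illustration only; they are not GlusterFS or libc
interfaces.

```python
class DirStream:
    """Toy model of one directory stream; not a GlusterFS or libc API."""

    def __init__(self, entries):
        # entries: (name, offset) pairs in the order the backend returns them
        self.entries = list(entries)
        self.pos = 0

    def readdir(self):
        """Return the next (name, offset) pair, or None at end of stream."""
        if self.pos >= len(self.entries):
            return None
        entry = self.entries[self.pos]
        self.pos += 1
        return entry

    def seekdir(self, offset):
        """Reposition so the next readdir returns the entry with this offset."""
        for i, (_, off) in enumerate(self.entries):
            if off == offset:
                self.pos = i
                return
        raise ValueError(f"unknown offset {offset}")


def read_all(stream):
    """Drain the stream from its current position."""
    out = []
    while (entry := stream.readdir()) is not None:
        out.append(entry)
    return out


listing = [("a", 10), ("b", 2), ("c", 5), ("d", 15), ("e", 17), ("f", 13)]
stream = DirStream(listing)
first_pass = read_all(stream)   # complete listing
stream.seekdir(5)               # go back to the offset of "c"
second_pass = read_all(stream)  # must repeat the tail of the first pass
print(second_pass)
```

The point of the sketch is only that the second pass has to be a suffix of the
first one, whatever order the offsets themselves happen to be in.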

Within a subvol, the backend filesystem provides the rewinddir guarantee for
the dentries present on that subvol. However, across subvols it is the
responsibility of DHT to provide the above guarantee. This means we must have
some well-defined order in which we send readdir calls (note that the order is
not well defined if we do a parallel readdir across all subvols). So DHT uses
sequential readdir, which gives a well-defined order of reading dentries.

To give an example, suppose we have another subvol, subvol2 (in addition to
the subvol above, say subvol1), with the following listing:
1. g, off=16
2. h, off=20
3. i, off=3
4. j, off=19
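
For reference, the sequential order over these two subvols can be sketched as
below. This is only a toy model, not the DHT code: sequential_listing and
read_from are hypothetical helpers, and the lookup by offset assumes offsets
are unique across subvols, which happens to hold for the numbers in this
example.

```python
def sequential_listing(subvols):
    """Concatenate the subvol listings in a fixed subvol order."""
    merged = []
    for entries in subvols:
        merged.extend(entries)
    return merged


def read_from(listing, offset):
    """Entries returned after seeking to `offset`.

    Toy semantics, as in the example: the entry carrying that offset is
    returned first.
    """
    for i, (_, off) in enumerate(listing):
        if off == offset:
            return listing[i:]
    raise ValueError(f"unknown offset {offset}")


subvol1 = [("a", 10), ("b", 2), ("c", 5), ("d", 15), ("e", 17), ("f", 13)]
subvol2 = [("g", 16), ("h", 20), ("i", 3), ("j", 19)]

listing = sequential_listing([subvol1, subvol2])
names = [name for name, _ in listing]
after_20 = [name for name, _ in read_from(listing, 20)]
print(names)
print(after_20)
```

Because the subvol order is fixed, every complete listing is identical, so the
entries that follow any given offset are identical on every pass as well.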

With parallel readdir we can have many orderings, such as (a, b, g, h, i, c, d,
e, f, j), (g, h, a, b, c, i, j, d, e, f), etc. Now suppose we do the following
(with readdir done in parallel):

1. A complete listing of the directory (which can come back in any one of
C(10,4) = 210 orders, assuming each subvol still returns its own dentries in
the same relative order).
2. Do rewinddir (20)

We cannot predict which set of dentries comes _after_ offset 20. However, if we
do readdir sequentially across subvols, there is only one directory listing,
i.e. (a, b, c, d, e, f, g, h, i, j). So it's easier to support rewinddir.
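
The ambiguity can be checked with a short enumeration. The sketch below
assumes that a parallel readdir may interleave the two subvols arbitrarily
while each subvol keeps its own internal order; that is an assumption of this
toy model, not a statement about the real implementation. The helper name
interleavings is hypothetical. Offset 20 belongs to "h" in the example.

```python
from itertools import combinations

subvol1 = ["a", "b", "c", "d", "e", "f"]
subvol2 = ["g", "h", "i", "j"]


def interleavings(first, second):
    """Yield every merge of the two lists that keeps each list's own order."""
    total = len(first) + len(second)
    # Choose which positions of the merged listing are taken by `second`.
    for slots in combinations(range(total), len(second)):
        slot_set = set(slots)
        it1, it2 = iter(first), iter(second)
        yield [next(it2) if i in slot_set else next(it1) for i in range(total)]


orders = list(interleavings(subvol1, subvol2))

# Set of entries a reader would get when resuming at "h" (offset 20),
# collected over every possible parallel ordering.
tails = {frozenset(order[order.index("h"):]) for order in orders}

# The single sequential listing always resumes with the same set.
sequential = subvol1 + subvol2
sequential_tail = frozenset(sequential[sequential.index("h"):])

print(len(orders))  # 210 possible complete listings
print(len(tails))   # 7 different answers to "what comes from offset 20 on?"
print(sorted(sequential_tail))
```

With these ten entries there are 210 possible parallel listings and 7 different
sets that could follow offset 20, against exactly one set for the sequential
listing.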

If there were no POSIX requirement for rewinddir support, I think a parallel
readdir could easily be implemented (which would improve performance too). But
unfortunately rewinddir is still a POSIX requirement. This also opens up the
possibility of a "no-rewinddir-support" option in DHT which, if enabled,
results in parallel readdirs across subvols. What I am not sure about is how
many users still use rewinddir. If there is a critical mass that wants
performance at the cost of no rewinddir support, this can be a good feature.
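
As a rough illustration of what such an option would trade away, here is a
sketch of a parallel listing using a thread pool. Everything in it is
hypothetical: read_subvol stands in for a full readdir pass over one subvol,
SUBVOLS is toy data taken from the example above, and none of this reflects
how DHT would actually implement the option.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Toy data: dentry names per subvol, as in the example above.
SUBVOLS = {
    "subvol1": ["a", "b", "c", "d", "e", "f"],
    "subvol2": ["g", "h", "i", "j"],
}


def read_subvol(name):
    """Hypothetical stand-in for a complete readdir pass over one subvol."""
    return list(SUBVOLS[name])


def parallel_listing(subvol_names):
    """Read all subvols concurrently and merge results as they complete.

    The merged order depends on which subvol answers first, so it is not
    stable across calls and cannot back an offset-based seek.
    """
    merged = []
    with ThreadPoolExecutor(max_workers=len(subvol_names)) as pool:
        futures = [pool.submit(read_subvol, name) for name in subvol_names]
        for future in as_completed(futures):
            merged.extend(future.result())
    return merged


listing = parallel_listing(list(SUBVOLS))
print(sorted(listing))
```

Every dentry is still returned exactly once, but the order of the result is
whatever the completion order happened to be, which is why offsets from one
pass would mean nothing on the next.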

+gluster-users to get an opinion on this.

regards,
Raghavendra

> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel




--

Raghavendra G