On Wed, Nov 2, 2016 at 9:54 AM, Serkan Çoban <cobanser...@gmail.com> wrote:
> +1 for "no-rewinddir-support" option in DHT.
> We are seeing very slow directory listing, especially with a 1500+ brick
> volume; 'ls' takes 20+ seconds with 1000+ files.
>

If it's not clear, I would like to point out that serialized readdir is not
the sole issue causing the slowness. If directories are _HUGE_, I don't
expect much benefit from parallelizing. Also, as others have pointed out (in
various in-person discussions), there are other scalability limits to winding
calls in parallel, such as the number of messages and the memory consumed.
I'll probably do a rough POC in the next couple of months to see whether this
idea has any substance, and post the results.

> On Wed, Nov 2, 2016 at 7:08 AM, Raghavendra Gowdappa
> <rgowd...@redhat.com> wrote:
> >
> > ----- Original Message -----
> >> From: "Keiviw" <kei...@163.com>
> >> To: gluster-devel@gluster.org
> >> Sent: Tuesday, November 1, 2016 12:41:02 PM
> >> Subject: [Gluster-devel] A question of GlusterFS dentries!
> >>
> >> Hi,
> >> In GlusterFS distributed volumes, listing a non-empty directory was
> >> slow. Then I read the DHT code and found the reason. But I was confused
> >> that GlusterFS DHT traverses all the bricks (in the volume)
> >> sequentially. Why not use multiple threads to read dentries from
> >> multiple bricks simultaneously? That's a question that has always
> >> puzzled me. Could you please tell me something about this?
> >
> > readdir across subvols is sequential mostly because we have to support
> > rewinddir(3). We need to maintain the mapping of offset and dentry
> > across multiple invocations of readdir. In other words, if someone did a
> > rewinddir to an offset corresponding to an earlier dentry, subsequent
> > readdirs should return the same set of dentries that the earlier
> > invocation of readdir returned. For example, in a hypothetical scenario,
> > readdir returned the following dentries:
> >
> > 1. a, off=10
> > 2. b, off=2
> > 3. c, off=5
> > 4. d, off=15
> > 5. e, off=17
> > 6. f, off=13
> >
> > Now if we did a rewinddir to off 5 and issued readdir again, we should
> > get the following dentries:
> > (c, off=5), (d, off=15), (e, off=17), (f, off=13)
> >
> > Within a subvol, the backend filesystem provides the rewinddir guarantee
> > for the dentries present on that subvol. However, across subvols it is
> > the responsibility of DHT to provide the above guarantee, which means we
> > should have some well-defined order in which we send readdir calls (note
> > that the order is not well defined if we do a parallel readdir across
> > all subvols). So, DHT has sequential readdir, which is a well-defined
> > order of reading dentries.
> >
> > To give an example, suppose we have another subvol - subvol2 - (in
> > addition to the subvol above - say subvol1) with the following listing:
> >
> > 1. g, off=16
> > 2. h, off=20
> > 3. i, off=3
> > 4. j, off=19
> >
> > With parallel readdir we can have many orderings, like
> > (a, b, g, h, i, c, d, e, f, j), (g, h, a, b, c, i, j, d, e, f) etc.
> > Now suppose we do the following (with readdir done in parallel):
> >
> > 1. A complete listing of the directory (which can be any one of
> >    C(10,4) = 210 interleavings, if each subvol's own order is preserved).
> > 2. Do rewinddir (20)
> >
> > We cannot predict which set of dentries comes _after_ offset 20.
> > However, if we do a readdir sequentially across subvols there is only
> > one directory listing, i.e., (a, b, c, d, e, f, g, h, i, j). So, it's
> > easier to support rewinddir.
> >
> > If there were no POSIX requirement for rewinddir support, I think a
> > parallel readdir could easily be implemented (which improves performance
> > too). But unfortunately rewinddir is still a POSIX requirement. This
> > also opens up another possibility: a "no-rewinddir-support" option in
> > DHT, which if enabled results in parallel readdirs across subvols. What
> > I am not sure about is how many users still use rewinddir. If there is a
> > critical mass that wants performance with the tradeoff of no rewinddir
> > support, this can be a good feature.
> >
> > +gluster-users to get an opinion on this.
> >
> > regards,
> > Raghavendra

--
Raghavendra G
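
The ordering argument in the quoted explanation can be checked with a small model. The sketch below is not GlusterFS code: it reuses the two hypothetical per-subvol listings from the thread and assumes that a parallel readdir could merge them in any order that preserves each subvol's own order. The function names (sequential_listing, interleavings, names_after) are made up for illustration.

```python
from itertools import combinations

# Per-subvol listings from the thread: (name, offset), in the order each
# backend filesystem returns them.
SUBVOL1 = [("a", 10), ("b", 2), ("c", 5), ("d", 15), ("e", 17), ("f", 13)]
SUBVOL2 = [("g", 16), ("h", 20), ("i", 3), ("j", 19)]


def sequential_listing(subvols):
    """Read subvols one after another: exactly one possible order."""
    return [entry for subvol in subvols for entry in subvol]


def interleavings(first, second):
    """Yield every merge of two listings that keeps each one's own order.

    Assumption: this models parallel readdir, where replies from the two
    subvols may arrive in any relative order.
    """
    total = len(first) + len(second)
    for slots in combinations(range(total), len(second)):
        slots = set(slots)  # positions taken by entries of `second`
        it_first, it_second = iter(first), iter(second)
        yield [next(it_second) if i in slots else next(it_first)
               for i in range(total)]


def names_after(listing, name):
    """Names a reader would still see when resuming just past `name`."""
    names = [n for n, _ in listing]
    return tuple(names[names.index(name) + 1:])


sequential = sequential_listing([SUBVOL1, SUBVOL2])
parallel = list(interleavings(SUBVOL1, SUBVOL2))

# Resume after entry "h" (off=20 in the thread's example).
sequential_tail = names_after(sequential, "h")
parallel_tails = {names_after(listing, "h") for listing in parallel}

print(len(parallel))        # number of possible parallel listings
print(sequential_tail)      # the single answer with sequential readdir
print(len(parallel_tails))  # distinct answers possible with parallel readdir
```

Under these assumptions the sequential listing gives exactly one answer for "what comes after h", while the parallel model gives many different ones, which is the reason given above for keeping readdir sequential while rewinddir must be supported.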
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel