I don’t really have time to look at the current fts implementation, but… it has several options that affect performance (in particular, the FTS_NOCHDIR, FTS_NOSTAT, FTS_NOSTAT_TYPE, and FTS_XDEV options). If you are trying to compare fts to CFURLEnumerator (for example), use FTS_NOCHDIR and FTS_XDEV, but don’t use FTS_NOSTAT or FTS_NOSTAT_TYPE.
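For reference, a minimal sketch of an fts walk using the flag combination suggested above (FTS_NOCHDIR and FTS_XDEV set, FTS_NOSTAT and FTS_NOSTAT_TYPE not set). The `scan_totals` struct and the `scan_tree` function are hypothetical names made up for this illustration, and FTS_PHYSICAL is my own assumption (it gives lstat-like behavior and does not follow symlinks); none of this is taken from the DirScanner test project.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stdio.h>

/* Totals gathered during one walk (hypothetical helper type). */
struct scan_totals {
    unsigned long files;       /* regular files seen */
    unsigned long dirs;        /* directories seen, including the root */
    unsigned long long bytes;  /* sum of st_size over regular files */
};

/* Walk the tree under root with fts. Returns 0 on success, -1 on failure. */
static int scan_tree(const char *root, struct scan_totals *out)
{
    /* fts_open takes a NULL-terminated array of non-const path strings. */
    char *const paths[] = { (char *)root, NULL };

    /* FTS_PHYSICAL: assumption of this sketch, do not follow symlinks.
       FTS_NOCHDIR:  do not chdir() into each directory while walking.
       FTS_XDEV:     do not descend into other mounted file systems.
       FTS_NOSTAT is deliberately not set, so fts_statp is filled in. */
    FTS *fts = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR | FTS_XDEV, NULL);
    if (fts == NULL)
        return -1;

    out->files = 0;
    out->dirs = 0;
    out->bytes = 0;

    FTSENT *ent;
    while ((ent = fts_read(fts)) != NULL) {
        switch (ent->fts_info) {
        case FTS_D:   /* directory, visited in pre-order */
            out->dirs++;
            break;
        case FTS_F:   /* regular file */
            out->files++;
            out->bytes += (unsigned long long)ent->fts_statp->st_size;
            break;
        case FTS_DNR: /* unreadable directory */
        case FTS_ERR: /* generic error */
        case FTS_NS:  /* stat information unavailable */
            fprintf(stderr, "%s: error %d\n", ent->fts_path, ent->fts_errno);
            break;
        default:      /* FTS_DP, symlinks, etc. are not counted here */
            break;
        }
    }
    return fts_close(fts);
}
```

Calling `scan_tree(path, &totals)` fills in the counts for one tree. For a fair comparison, the other enumeration methods should request the same attributes (here, type and size).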
> On Apr 22, 2019, at 9:59 AM, Thomas Tempelmann <tempelm...@gmail.com> wrote:
>
> Jim,
> thanks for your comments.
>
> If all you need is filenames and no other attributes, readdir is usually faster than getattrlistbulk because it doesn't have to do as much work. However, if you need additional attributes, getattrlistbulk is usually much faster. Some of that extra work done by getattrlistbulk involves checking to see what attributes were requested and packing the results into the result buffer.
>
> What's interesting is that on HFS+, readdir is not faster in my tests, but on a recent and fast Mac (i.e. not on my MacPro 2010), it can be twice as fast as the others when scanning an APFS volume. I wonder why. Is the implementation for getattrlistbulk in the APFS driver inefficient compared to the one in HFS+? The source code for the APFS FS driver has still not been published, or has it?
>
> You'll find that lstat is slightly faster than getattrlist (when getattrlist is returning the same set of attributes) for the same reason. There's no extra code needed in lstat to see what attributes were requested and packing the results into the result buffer.
>
> It's also significantly faster than using NSURL's getResourceValue, even if the NSURL has already been created regardless. That's probably due to all the objc overhead.
>
> By the way, I haven't tested this but I would expect enumeratorAtURL:includingPropertiesForKeys:options:errorHandler: (followed by a "for (NSURL *fileURL in directoryEnumerator)" loop) to be slightly faster than contentsOfDirectoryAtURL:includingPropertiesForKeys:options:error: because the URLs aren't retained in a NSArray. Using CFURLEnumerator may also be slightly faster than NSFileManager's directory enumeration.
>
> Now, that's something I had not considered, yet. Will try.
>
> Using POSIX/BSD APIs will be the fastest, but that means you have to deal with the different capabilities between file systems yourself (although getattrlistbulk helps with that a lot).
>
> Most interesting, though:
>
> Today someone pointed out fts_read. This has, so far, always beaten all other methods, especially if I also need extra attributes (e.g. file size).
>
> Can you give some more information about the fts implementation? Is this user-library-level or kernel code that's doing this? I had expected that this would only be a convenience userland function that uses readdir or similar BSD functions, but it appears to beat them all, suggesting this is optimized at a lower level.
>
> I have updated my test project accordingly (with the fts code) in case anyone likes to run their own tests:
>
> http://files.tempel.org/Various/DirScanner.zip
>
> Also, I am wondering if using concurrent threads will speed up scanning a dir tree on an SSD as well, by distributing each directory read to one thread (or dispatch queue). Will eventually try, but probably not soon. Gotta get my program out of the door soon, first.
>
> Thomas
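As a point of comparison for the readdir and lstat discussion in the quoted message, here is a minimal sketch of the plain POSIX baseline: readdir supplies the names, and a separate lstat call per entry supplies the extra attribute (file size). The function name `sum_dir_sizes` and the fixed 4096-byte path buffer are assumptions made for this illustration; it reads a single directory without recursing and is not taken from the DirScanner project.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Sum the sizes of regular files directly inside dir_path (no recursion).
   Returns the number of regular files found, or -1 if the directory
   cannot be opened. The size total is stored in *bytes. */
static long sum_dir_sizes(const char *dir_path, unsigned long long *bytes)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    long files = 0;
    *bytes = 0;

    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        /* Skip the self and parent entries. */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        /* Build the full path; skip entries that do not fit the buffer. */
        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dir_path, de->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;

        /* One lstat per entry: this is the per-file cost that the
           bulk APIs try to avoid when extra attributes are needed. */
        struct stat st;
        if (lstat(path, &st) != 0)
            continue;

        if (S_ISREG(st.st_mode)) {
            files++;
            *bytes += (unsigned long long)st.st_size;
        }
    }
    closedir(dir);
    return files;
}
```

If only the names are needed, the lstat call and the path building can be dropped entirely, which is the case where readdir alone is described as usually faster.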