> On Apr 29, 2019, at 1:19 PM, Thomas Tempelmann <tempelm...@gmail.com> wrote:
> Jim,
> In contentsOfDirectoryAtURL, instead of "includingPropertiesForKeys:nil", use 
> "includingPropertiesForKeys:@[NSURLVolumeIdentifierKey]" (and add whatever 
> other property keys you know you'll need). The whole purpose of the 
> includingPropertiesForKeys argument is so the enumerator code can pre-fetch 
> the properties you need as efficiently as possible. The enumeration will be a 
> bit slower, but the entire operation of enumerating and getting the 
> properties from the URLs returned will be faster.
> I know. That's the theory, but my benchmarking says it makes no difference in 
> that case. And that's quite logical because the pre-caching is meant for data 
> that has to come from the lowest level, i.e. where the catalog data is 
> fetched - it makes sense to combine multiple property requests into one, just 
> as getdirentriesattr is meant to be used. However, as I explained, the 
> volume ID is not stored in the catalog but at a higher level, and therefore 
> pre-fetching it at the lowest level makes no difference, as it requires no 
> catalog access, right?

The volume ID is at a higher layer, but the enumeration code attempts to 
retrieve the value less than once per URL returned. That said, if the directory 
hierarchy has few items per directory, the number of times it is retrieved will 
be higher. You can write a bug report and I'll look to see if there are ways to 
improve the performance.

In the meantime, there's something you could do to improve the performance 
(even if our code changes). You can get the volumeIdentifier for the directory 
you start enumerating from. It will be the same for the entire enumeration 
except when directories are seen on other file systems (today, that's volume 
mount points and mount triggers). Like this:

        NSURL *directoryURL = [NSURL fileURLWithPath:@"/System/Applications/Utilities/"
                                         isDirectory:YES];
        // get the volume identifier for most of the enumeration
        id mainVolumeIdentifier;
        [directoryURL getResourceValue:&mainVolumeIdentifier
                                forKey:NSURLVolumeIdentifierKey
                                 error:nil];
        NSDirectoryEnumerator *directoryEnumerator =
                [NSFileManager.defaultManager enumeratorAtURL:directoryURL
                                   includingPropertiesForKeys:nil
                                                      options:0
                                                 errorHandler:nil];
        for (NSURL *url in directoryEnumerator) {
                NSNumber *isVolume;
                NSNumber *isMountTrigger;
                if ( ([url getResourceValue:&isVolume forKey:NSURLIsVolumeKey error:nil]
                                && isVolume.boolValue)
                        || ([url getResourceValue:&isMountTrigger forKey:NSURLIsMountTriggerKey error:nil]
                                && isMountTrigger.boolValue) ) {
                        // get the volume identifier for the volume or mount trigger
                        // (ask url, not directoryURL, which is still on the main volume)
                        id otherVolumeIdentifier;
                        [url getResourceValue:&otherVolumeIdentifier
                                       forKey:NSURLVolumeIdentifierKey
                                        error:nil];
                }
                // otherwise url is on the same volume as directoryURL,
                // so mainVolumeIdentifier applies
        }

> My performance test always runs twice in quick succession, so that in the 
> second run, due to caching, all data is ready and does not incur random delays 
> that would give imprecise measurements. Sure, this does not give me the worst 
> case, but it gives me the best case results at least. And these best case 
> results say: Scanning "/System" on my Mac without getting the Volume ID takes 
> less than 3s, but getting it (with or without pre-fetching) takes over 
> 6s. That's TWICE as much time. With smaller directory trees the difference is 
> smaller, possibly because other caches help there.
> I assume that when I re-run the scan, after having released all NSURLs from 
> the previous scan (even by restarting the test app), the framework creates 
> fresh NSURL objects, right? There isn't just one NSURL instance 
> on the entire system per volume item, shared between all processes, or is 
> there? The only caching, once I release an NSURL, is at the volume block 
> cache level, isn't it?
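
The warm-cache measurement described above could look roughly like this in C. 
It is only a sketch: count_dir_entries is a stand-in workload and 
time_warm_run is a hypothetical helper, and neither comes from the thread. The 
first call is untimed and exists only to populate caches; the second call is 
timed with a monotonic clock.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime() and opendir() under strict modes */
#include <assert.h>
#include <dirent.h>
#include <time.h>

/* Stand-in workload for this sketch: count the entries in one directory.
 * The count includes "." and "..". Returns -1 if the directory cannot be opened. */
long count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long n = 0;
    while (readdir(dir) != NULL)
        n++;
    closedir(dir);
    return n;
}

/* Hypothetical helper: run `scan` once to warm the caches, then return the
 * wall-clock seconds taken by a second, timed run. */
double time_warm_run(long (*scan)(const char *), const char *path)
{
    assert(scan != NULL && path != NULL);

    struct timespec start, end;
    (void)scan(path);                        /* first run: warms caches, not timed */

    clock_gettime(CLOCK_MONOTONIC, &start);
    (void)scan(path);                        /* second run: best-case timing */
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

A call such as time_warm_run(count_dir_entries, "/System") would give a 
best-case figure only; as noted above, it says nothing about the cold-cache 
worst case.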
> Also, use -[enumeratorAtURL:includingPropertiesForKeys:options:errorHandler:] 
> instead of 
> -[contentsOfDirectoryAtURL:includingPropertiesForKeys:options:error:] unless 
> you really need an NSArray of NSURLs. If your code is just processing all of 
> the URLs and has no need to keep them after processing, there's no reason to 
> add them to an array (which takes time and adds to peak memory pressure).
> Thanks, that makes sense.
> -[enumeratorAtURL:includingPropertiesForKeys:options:errorHandler:] also 
> supports recursive enumeration (which stops at device boundaries -- you'll 
> see mount points but not their contents) so you don't have to do that 
> yourself.
> Is that based on fts_read? Because I found that this is much faster on local 
> volumes (not on network vols, though) than all other ways I've tried. And it 
> brings along the st_dev value without time penalty, unlike 
> contentsOfDirectoryAtURL.

It used to be based on a heavily modified fts(3). I rewrote it for Mojave to 
improve the memory footprint. It uses getattrlistbulk() for everything except 
when it sees a mount point, and then it calls getattrlist() on the mount point 
path to get the attributes from the other file system's root directory.

- Jim

> Regardless, I'll give that a try.
> -- 
> Thomas Tempelmann, http://apps.tempel.org/
> Follow me on Twitter: https://twitter.com/tempelorg
> Read my programming blog: http://blog.tempel.org/