On 8/22/2019 9:10 AM, Derrick Stolee wrote:
> On 8/21/2019 5:52 PM, Elijah Newren wrote:
>> On Tue, Aug 20, 2019 at 8:12 AM Derrick Stolee via GitGitGadget
>>> For example, if we add A/B/C as a recursive folder, then we add the
>>> following patterns to the sparse-checkout file:
>>>
>>> /*
>>> !/*/*
>>> /A/*
>>> !/A/*/*
>>> /A/B/*
>>> !/A/B/*/*
>>> /A/B/C/*
>>>
>>> The alternating positive/negative patterns say "include everything in this
>>> folder, but exclude everything another level deeper". The final pattern has
>>> no matching negation, so it is a recursively closed pattern.
>>
>> Oh, um, would there be any option for fast but without grabbing
>> sibling and parent files of requested directories?  And could users
>> still request individual files (not with regex or pathspec, but fully
>> specifying the path) and still get the fast mode?
> 
> Exact files could probably be included and still be fast. It requires an
> extra hash check per entry, but that's a small price to pay I think.

Quick side note on this point about exact file names and the REAL reason
for the parent paths that I had forgotten until just now.

The following comment exists in unpack-trees.c, clear_ce_flags_dir():

        /*
         * TODO: check el, if there are no patterns that may conflict
         * with ret (iow, we know in advance the incl/excl
         * decision for the entire directory), clear flag here without
         * calling clear_ce_flags_1(). That function will call
         * the expensive is_excluded_from_list() on every entry.
         */

While I haven't implemented it yet in this RFC, this TODO can actually
be resolved with the current set of cone patterns:

1. If we hit a directory that is not in a parent or recursive path,
   then all paths it contains must have their skip-worktree bits set.
   We can avoid computing hashes for them.

2. If we hit a directory that is in a recursive path, then all paths
   it contains must have their skip-worktree bits off. We can avoid
   computing hashes for them.

When we have a million index entries, these hash computations are not
insignificant!

With that in mind, I think there is a _performance_ reason to include
the parent folders in addition to the _user experience_ reason. If we
were to add the complexity of exact file matches, then we would also
want to add the parent folders leading to that file so we can still
do the logic above.

Thanks,
-Stolee
