On 09/27, Junio C Hamano wrote:
> Brandon Williams <bmw...@google.com> writes:
> 
> > -   /* Find common prefix for all pathspec's */
> > -   max_prefix = common_prefix(&pathspec);
> > +   /*
> > +    * Find common prefix for all pathspec's
> > +    * This is used as a performance optimization which unfortunately cannot
> > +    * be done when recursing into submodules
> > +    */
> > +   if (recurse_submodules)
> > +           max_prefix = NULL;
> > +   else
> > +           max_prefix = common_prefix(&pathspec);
> >     max_prefix_len = max_prefix ? strlen(max_prefix) : 0;
> 
> I still wonder if we can do better than this, as this would be a big
> cycle-saver especially in recurse-submodules case.
> 
> When you get max_prefix that is "a/b/c", there are three cases:
> 
>  * a/b/c is a path prefix for an entry in the index, e.g. a/b/c/d;
>    you then can safely use it and you do not have to do any
>    recursive invocation of ls-files outside "a/b/c".  You may match
>    a/b/c/d in the toplevel, or you may recurse a/b/c/e that is a
>    submodule, but you won't have to pay attention to submodules
>    outside.
> 
>  * a leading path of a/b/c, e.g. a/b, is a gitlink or a blob in the
>    index; you can use a/b and you only have to recurse into a/b if
>    that is a submodule; if a/b is a blob, you'd show nothing.
> 
>  * a/b/c itself and no leading path of it appears in the index; you
>    know that nothing will match once you know that you are in this
>    situation.
> 
> Because a gitlink "a/b" sorts at the same location in the index as a
> regular blob "a/b" would, by feeding the max_prefix common_prefix()
> gives you (i.e. "a/b/c") to index_name_pos() to see which one of the
> three situations you are in can be done fairly cheaply, I would
> think.  The index_name_pos() call may find "a/b/c" exactly (case 1),
> or return a location where "a/b/c" would be inserted in the list of
> existing entries.  If there were "a/b" (or "a") in the index, there
> wouldn't be any "a/b/x" (or "a/x") at the same time, so a query for
> "a/b/c" would land you next to (just after) an existing entry that
> is a leading path of it, if such an entry exists, no?  That would
> allow you to tell case 2 above fairly cheaply, I would expect.
> 
> It is a separate issue if adding that support to 4/4 is a good idea;
> I personally think doing it as a separate follow-up patch would make
> more sense, so all of the above is tangent.

I agree that this could be a big cycle saver. At the present I was
working towards getting a working implementation but this should
definitely be addressed in a follow-up patch to introduce the
optimization to the recurse-submodule mode.  It hopefully wouldn't be
too hard to implement seeing as its using string literals.

-- 
Brandon Williams

Reply via email to