On 4/18/2018 5:02 PM, Jakub Narebski wrote:
Here I can offer only the cursory examination, as I don't know this area
of code in question.

Derrick Stolee <dsto...@microsoft.com> writes:

A commit A can reach a commit B only if the generation number of A
is larger than the generation number of B. This condition allows
significantly short-circuiting commit-graph walks.

Use generation number for 'git tag --contains' queries.

On a copy of the Linux repository where HEAD is containd in v4.13
but no earlier tag, the command 'git tag --contains HEAD' had the
following peformance improvement:

Before: 0.81s
After:  0.04s
Rel %:  -95%
A question: what is the performance after if the "commit-graph" feature
is disabled, or there is no commit-graph file?  Is there performance
regression in this case, or is the difference negligible?

Negligible, since we are adding a small number of integer comparisons and the main cost is in commit parsing. More on commit parsing in response to your comments below.


Helped-by: Jeff King <p...@peff.net>
Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
  ref-filter.c | 23 +++++++++++++++++++----
  1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index cffd8bf3ce..e2fea6d635 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1587,7 +1587,8 @@ static int in_commit_list(const struct commit_list *want, 
struct commit *c)
  /*
   * Test whether the candidate or one of its parents is contained in the list.
                                  ^^^^^^^^^^^^^^^^^^^^^

Sidenote: when examining the code after the change, I have noticed that
the above part of commit header for the comtains_test() function is no
longer entirely correct, as the function only checks the candidate
commit, and in no place it access its parents.

But that is not your problem.

I'll add a commit in the next version that fixes this comment before I make any changes to the method.


   * Do not recurse to find out, though, but return -1 if inconclusive.
   */
  static enum contains_result contains_test(struct commit *candidate,
                                          const struct commit_list *want,
-                                         struct contains_cache *cache)
+                                         struct contains_cache *cache,
+                                         uint32_t cutoff)
  {
        enum contains_result *cached = contains_cache_at(cache, candidate);
@@ -1603,6 +1604,10 @@ static enum contains_result contains_test(struct commit *candidate, /* Otherwise, we don't know; prepare to recurse */
        parse_commit_or_die(candidate);
+
+       if (candidate->generation < cutoff)
+               return CONTAINS_NO;
+
Looks good to me.

The only [minor] question may be whether to define separate type for
generation numbers, and whether to future proof the tests - though the
latter would be almost certainly overengineering, and the former
probablt too.

If we have multiple notions of generation, then we can refactor all references to the "generation" member.


        return CONTAINS_UNKNOWN;
  }
@@ -1618,8 +1623,18 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
                                              struct contains_cache *cache)
  {
        struct contains_stack contains_stack = { 0, 0, NULL };
-       enum contains_result result = contains_test(candidate, want, cache);
+       enum contains_result result;
+       uint32_t cutoff = GENERATION_NUMBER_INFINITY;
+       const struct commit_list *p;
+
+       for (p = want; p; p = p->next) {
+               struct commit *c = p->item;
+               parse_commit_or_die(c);
+               if (c->generation < cutoff)
+                       cutoff = c->generation;
+       }
Sholdn't the above be made conditional on the ability to get generation
numbers from the commit-graph file (feature is turned on and file
exists)?  Otherwise here after the change contains_tag_algo() now parses
each commit in 'want', which I think was not done previously.

With commit-graph file parsing is [probably] cheap.  Without it, not
necessary.

But I might be worrying about nothing.

Not nothing. This parses the "wants" when we previously did not parse the wants. Further: this parsing happens before we do the simple check of comparing the OID of the candidate against the wants.

The question is: are these parsed commits significant compared to the walk that will parse many more commits? It is certainly possible.

One way to fix this is to call 'prepare_commit_graph()' directly and then test that 'commit_graph' is non-null before performing any parses. I'm not thrilled with how that couples the commit-graph implementation to this feature, but that may be necessary to avoid regressions in the non-commit-graph case.


+ result = contains_test(candidate, want, cache, cutoff);
Other than the question about possible performace regression if
commit-graph data is not available, it looks good to me.

        if (result != CONTAINS_UNKNOWN)
                return result;
@@ -1637,7 +1652,7 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
                 * If we just popped the stack, parents->item has been marked,
                 * therefore contains_test will return a meaningful yes/no.
                 */
-               else switch (contains_test(parents->item, want, cache)) {
+               else switch (contains_test(parents->item, want, cache, cutoff)) 
{
                case CONTAINS_YES:
                        *contains_cache_at(cache, commit) = CONTAINS_YES;
                        contains_stack.nr--;
@@ -1651,7 +1666,7 @@ static enum contains_result contains_tag_algo(struct 
commit *candidate,
                }
        }
        free(contains_stack.contains_stack);
-       return contains_test(candidate, want, cache);
+       return contains_test(candidate, want, cache, cutoff);
Simple change. It looks good to me.

  }
static int commit_contains(struct ref_filter *filter, struct commit *commit,

Reply via email to