On 4/18/2018 8:02 PM, Jakub Narebski wrote:
Derrick Stolee <dsto...@microsoft.com> writes:

Most code paths load commits using lookup_commit() and then
parse_commit(). In some cases, including some branch lookups, the commit
is parsed using parse_object_buffer() which side-steps parse_commit() in
favor of parse_commit_buffer().

With generation numbers in the commit-graph, we need to ensure that any
commit that exists in the commit-graph file has its generation number
loaded.
All right, that is nice explanation of the why behind this change.

Create new load_commit_graph_info() method to fill in the information
for a commit that exists only in the commit-graph file. Call it from
parse_commit_buffer() after loading the other commit information from
the given buffer. Only fill this information when specified by the
'check_graph' parameter. This avoids duplicate work when we already
checked the graph in parse_commit_gently() or when simply checking the
buffer contents in check_commit().
Couldn't this 'check_graph' parameter be a global variable similar to
the 'commit_graph' variable?  Maybe I am not understanding it.

See the two callers at the bottom of the patch. They have different purposes: one needs to fill in a valid commit struct, the other needs to check the commit buffer is valid (then throws away the struct). They have different values for 'check_graph'. Also, in parse_commit_gently() we check parse_commit_in_graph() before we call parse_commit_buffer, so we do not want to repeat work; in the case of a valid commit-graph file, but the commit is not in the commit-graph, we would repeat our binary search for the same commit.


Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
  commit-graph.c | 51 ++++++++++++++++++++++++++++++++------------------
  commit-graph.h |  8 ++++++++
  commit.c       |  7 +++++--
  commit.h       |  2 +-
  object.c       |  2 +-
  sha1_file.c    |  2 +-
  6 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 688d5b1801..21e853c21a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -245,13 +245,19 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
        return &commit_list_insert(c, pptr)->next;
  }
+static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos)
+{
+       const unsigned char *commit_data = g->chunk_commit_data + 
GRAPH_DATA_WIDTH * pos;
+       item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+}
+
  static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, 
uint32_t pos)
  {
        uint32_t edge_value;
        uint32_t *parent_data_ptr;
        uint64_t date_low, date_high;
        struct commit_list **pptr;
-       const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len 
+ 16) * pos;
+       const unsigned char *commit_data = g->chunk_commit_data + 
GRAPH_DATA_WIDTH * pos;
I'm probably wrong, but isn't it unrelated change?

You're right. I saw this while I was in here, and there was a similar comment on this change in a different patch. Probably best to keep these cleanup things in a separate commit.

        item->object.parsed = 1;
        item->graph_pos = pos;
@@ -292,31 +298,40 @@ static int fill_commit_in_graph(struct commit *item, 
struct commit_graph *g, uin
        return 1;
  }
+static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)
+{
+       if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
+               *pos = item->graph_pos;
+               return 1;
+       } else {
+               return bsearch_graph(commit_graph, &(item->object.oid), pos);
+       }
+}
All right (after the fix).

+
  int parse_commit_in_graph(struct commit *item)
  {
+       uint32_t pos;
+
+       if (item->object.parsed)
+               return 0;
        if (!core_commit_graph)
                return 0;
-       if (item->object.parsed)
-               return 1;
Hmmm... previously the function returned 1 if item->object.parsed, now
it returns 0 for this situation.  I don't understand this change.

The good news is that this change is unimportant (the only caller is parse_commit_gently() which checks item->object.parsed before calling parse_commit_in_graph()). I wonder why I reordered those things, anyway. I'll revert to simplify the patch.


-
        prepare_commit_graph();
-       if (commit_graph) {
-               uint32_t pos;
-               int found;
-               if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
-                       pos = item->graph_pos;
-                       found = 1;
-               } else {
-                       found = bsearch_graph(commit_graph, &(item->object.oid), 
&pos);
-               }
-
-               if (found)
-                       return fill_commit_in_graph(item, commit_graph, pos);
-       }
-
+       if (commit_graph && find_commit_in_graph(item, commit_graph, &pos))
+               return fill_commit_in_graph(item, commit_graph, pos);
Nice refactoring.

        return 0;
  }
+void load_commit_graph_info(struct commit *item)
+{
+       uint32_t pos;
+       if (!core_commit_graph)
+               return;
+       prepare_commit_graph();
+       if (commit_graph && find_commit_in_graph(item, commit_graph, &pos))
+               fill_commit_graph_info(item, commit_graph, pos);
+}
And the reason for the refactoring.

+
  static struct tree *load_tree_for_commit(struct commit_graph *g, struct 
commit *c)
  {
        struct object_id oid;
diff --git a/commit-graph.h b/commit-graph.h
index 260a468e73..96cccb10f3 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -17,6 +17,14 @@ char *get_commit_graph_filename(const char *obj_dir);
   */
  int parse_commit_in_graph(struct commit *item);
+/*
+ * It is possible that we loaded commit contents from the commit buffer,
+ * but we also want to ensure the commit-graph content is correctly
+ * checked and filled. Fill the graph_pos and generation members of
+ * the given commit.
+ */
+void load_commit_graph_info(struct commit *item);
+
  struct tree *get_commit_tree_in_graph(const struct commit *c);
struct commit_graph {
diff --git a/commit.c b/commit.c
index a70f120878..9ef6f699bd 100644
--- a/commit.c
+++ b/commit.c
@@ -331,7 +331,7 @@ const void *detach_commit_buffer(struct commit *commit, 
unsigned long *sizep)
        return ret;
  }
-int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size)
+int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long 
size, int check_graph)
  {
        const char *tail = buffer;
        const char *bufptr = buffer;
@@ -386,6 +386,9 @@ int parse_commit_buffer(struct commit *item, const void 
*buffer, unsigned long s
        }
        item->date = parse_commit_date(bufptr, tail);
+ if (check_graph)
+               load_commit_graph_info(item);
+
        return 0;
  }
@@ -412,7 +415,7 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
                return error("Object %s not a commit",
                             oid_to_hex(&item->object.oid));
        }
-       ret = parse_commit_buffer(item, buffer, size);
+       ret = parse_commit_buffer(item, buffer, size, 0);
        if (save_commit_buffer && !ret) {
                set_commit_buffer(item, buffer, size);
                return 0;
diff --git a/commit.h b/commit.h
index 64436ff44e..b5afde1ae9 100644
--- a/commit.h
+++ b/commit.h
@@ -72,7 +72,7 @@ struct commit *lookup_commit_reference_by_name(const char 
*name);
   */
  struct commit *lookup_commit_or_die(const struct object_id *oid, const char 
*ref_name);
-int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size);
+int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long 
size, int check_graph);
  int parse_commit_gently(struct commit *item, int quiet_on_missing);
  static inline int parse_commit(struct commit *item)
  {
diff --git a/object.c b/object.c
index e6ad3f61f0..efe4871325 100644
--- a/object.c
+++ b/object.c
@@ -207,7 +207,7 @@ struct object *parse_object_buffer(const struct object_id 
*oid, enum object_type
        } else if (type == OBJ_COMMIT) {
                struct commit *commit = lookup_commit(oid);
                if (commit) {
-                       if (parse_commit_buffer(commit, buffer, size))
+                       if (parse_commit_buffer(commit, buffer, size, 1))
                                return NULL;
                        if (!get_cached_commit_buffer(commit, NULL)) {
                                set_commit_buffer(commit, buffer, size);
diff --git a/sha1_file.c b/sha1_file.c
index 1b94f39c4c..0fd4f0b8b6 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1755,7 +1755,7 @@ static void check_commit(const void *buf, size_t size)
  {
        struct commit c;
        memset(&c, 0, sizeof(c));
-       if (parse_commit_buffer(&c, buf, size))
+       if (parse_commit_buffer(&c, buf, size, 0))
                die("corrupt commit");
  }

Reply via email to