On 11 May 2018 at 23:15, Derrick Stolee <dsto...@microsoft.com> wrote:

I finally sat down today and familiarized myself with the commit-graph
code a little. My biggest surprise was when I noticed that there is a
hash checksum at the end of the commit-graph-file. That in combination
with the tests where you flip some bytes...

It turns out, if my reading is right, that the hash value is written as
the commit-graph is generated, but that it is not verified as the
commit-graph is later read. I could not find any mention of your plans
here -- I understand why you would not want to verify the hash in
`load_commit_graph_one()`, at least not in every run. Anyway, this is
just an observation. Verifying the hash would affect the tests this
series adds. They might need to rewrite the hash or set some magic
environment variable. :-/ But that's for another day.

> +       for (i = 0; i < g->num_commits; i++) {
> +               struct commit *graph_commit, *odb_commit;
> +               struct commit_list *graph_parents, *odb_parents;
> +               int num_parents = 0;
> +               hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);

`num_commits` was derived as the commit-graph was loaded. It was derived
from offsets which were verified to be in the mmap-ed memory. So this
source address is guaranteed to be so, as well. Ok.

(Once brian's latest series hits master, this could use `oidread(...)`.)

> +               graph_commit = lookup_commit(&cur_oid);

Do we know this comes from the graph? Even with a more-or-less-messed-up
commit graph? See below.

> +               odb_commit = (struct commit *)create_object(cur_oid.hash, 
> alloc_commit_node());
> +               if (parse_commit_internal(odb_commit, 0, 0)) {
> +                       graph_report("failed to parse %s from object 
> database", oid_to_hex(&cur_oid));
> +                       continue;
> +               }
> +
> +               if (oidcmp(&get_commit_tree_in_graph_one(g, 
> graph_commit)->object.oid,
> +                          get_commit_tree_oid(odb_commit)))

`get_commit_tree_in_graph_one()` will BUG rather than return NULL. So
this will not dereference NULL. Good. But might it hit the BUG? That is,
can we trust the commit coming out of `lookup_commit()` not to have
`graph_pos == COMMIT_NOT_FROM_GRAPH`?

> +                       graph_report("root tree object ID for commit %s in 
> commit-graph is %s != %s",
> +                                    oid_to_hex(&cur_oid),
> +                                    
> oid_to_hex(get_commit_tree_oid(graph_commit)),
> +                                    
> oid_to_hex(get_commit_tree_oid(odb_commit)));
> +
> +               if (graph_commit->date != odb_commit->date)
> +                       graph_report("commit date for commit %s in 
> commit-graph is %"PRItime" != %"PRItime"",
> +                                    oid_to_hex(&cur_oid),
> +                                    graph_commit->date,
> +                                    odb_commit->date);
> +
> +

(Extra blank line?)

> +               graph_parents = graph_commit->parents;
> +               odb_parents = odb_commit->parents;
> +
> +               while (graph_parents) {
> +                       num_parents++;
> +
> +                       if (odb_parents == NULL)
> +                               graph_report("commit-graph parent list for 
> commit %s is too long (%d)",
> +                                            oid_to_hex(&cur_oid),
> +                                            num_parents);
> +
> +                       if (oidcmp(&graph_parents->item->object.oid, 
> &odb_parents->item->object.oid))
> +                               graph_report("commit-graph parent for %s is 
> %s != %s",
> +                                            oid_to_hex(&cur_oid),
> +                                            
> oid_to_hex(&graph_parents->item->object.oid),
> +                                            
> oid_to_hex(&odb_parents->item->object.oid));
> +
> +                       graph_parents = graph_parents->next;
> +                       odb_parents = odb_parents->next;
> +               }
> +
> +               if (odb_parents != NULL)
> +                       graph_report("commit-graph parent list for commit %s 
> terminates early",
> +                                    oid_to_hex(&cur_oid));

Ok, ensure the lists are equally long and compare the entries.

> +
> +               if (graph_commit->generation) {

If the commit has a generation number (not an old commit graph)...

> +                       uint32_t max_generation = 0;
> +                       graph_parents = graph_commit->parents;
> +
> +                       while (graph_parents) {
> +                               if (graph_parents->item->generation == 
> GENERATION_NUMBER_ZERO ||
> +                                   graph_parents->item->generation == 
> GENERATION_NUMBER_INFINITY)
> +                                       graph_report("commit-graph has valid 
> generation for %s but not its parent, %s",
> +                                                    oid_to_hex(&cur_oid),
> +                                                    
> oid_to_hex(&graph_parents->item->object.oid));

... then it's odd if a parent has no generation number or is not in the
graph file at all.

> +                               if (graph_parents->item->generation > 
> max_generation)
> +                                       max_generation = 
> graph_parents->item->generation;
> +                               graph_parents = graph_parents->next;
> +                       }
> +
> +                       if (max_generation == GENERATION_NUMBER_MAX)
> +                               max_generation--;
> +
> +                       if (graph_commit->generation != max_generation + 1)
> +                               graph_report("commit-graph has incorrect 
> generation for %s",
> +                                            oid_to_hex(&cur_oid));

Ok. Thanks for considering adding a comment.

> +               } else {
> +                       graph_parents = graph_commit->parents;
> +
> +                       while (graph_parents) {
> +                               if (graph_parents->item->generation)
> +                                       graph_report("commit-graph has 
> generation ZERO for %s but not its parent, %s",
> +                                                    oid_to_hex(&cur_oid),
> +                                                    
> oid_to_hex(&graph_parents->item->object.oid));
> +                               graph_parents = graph_parents->next;
> +                       }

Right.

> +               }
> +       }
> +
>         return verify_commit_graph_error;
>  }

Nice. My question about `get_commit_tree_in_graph_one()` might be me
making things much harder than they are. I sort of suspect that your
usage will be quite obviously safe in retrospect.

Martin

Reply via email to