On Sat, Apr 21, 2018 at 9:37 PM, Elijah Newren <new...@gmail.com> wrote:
> Currently, all callers of unpack_trees() set o->src_index == o->dst_index.
> Since we create a temporary index in o->result, then discard o->dst_index
> and overwrite it with o->result, when o->src_index == o->dst_index it is
> safe to just reuse o->src_index's split_index for o->result.  However,
> o->src_index and o->dst_index are specified separately in order to allow
> callers to have these be different.  In such a case, reusing
> o->src_index's split_index for o->result will cause the split_index to be
> shared.  If either index then has entries replaced or removed, it will
> result in the other index referring to free()'d memory.
>
> Signed-off-by: Elijah Newren <new...@gmail.com>
> ---
>
> I still haven't wrapped my head around the split_index stuff entirely, so
> it's possible that
>
>   - the performance optimization isn't even valid when src == dst.  Could
>     the original index be different enough from the result that we don't
>     want its split_index?

This really depends on the use case, of course. But when git checkout
is used for switching branches, unpack-trees will be used, and unless
you switch between two vastly different branches, the number of updated
entries should be small enough compared to the entire index that sharing
is still worthwhile. If the result index is so different that it results
in a huge index file anyway, I believe we have code to recreate a new
shared index to keep its size down next time.
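
As a rough sketch of that "recreate when too different" idea (a toy
model only, not the actual code in git; the helper name and the
percentage threshold are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not git's real code: decide whether the shared
 * index should be rewritten, given how many entries in the result no
 * longer come from it. 'max_percent' is an assumed tunable.
 */
static int toy_should_rewrite_shared(size_t total, size_t not_shared,
				     unsigned max_percent)
{
	assert(not_shared <= total);
	if (!total)
		return 0;	/* empty index: nothing to gain */
	/* Compare not_shared/total with max_percent/100 without dividing. */
	return not_shared * 100 > total * (size_t)max_percent;
}
```

With an assumed threshold of 20, an index of 1000 entries where 50 are
not shared would keep its shared index, while 300 unshared entries would
trigger a rewrite.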

>   - there's a better, more performant fix or there is some way to actually
>     share a split_index between two independent index_state objects.

A cleaner way of doing this would be something along the lines of [1]

    move_index_extensions(&o->result, o->dst_index);

near the end of this function. This could be where we compare the
result index with the source index's shared file and see if it's worth
keeping the shared index or not. The shared index is designed to work with
huge index files, though, so any operation that goes through all index
entries will usually not be cheap. But at least it's safer.

> However, with this fix, all the tests pass both normally and under
> GIT_TEST_SPLIT_INDEX=DareISayYes.  Without this patch, when
> GIT_TEST_SPLIT_INDEX is set, my directory rename detection series will fail
> several tests, as reported by SZEDER.

Yes, the change looks good.

[1] To me the second parameter should be src_index, not dst_index.
We're copying entries from the _source_ index to "result" and we should
also copy extensions from the source index. That line happens to work
only when dst_index is the same as src_index, which is the common use
case so far.
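
To illustrate the footnote with a toy model (the structs and helper
below are hypothetical stand-ins, not git's real index_state or
move_index_extensions()), moving extensions from the wrong index only
goes unnoticed when both pointers name the same object:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not git's real structures. */
struct toy_ext { int id; };
struct toy_index { struct toy_ext *ext; };

/* Hand the extension over from 'from' to 'result'; 'from' gives it up. */
static void toy_move_extensions(struct toy_index *result,
				struct toy_index *from)
{
	assert(result && from);
	result->ext = from->ext;
	from->ext = NULL;
}
```

In this model, if src and dst are distinct and only src carries an
extension, toy_move_extensions(&result, &dst) leaves result.ext NULL
and src untouched, whereas passing &src moves the extension as
intended.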

>  unpack-trees.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 79fd97074e..b670415d4c 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1284,9 +1284,20 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>         o->result.timestamp.sec = o->src_index->timestamp.sec;
>         o->result.timestamp.nsec = o->src_index->timestamp.nsec;
>         o->result.version = o->src_index->version;
> -       o->result.split_index = o->src_index->split_index;
> -       if (o->result.split_index)
> +       if (!o->src_index->split_index) {
> +               o->result.split_index = NULL;
> +       } else if (o->src_index == o->dst_index) {
> +               /*
> +                * o->dst_index (and thus o->src_index) will be discarded
> +                * and overwritten with o->result at the end of this function,
> +                * so just use src_index's split_index to avoid having to
> +                * create a new one.
> +                */
> +               o->result.split_index = o->src_index->split_index;
>                 o->result.split_index->refcount++;
> +       } else {
> +               o->result.split_index = init_split_index(&o->result);
> +       }
>         hashcpy(o->result.sha1, o->src_index->sha1);
>         o->merge_size = len;
>         mark_all_ce_unused(o->src_index);
> --
> 2.17.0.296.gaac25b4b81
>



-- 
Duy
