On Tue, Aug 14, 2018 at 8:13 AM Jeff Hostetler <[email protected]> wrote:
>
>
>
> On 8/13/2018 2:14 PM, Matthew DeVore wrote:
> > Teach list-objects the "tree:0" filter which allows for filtering
> > out all tree and blob objects (unless other objects are explicitly
> > specified by the user). The purpose of this patch is to allow smaller
> > partial clones.
> >
> > The name of this filter - tree:0 - does not explicitly specify that
> > it also filters out all blobs, but this should not cause much confusion
> > because blobs are not at all useful without the trees that refer to
> > them.
> >
> > I also consider only:commits as a name, but this is inaccurate because
> > it suggests that annotated tags are omitted, but actually they are
> > included.
> >
> > The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
> > would filter out all but the root tree and blobs. In order to avoid
> > confusion between 0 and capital O, the documentation was worded in a
> > somewhat round-about way that also hints at this future improvement to
> > the feature.
> >
> > Signed-off-by: Matthew DeVore <[email protected]>
> > ---
> > Documentation/rev-list-options.txt | 3 ++
> > list-objects-filter-options.c | 4 +++
> > list-objects-filter-options.h | 1 +
> > list-objects-filter.c | 50 ++++++++++++++++++++++++++
> > t/t5317-pack-objects-filter-objects.sh | 27 ++++++++++++++
> > t/t5616-partial-clone.sh | 27 ++++++++++++++
> > t/t6112-rev-list-filters-objects.sh | 13 +++++++
> > 7 files changed, 125 insertions(+)
> >
> > diff --git a/Documentation/rev-list-options.txt
> > b/Documentation/rev-list-options.txt
> > index 7b273635d..9e351ec2a 100644
> > --- a/Documentation/rev-list-options.txt
> > +++ b/Documentation/rev-list-options.txt
> > @@ -743,6 +743,9 @@ specification contained in <path>.
> > A debug option to help with future "partial clone" development.
> > This option specifies how missing objects are handled.
> > +
> > +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> > +<depth> from the root tree. Currently, only <depth>=0 is supported.
> > ++
> > The form '--missing=error' requests that rev-list stop with an error if
> > a missing object is encountered. This is the default action.
> > +
> > diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> > index c0e2bd6a0..a28382940 100644
> > --- a/list-objects-filter-options.c
> > +++ b/list-objects-filter-options.c
> > @@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
> > return 0;
> > }
> >
> > + } else if (!strcmp(arg, "tree:0")) {
> > + filter_options->choice = LOFC_TREE_NONE;
> > + return 0;
> > +
> > } else if (skip_prefix(arg, "sparse:oid=", &v0)) {
> > struct object_context oc;
> > struct object_id sparse_oid;
> > diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> > index 0000a61f8..af64e5c66 100644
> > --- a/list-objects-filter-options.h
> > +++ b/list-objects-filter-options.h
> > @@ -10,6 +10,7 @@ enum list_objects_filter_choice {
> > LOFC_DISABLED = 0,
> > LOFC_BLOB_NONE,
> > LOFC_BLOB_LIMIT,
> > + LOFC_TREE_NONE,
> > LOFC_SPARSE_OID,
> > LOFC_SPARSE_PATH,
> > LOFC__COUNT /* must be last */
> > diff --git a/list-objects-filter.c b/list-objects-filter.c
> > index a0ba78b20..8e3caf5bf 100644
> > --- a/list-objects-filter.c
> > +++ b/list-objects-filter.c
> > @@ -80,6 +80,55 @@ static void *filter_blobs_none__init(
> > return d;
> > }
> >
> > +/*
> > + * A filter for list-objects to omit ALL trees and blobs from the
> > traversal.
> > + * Can OPTIONALLY collect a list of the omitted OIDs.
> > + */
> > +struct filter_trees_none_data {
> > + struct oidset *omits;
> > +};
> > +
> > +static enum list_objects_filter_result filter_trees_none(
> > + enum list_objects_filter_situation filter_situation,
> > + struct object *obj,
> > + const char *pathname,
> > + const char *filename,
> > + void *filter_data_)
> > +{
> > + struct filter_trees_none_data *filter_data = filter_data_;
> > +
> > + switch (filter_situation) {
> > + default:
> > + die("unknown filter_situation");
> > + return LOFR_ZERO;
> > +
> > + case LOFS_BEGIN_TREE:
> > + case LOFS_BLOB:
> > + if (filter_data->omits)
> > + oidset_insert(filter_data->omits, &obj->oid);
> > + return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
> > +
> > + case LOFS_END_TREE:
> > + assert(obj->type == OBJ_TREE);
> > + return LOFR_ZERO;
> > +
> > + }
> > +}
>
> There are a couple of options here:
> [] If really want to omit all trees and blobs (and we DO NOT want
> the oidset of everything omitted), then we might be able to
> shortcut the traversal and speed things up.
>
> {} add a LOFR_SKIP_TREE bit to list_objects_filter_result
> {} test this bit process_tree() and avoid the init_tree_desc() and
> the while loop and some adjacent setup/tear-down code.
> {} make this filter something like:
>
> case LOFS_BEGIN_TREE:
> if (filter_data->omits) {
> oidset_insert(filter_data->omits, &obj->oid);
> return LOFR_MARK_SEEN; /* ... (hard omit) */
> } else
> return LOFR_SKIP_TREE;
> case LOFS_BLOB:
> if (filter_data->omits) {
> oidset_insert(filter_data->omits, &obj->oid);
> return LOFR_MARK_SEEN; /* ... (hard omit) */
> else
> assert(...should not happen...);
I like this - it will considerably reduce the amount of work the
server needs to do on a partial clone. I'd prefer to do this in a
follow-up patchset, so I added a NEEDSWORK in the commit that adds
proper handling for filtered tree objects. I want to make sure the
unit tests are thorough when I apply your suggestion, and this
patchset is already a bit more complex than I was expecting.
>
> [] Later, if we choose to actually support a depth>0, we'll probably
> want a different filter function to conditionally include/exclude
> blobs, include shallow tree[node]s, and do some of the provisional-
> omit logic on deep tree[nodes] (in case a tree appears at multiple
> places/depths in the history). But that can wait.
>
> Jeff
>
>
> > +
> > +static void* filter_trees_none__init(
> > + struct oidset *omitted,
> > + struct list_objects_filter_options *filter_options,
> > + filter_object_fn *filter_fn,
> > + filter_free_fn *filter_free_fn)
> > +{
> > + struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
> > + d->omits = omitted;
> > +
> > + *filter_fn = filter_trees_none;
> > + *filter_free_fn = free;
> > + return d;
> > +}
> > +
> > /*
> > * A filter for list-objects to omit large blobs.
> > * And to OPTIONALLY collect a list of the omitted OIDs.
> > @@ -374,6 +423,7 @@ static filter_init_fn s_filters[] = {
> > NULL,
> > filter_blobs_none__init,
> > filter_blobs_limit__init,
> > + filter_trees_none__init,
> > filter_sparse_oid__init,
> > filter_sparse_path__init,
> > };
> > diff --git a/t/t5317-pack-objects-filter-objects.sh
> > b/t/t5317-pack-objects-filter-objects.sh
> > index 5e35f33bf..65f2cf446 100755
> > --- a/t/t5317-pack-objects-filter-objects.sh
> > +++ b/t/t5317-pack-objects-filter-objects.sh
> > @@ -72,6 +72,33 @@ test_expect_success 'get an error for missing tree
> > object' '
> > grep -q "bad tree object" bad_tree
> > '
> >
> > +test_expect_success 'setup for tests of tree:0' '
> > + mkdir r1/subtree &&
> > + echo "This is a file in a subtree" > r1/subtree/file &&
> > + git -C r1 add subtree/file &&
> > + git -C r1 commit -m subtree
> > +'
> > +
> > +test_expect_success 'verify tree:0 packfile has no blobs or trees' '
> > + git -C r1 pack-objects --rev --stdout --filter=tree:0
> > >commitsonly.pack <<-EOF &&
> > + HEAD
> > + EOF
> > + git -C r1 index-pack ../commitsonly.pack &&
> > + git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> > + ! grep -E "tree|blob" objs
> > +'
> > +
> > +test_expect_success 'grab tree directly when using tree:0' '
> > + # We should get the tree specified directly but not its blobs or
> > subtrees.
> > + git -C r1 pack-objects --rev --stdout --filter=tree:0
> > >commitsonly.pack <<-EOF &&
> > + HEAD:
> > + EOF
> > + git -C r1 index-pack ../commitsonly.pack &&
> > + git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> > + grep -E "tree|blob" objs >trees_and_blobs &&
> > + test_line_count = 1 trees_and_blobs
> > +'
> > +
> > # Test blob:limit=<n>[kmg] filter.
> > # We boundary test around the size parameter. The filter is strictly
> > less than
> > # the value, so size 500 and 1000 should have the same results, but 1001
> > should
> > diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> > index bbbe7537d..fc4d182c0 100755
> > --- a/t/t5616-partial-clone.sh
> > +++ b/t/t5616-partial-clone.sh
> > @@ -170,6 +170,33 @@ test_expect_success 'partial clone fetches blobs
> > pointed to by refs even if norm
> > git -C dst fsck
> > '
> >
> > +test_expect_success 'can use tree:0 to filter partial clone' '
> > + rm -rf dst &&
> > + git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst
> > &&
> > + git -C dst rev-list master --missing=allow-any --objects
> > >fetched_objects &&
> > + cat fetched_objects \
> > + | awk -f print_1.awk \
> > + | xargs -n1 git -C dst cat-file -t >fetched_types &&
> > + sort fetched_types -u >unique_types.observed &&
> > + echo commit > unique_types.expected &&
> > + test_cmp unique_types.observed unique_types.expected
> > +'
> > +
> > +test_expect_success 'show missing tree objects with --missing=print' '
> > + git -C dst rev-list master --missing=print --quiet --objects
> > >missing_objs &&
> > + sed "s/?//" missing_objs \
> > + | xargs -n1 git -C srv.bare cat-file -t \
> > + >missing_types &&
> > + sort -u missing_types >missing_types.uniq &&
> > + echo tree >expected &&
> > + test_cmp missing_types.uniq expected
> > +'
> > +
> > +test_expect_success 'do not complain when a missing tree cannot be parsed'
> > '
> > + git -C dst rev-list master --missing=print --quiet --objects
> > 2>rev_list_err >&2 &&
> > + ! grep -q "Could not read " rev_list_err
> > +'
> > +
> > . "$TEST_DIRECTORY"/lib-httpd.sh
> > start_httpd
> >
> > diff --git a/t/t6112-rev-list-filters-objects.sh
> > b/t/t6112-rev-list-filters-objects.sh
> > index 0a37dd5f9..6ccffddbc 100755
> > --- a/t/t6112-rev-list-filters-objects.sh
> > +++ b/t/t6112-rev-list-filters-objects.sh
> > @@ -196,6 +196,19 @@ test_expect_success 'verify sparse:oid=oid-ish omits
> > top-level files' '
> > test_cmp observed expected
> > '
> >
> > +# Test tree:0 filter.
> > +
> > +test_expect_success 'verify tree:0 includes trees in "filtered" output' '
> > + git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted
> > --filter=tree:0 \
> > + | awk -f print_1.awk \
> > + | sed s/~// \
> > + | xargs -n1 git -C r3 cat-file -t \
> > + | sort -u >filtered_types &&
> > + printf "blob\ntree\n" > expected &&
> > + test_cmp filtered_types expected
> > +'
> > +
> > +
> > # Delete some loose objects and use rev-list, but WITHOUT any filtering.
> > # This models previously omitted objects that we did not receive.
> >
> >