On Fri, Jun 21, 2019 at 05:26:26PM -0700, Jonathan Tan wrote:
> > Allow combining filters such that only objects accepted by all filters
> > are shown. The motivation for this is to allow getting directory
> > listings without also fetching blobs. This can be done by combining
> > blob:none with tree:<depth>. There are massive repositories that have
> > larger-than-expected trees - even if you include only a single commit.
>
> First of all, patches 2 and 3 are straightforward and LGTM. On to patch
> 4...
>
> [snip]
>
> > The current usage requires passing the filter to rev-list in the
> > following form:
> >
> > --filter=<FILTER1> --filter=<FILTER2> ...
> >
> > Such usage is currently an error, so giving it a meaning is backwards-
> > compatible.
> >
> > The URL-encoding scheme is being introduced before the repeated flag
> > logic, and the user-facing documentation for URL-encoding is being
> > withheld until the repeated flag feature is implemented. The
> > URL-encoding is in general not meant to be used directly by the user,
> > and it is better to describe the URL-encoding feature in terms of the
> > repeated flag.
>
> As of this commit, we don't support such arguments passed to rev-list in
> this way, so I would write these paragraphs as:
>
> A combined filter supports any number of subfilters, and is written in
> the following form:
>
> combine:<filter 1>+<filter 2>+<filter 3>
>
> Certain non-alphanumeric characters in each filter must be
> URL-encoded.
>
> For now, combined filters must be specified in this form. In a
> subsequent commit, rev-list will support multiple --filter arguments
> which will have the same effect as specifying one filter argument
> starting with "combine:".
Done, but I've amended your last paragraph to include the excuse about the
missing documentation:
For now, combined filters must be specified in this form. In a
subsequent commit, rev-list will support multiple --filter arguments
which will have the same effect as specifying one filter argument
starting with "combine:". The documentation will be updated in that
commit, as the URL-encoding scheme is in general not meant to be used
directly by the user, and it is better to describe the URL-encoding
feature in terms of the repeated flag.
> > +
> > + /* LOFC_COMBINE values */
> > +
> > + /* This array contains all the subfilters which this filter combines. */
> > + size_t sub_nr, sub_alloc;
> > + struct list_objects_filter_options *sub;
> > +
> > + /*
> > + * END choice-specific parsed values.
> > + */
> > };
>
> I still think it's cleaner to just have a "left subfilter" and "right
> subfilter", but I don't feel strongly about it. In any case, this is an
> internal detail and can always be changed in the future.
>
Interesting. I think there are two reasonable ways of thinking about it:
a. we are parsing a nested left-associative binary expression, and it's
helpful if the internal representation matches that semantic. (Maybe
this is what you're thinking?)
b. linked-lists are somewhat aberrational and should only be used in a
very narrow class of code. Arrays are more universal and therefore more
readable.
I'll keep it as-is for now since I don't see a great reason to revert it to the
previous style. Thank you for pointing this out - I *did* suspect there was a
big subjective factor to people's preference for the array form.
> > + /*
> > + * Optional. If this function is supplied and the filter needs to
> > + * collect omits, then this function is called once before free_fn is
> > + * called.
> > + */
> > + void (*finalize_omits_fn)(struct oidset *omits, void *filter_data);
>
> This is needed because a combined filter's omits actually lie in the
> subfilters. Resolving it this way means that callers must call
> list_objects_filter__free() before using the omits set. Can you add
> documentation to __init() (which is the first function to take in the
> omits set) and __free() describing this?
>
> (As stated in the test below, we cannot just share one omits set amongst
> all the subfilters - see filter_trees_update_omits and the call site
> that relies on its return value.)
>
I documented __init as follows:
/*
* Constructor for the set of defined list-objects filters.
* The `omitted` set is optional. It is populated with objects that the
* filter excludes. This set should not be considered finalized until
* after list_objects_filter__free is called on the returned `struct
* filter *`.
*/
And __free:
/*
* Destroys `filter` and finalizes the `omitted` set, if present. Does
* nothing if `filter` is null.
*/
And finalize_omits_fn (which has the internal specifics):
/*
* Optional. If this function is supplied and the filter needs
* to collect omits, then this function is called once before
* free_fn is called.
*
* This is required because the following two conditions hold:
*
* a. A tree filter can add and remove objects as an object
* graph is traversed.
* b. A combine filter's omit set is the union of all its
* subfilters, which may include tree: filters.
*
* As such, the omits sets must be separate sets, and can only
* be unioned after the traversal is completed.
*/
> Here comes the tricky part...
>
> > +static int should_delegate(enum list_objects_filter_situation
> > filter_situation,
> > + struct object *obj,
> > + struct subfilter *sub)
> > +{
> > + if (!sub->is_skipping_tree)
> > + return 1;
> > + if (filter_situation == LOFS_END_TREE &&
> > + oideq(&obj->oid, &sub->skip_tree)) {
> > + sub->is_skipping_tree = 0;
> > + return 1;
> > + }
> > + return 0;
> > +}
>
> Optional: I think this should be called "test_and_set_skip_tree" or
> something like that, made to return the inverse of its current return
> value, and documented:
>
> Returns the value of sub->is_skipping_tree at the moment of
> invocation. If iteration is at the LOFS_END_TREE of the tree currently
> being skipped, first clears sub->is_skipping_tree before returning.
>
That is not totally accurate, since in the second if block
(LOFS_END_TREE) we would return 0 even though is_skipping_tree starts
out as 1.
Since it's not equal to is_skipping_tree before invocation, I don't
think there is a better name to give it ATM.
So in the interest of removing "trickiness" I've inlined the function
(since there is no real way to express what it's doing in a function
name anyway) and I think it's more readable. This is the new callsite:
/*
* Check and update is_skipping_tree before oidset_contains so
* that is_skipping_tree gets unset even when the object is
* marked as seen. As of this writing, no filter uses
* LOFR_MARK_SEEN on trees that also uses LOFR_SKIP_TREE, so the
* ordering is only theoretically important. Be cautious if you
* change the order of the below checks and more filters have
* been added!
*/
if (sub->is_skipping_tree) {
if (filter_situation == LOFS_END_TREE &&
oideq(&obj->oid, &sub->skip_tree))
sub->is_skipping_tree = 0;
else
return LOFR_ZERO;
}
Hopefully that's agreeable for everyone.