Lars Schneider <[email protected]> writes:
>> The filter itself would need to be aware of parallelism
>> if it lives for multiple objects, right?
>
> Correct. This way Git doesn't need to deal with threading...
I think you need to be careful about three things (at least; there
may be more):
* Codepaths that check out multiple cache entries do rely on the
order of checkout.  We check out removals first to make room, so
that the creation of a path X can succeed once an existing path
X/Y, which used to want X to be a directory, has been removed
(see the use of checkout_entry() by "git checkout", which does
have two separate loops to explicitly guarantee this; a simplified
sketch of that pattern follows after this list), for example.  I
do not think you specifically have to worry about the "remove all
and then create" order with the proposed change, but you may need
to inspect and verify that there aren't other kinds of order
dependency.
* Done naively, it will lead to unmaintainable code, like this:
+   struct list_of_cache_entries *list = ...;
    for (i = 0; i < active_nr; i++)
-           checkout_entry(active_cache[i], state, NULL);
+           if (checkout_entry(active_cache[i], state, NULL) == DELAYED)
+                   add_cache_to_queue(&list, active_cache[i]);
+   while (list) {
+           wait_for_checkout_to_finish(*list);
+           list = list->next;
+   }
I do not think we want to see such a rewrite all over the
codepaths. It might be OK to add such a "these entries are known
to be delayed" list in struct checkout so that the above becomes
more like this:
    for (i = 0; i < active_nr; i++)
            checkout_entry(active_cache[i], state, NULL);
+   checkout_entry_finish(state);
That is, the addition of a single "some of the checkout_entry()
calls done so far might have been lazy, and I'll give them a
chance to clean up" call at the end might be palatable (a rough
sketch of what such a call could look like follows after this
list).  Anything more than that on the caller side is not.
* You'd need to rein in the maximum parallelism somehow, as you do
not want to see hundreds of competing filter processes started,
only for each of them to tell the main loop over an index with
hundreds of entries that its checkout is delayed (the last sketch
after this list shows one crude way to cap this).
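
For reference on the first point, here is a simplified sketch of the
two-loop order dependency; it is loosely modelled on check_updates()
in unpack-trees.c from memory, so treat the flag names and structure
as approximate rather than the exact code:

    /*
     * Illustrative sketch only; loosely modelled on check_updates()
     * in unpack-trees.c, not the exact code.
     */
    static void check_updates_sketch(struct index_state *index,
                                     const struct checkout *state)
    {
            int i;

            /* First pass: removals, so that a path X/Y goes away... */
            for (i = 0; i < index->cache_nr; i++) {
                    const struct cache_entry *ce = index->cache[i];

                    if (ce->ce_flags & CE_WT_REMOVE)
                            unlink_entry(ce);
            }

            /* ...before the second pass creates a file X in its place. */
            for (i = 0; i < index->cache_nr; i++) {
                    struct cache_entry *ce = index->cache[i];

                    if (ce->ce_flags & CE_UPDATE)
                            checkout_entry(ce, state, NULL);
            }
    }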
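
To make the "give them a chance to clean up" call from the second
point a bit more concrete, here is a hypothetical sketch; none of the
names below (delayed_paths, checkout_entry_finish(),
finish_delayed_entry()) exist, they are made up purely to show the
shape I have in mind:

    /* Hypothetical: keep the bookkeeping inside the checkout state. */
    struct checkout {
            /* ... existing members ... */
            struct string_list delayed_paths; /* paths a filter delivers later */
    };

    /*
     * checkout_entry() would append ce->name to state->delayed_paths
     * instead of blocking when the filter answers "delayed"; the one
     * extra call at the end of the loop then drains that list.
     */
    int checkout_entry_finish(struct checkout *state)
    {
            struct string_list_item *item;
            int errs = 0;

            for_each_string_list_item(item, &state->delayed_paths)
                    /* hypothetical helper: wait for the filter, write the file */
                    errs |= finish_delayed_entry(state, item->string);

            string_list_clear(&state->delayed_paths, 0);
            return errs;
    }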
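
And for the last point, one crude way to cap how many filter
processes run at once; max_parallel_filters and the helpers are made
up, this is just to illustrate the kind of limit I mean:

    /* Illustrative only; the cap and the helpers below are made up. */
    static int max_parallel_filters = 8;
    static int nr_filters_running;

    static void start_filter_for_delayed_entry(struct cache_entry *ce,
                                                const struct checkout *state)
    {
            /* Never let more than max_parallel_filters run at once. */
            while (nr_filters_running >= max_parallel_filters) {
                    /* hypothetical: reap one finished filter, write its result */
                    wait_for_one_filter(state);
                    nr_filters_running--;
            }

            spawn_filter(ce, state); /* hypothetical */
            nr_filters_running++;
    }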