On Fri, Jun 19, 2015 at 10:10:59AM +0100, Charles Bailey wrote:
> filter-objects is a command to scan all objects in the object database
> for the repository and print the ids of those which match the given
> criteria.
>
> The current supported criteria are object type and the minimum size of
> the object.
>
> The guiding use case is to scan repositories quickly for large objects
> which may cause performance issues for users. The list of objects can
> then be used to guide some future remediating action.

I've had to perform this exact same task. You can already do the
"filtering" part pretty easily and efficiently with cat-file and a perl
script, like:

  magically_generate_all_objects |
  git cat-file --batch-check='%(objectsize) %(objectname)' |
  perl -alne 'print $F[1] if $F[0] > 1234'

That's not as friendly as your filter-objects, but it's a lot more
flexible (since you can ask cat-file for all sorts of information).
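A minimal sketch of how cat-file alone can cover both of filter-objects' criteria (object type and minimum size), assuming a POSIX shell with awk. The helper name filter_objects_by is hypothetical. It adds the %(objecttype) atom next to %(objectsize) and treats the size as an inclusive minimum, whereas the perl line above uses a strict "greater than":

```shell
# filter_objects_by TYPE MIN_BYTES  (hypothetical helper, not part of git)
# Reads object ids on stdin, one per line, and prints the ids of objects
# of type TYPE (blob, tree, commit or tag) whose size in bytes is at
# least MIN_BYTES. Ids that cat-file reports as "<id> missing" never
# match, because their first field is the id rather than a type.
filter_objects_by() {
  git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname)' |
  awk -v type="$1" -v min="$2" '$1 == type && $2 >= min { print $3 }'
}
```

For example, "magically_generate_all_objects | filter_objects_by blob 1048576" would list blobs of 1 MiB or more. awk is used here only because -v makes passing the two parameters easy; a perl version works just as well.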
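Obviously I've glossed over the "how to get a list of objects" part.
If you truly want all objects (not just reachable ones), or if "rev-list
--objects" is too slow, the best way is the function below. Its paths
are relative, so run it from inside $GIT_DIR (e.g. ".git"):

  objects() {
    # loose objects: objects/ab/cdef... -> abcdef...
    # (test -f skips the unexpanded pattern when nothing matches)
    for i in objects/??/*; do
      test -f "$i" && echo "$i"
    done |
    sed 's,objects/\(..\)/,\1,'

    # packed objects: show-index prints "<offset> <sha1> (<crc32>)"
    for i in objects/pack/*.idx; do
      test -f "$i" && git show-index <"$i"
    done |
    cut -d' ' -f2
  }

Certainly I'm not opposed to doing something less horrible there (and I
am happy to see my for_each_*_object interface getting more callers!).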
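In that spirit, a sketch of a variant of objects() that can be called from anywhere in a work tree. It is an illustration under stated assumptions, not part of git: the name list_all_objects is hypothetical, it locates the object directory with "git rev-parse --git-path objects" (which resolves the path the way git itself does), and it ignores alternates. An object stored both loose and in a pack is printed twice, so pipe through "sort -u" if that matters. Later Git releases (2.6 onward) also offer "git cat-file --batch-all-objects", which does this listing internally.

```shell
# list_all_objects (hypothetical helper, not part of git)
# Prints the id of every loose and packed object in the current
# repository, reachable or not. Alternates are not scanned.
list_all_objects() {
  objdir=$(git rev-parse --git-path objects) || return 1

  # loose objects live at <objdir>/ab/cdef...; join the two parts
  for f in "$objdir"/??/*; do
    [ -f "$f" ] || continue      # skip the pattern itself if nothing matched
    d=${f%/*}                    # <objdir>/ab
    printf '%s%s\n' "${d##*/}" "${f##*/}"
  done

  # packed objects: show-index prints "<offset> <id> (<crc32>)"
  for idx in "$objdir"/pack/*.idx; do
    [ -f "$idx" ] || continue
    git show-index <"$idx" | cut -d' ' -f2
  done
}
```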
I kind of wonder if we should make "all objects, reachable or not" an
option for rev-list. I'm not sure if it would choke on adding them all
to the "pending" list, though; it's not really made for that. But it
would enable neat things like:

  git rev-list --all-the-objects --not --all

to show you what's unreachable.
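Until such an option exists, the same set can be approximated as a set difference, sketched below under two assumptions: Git 2.6 or later for "git cat-file --batch-all-objects" (which postdates this message), and "reachable" meaning reachable from refs only, as with "--not --all". The --all-the-objects option above is only a proposal, and unreachable_objects is a hypothetical name. For everyday use, "git fsck --unreachable" reports similar information, but it also treats the index and reflogs as roots.

```shell
# unreachable_objects (hypothetical helper, not part of git)
# Prints ids of objects that exist in the repository but are not
# reachable from any ref, approximating the proposed
# "rev-list --all-the-objects --not --all".
# Assumes Git 2.6+ for cat-file --batch-all-objects.
unreachable_objects() {
  all=$(mktemp) && reachable=$(mktemp) || return 1

  # every object in the object store (loose, packed, alternates)
  git cat-file --batch-all-objects --batch-check='%(objectname)' |
    LC_ALL=C sort -u >"$all"

  # everything reachable from refs; rev-list prints "<id> <path>" for
  # trees and blobs, so keep only the first field
  git rev-list --objects --all | cut -d' ' -f1 |
    LC_ALL=C sort -u >"$reachable"

  # comm -23 keeps lines that appear only in the first sorted file
  LC_ALL=C comm -23 "$all" "$reachable"
  rm -f "$all" "$reachable"
}
```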
-Peff