On Mon, Nov 12, 2018 at 10:08:10AM -0800, Elijah Newren wrote:
> > I would do:
> >
> > git log --raw $(
> > git cat-file --batch-check='%(objectsize:disk) %(objectname)'
> > --batch-all-objects |
> > sort -rn | head -3 |
> > awk '{print "--find-object=" $2 }'
> > )
> >
> > I'm not sure how renames enter into it at all.
>
> How did I miss objectsize:disk?? Especially since it is right next to
> objectsize in the manpage to boot? That's awesome, thanks for that
> pointer.
>
> I do have a separate cat-file --batch-check --batch-all-objects
> process already, since I can't get sizes out of either log or
> fast-export. However, I wouldn't use your 'head -3' since I'm not
> looking for the N biggest, but reporting on _all_ objects (in reverse
> size order) and letting the user look over the report and decide
> where to stop reading. So, this is a big and expensive log command.
> Granted, we will need a big and expensive log command, but let's keep
> in mind that we have this one.
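A minimal sketch of the all-objects report described above, assuming a POSIX shell. The function name `objects_by_disk_size` is hypothetical. The `%(objectsize:disk)`, `%(objectname)` and `%(objecttype)` atoms and `--batch-all-objects` are real `git cat-file` options. Unlike the `head -3` version, it keeps every object and leaves it to the reader to decide where to stop.

```shell
# objects_by_disk_size: hypothetical helper, not a git command.
# Prints "<disk-size> <object-id> <type>" for every object in the
# repository (loose and packed), largest on-disk size first.
objects_by_disk_size() {
  git cat-file --batch-all-objects \
    --batch-check='%(objectsize:disk) %(objectname) %(objecttype)' |
  sort -rn
}
```

For a deltified object in a pack, the on-disk size is the size of the delta, so a small number does not necessarily mean a small blob. Each id in this list can then be passed to `git log --raw --find-object=<id>` to attribute it to commits and paths.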

It is an expensive log command, but it's the same expense as running
fast-export, no? And I think maybe that is the disconnect.

I am looking at this problem as "how do you answer question X in a
repository". And I think you are looking at it as "I am receiving a
fast-export stream, and I need to answer question X on the fly".

And that would explain why you want to get extra annotations into the
fast-export stream. Is that right?
> > There I think you'd want to assemble the list with something like "git
> > log --follow --name-only paths-of-interest" except that --follow sucks
> > too much to handle more than one path at a time.
> >
> > But if you wanted to do it manually, then:
> >
> > git log --diff-filter=R --name-only
> >
> > would be enough to let you track it down, wouldn't it?
>
> Without a -M you'd only catch 100% renames, right? Those aren't the
> only ones I'd want to catch, so I'd need to add -M. You are right
> that we could get basic renames this way, but it doesn't cover
> everything I need. Let's use this as a starting point, though, and
> build up to what I need...

No, renames are on by default these days (diff.renames defaults to true
since Git 2.9), and that includes inexact renames. That said, if you're
scripting you probably ought to be doing:

  git rev-list HEAD | git diff-tree --stdin

and there, yes, you'd have to enable "-M" yourself (you touched on
scripting and formatting below; diff-tree can accept the format options
you'd want).
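A minimal sketch of that plumbing pipeline, assuming a POSIX shell. `renames_and_deletions` is a hypothetical helper name. It selects the same renames and deletions as `git log -M --diff-filter=RD --name-status`, passing `-M` explicitly because diff-tree, being plumbing, does not pick up the diff.renames default. `-r` makes it recurse into subdirectories.

```shell
# renames_and_deletions: hypothetical helper, not a git command.
# Plumbing counterpart of "git log -M --diff-filter=RD --name-status".
# With --stdin, diff-tree prints a commit-id line ahead of that commit's
# changes; -M must be given explicitly, and -r recurses into subtrees.
renames_and_deletions() {
  git rev-list "${1:-HEAD}" |
  git diff-tree --stdin -r -M --diff-filter=RD --name-status
}
```

Renames come out as `R<score>\t<old>\t<new>` and deletions as `D\t<path>`.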
> I also want to know when files were deleted. I've generally found
> that people are more okay with purging parts of history [corresponding
> to large objects] that were deleted longer ago than more recent stuff,
> for a variety of reasons. So we could either run yet another log, or
> modify the command to:
>
> git log -M --diff-filter=RD --name-status
>
> However, I don't just want to know when files were deleted, I'd like
> to know when directories are deleted. I only knew how to derive that
> from knowing what files existed within those directories, so that
> would take me to:
>
> git log -M --diff-filter=RAD --name-status
>
> [Edit: I just saw your other email and for the first time learned
> about the -t rev-list option which might simplify this a little,
> although "need to worry about deleted files being reinstated" below
> might require the 'A' anyway.]
Yeah, I think "-t" would help your tree deletion problem.
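With `-t`, a removed directory is reported as its own tree entry (old mode `040000`) alongside the files under it, so deleted directories can be listed directly instead of being derived from file lists. A minimal sketch, assuming a POSIX shell and awk. `deleted_directories` is a hypothetical helper name. It uses `git diff-tree`, where `-t` also implies `-r`.

```shell
# deleted_directories: hypothetical helper, not a git command.
# -t makes diff-tree emit tree entries as well as the blobs below them,
# so a removed directory appears as its own "D" line whose old mode is
# 040000. Commit-id lines from --stdin never start with ":040000".
deleted_directories() {
  git rev-list "${1:-HEAD}" |
  git diff-tree --stdin -t --diff-filter=D --raw |
  awk '$1 == ":040000" { split($0, f, "\t"); print f[2] }'
}
```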
> At this point, let's remember that we had another full git-log
> invocation for mapping object sizes to filenames. We might as well
> coalesce the two log commands into one, by extending this latest one
> to:
>
> git log -M --diff-filter=RAMD --no-abbrev --raw
What is there besides RAMD? :)
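A minimal sketch of the coalesced command feeding the size report, assuming a POSIX shell and awk. `sizes_with_paths` is a hypothetical helper name. One `git log --raw` traversal supplies the blob id at each path (the old id for deletions, whose new-side id is all zeros), `git cat-file` supplies on-disk sizes, and awk joins the two before sorting largest first.

```shell
# sizes_with_paths: hypothetical helper, not a git command.
# Prints "<disk-size>\t<blob-id>\t<path>" for every blob seen in the
# non-merge history of HEAD, largest on-disk size first. A blob seen at
# several paths (for example before and after a rename) gets one line each.
sizes_with_paths() {
  sizes=$(mktemp) || return 1
  # "<blob-id> <disk-size>" for every blob object in the repository.
  git cat-file --batch-all-objects \
    --batch-check='%(objectname) %(objecttype) %(objectsize:disk)' |
  awk '$2 == "blob" { print $1, $3 }' >"$sizes"
  # --format= drops commit headers, leaving raw lines of the form
  # ":<old-mode> <new-mode> <old-id> <new-id> <status>\t<path>[\t<new-path>]".
  git log -M --diff-filter=RAMD --no-abbrev --raw --format= |
  awk -v sizes="$sizes" '
    BEGIN {
      while ((getline line < sizes) > 0) {
        split(line, s, " ")
        size[s[1]] = s[2]
      }
    }
    /^:/ {
      n = split($0, f, "\t")
      split(f[1], m, " ")
      # Deletions carry an all-zero new id, so use the old blob instead.
      id = (m[5] ~ /^D/) ? m[3] : m[4]
      # The last tab-separated field is the destination path of a rename.
      if ((id in size) && !seen[id "\t" f[n]]++)
        print size[id] "\t" id "\t" f[n]
    }' |
  sort -t "$(printf '\t')" -k1,1nr
  rm -f "$sizes"
}
```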
> I could potentially switch to using this and drop patch 10/10.
So I'm still not _entirely_ clear on what you're trying to do with
10/10. I think maybe the "disconnect" part I wrote above explains it. If
that's correct, then I think framing it in terms of the operations that
you'd be able to perform _without running a separate traversal_ would
make it more obvious.
> Anyway, I hope it makes a little more sense why I created this patch.
> Does it, or have I just made things even more confusing?
Some of both, I think.
> ...and if you've read this far, I'm impressed. Thanks for reading.
I'll admit I skimmed near the end. ;)
-Peff