Re: history at diff blocks level

Denis Laxalde Tue, 18 Oct 2016 00:00:38 -0700

Jun Wu wrote:

Excerpts from Denis Laxalde's message of 2016-10-03 16:38:17 +0200:

Hi all,


I've recently been thinking about adding some support in Mercurial to
query repository history based on changes within a given line range in a
file. I think that would be useful in at least two commands:

* log, to restrict history to revisions that modify specified part(s) of
file(s) and only display the diff around specified line range and,

* annotate, to window the annotate result and maybe consider walking
file revisions from tip to base to reduce computation costs.

(The "log" part is more interesting, I think.)


I've been thinking about this as well for "fastannotate --deleted". Although
"annotate" is generally easier than "log" in this case: slicing the
annotated lines seems to be enough.


Yes, slicing is trivial and probably acceptable for fastannotate. But
for the "slow" annotate, I was thinking more about walking file
revisions from tip to base for a line range query, because I tend
to believe this would be more efficient: there's no reason for all
revisions of the file to be involved in a restricted line range.
The algorithm would just stop when all lines in the range are annotated.
But I think this requires a more significant refactoring (being able to
walk revisions top-down, as opposed to bottom-up as is done currently).
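
A minimal sketch of the top-down walk described above, under loud assumptions: history is linear, each file revision is a plain list of lines, and difflib stands in for Mercurial's own diff code. The name annotate_range and the 0-based, half-open range convention are hypothetical and not part of any existing API. The second return value only shows how many revisions had to be visited before every line in the range was annotated.

```python
import difflib


def annotate_range(revisions, fromline, toline):
    """Annotate lines [fromline, toline) of the newest revision.

    revisions: file contents as lists of lines, oldest first (assumed
    linear history). Returns (result, visited) where result maps a line
    index at tip to the index of the revision that introduced it.
    """
    tip = len(revisions) - 1
    # Position in the revision being examined -> position at tip.
    pending = {i: i for i in range(fromline, toline)}
    result = {}
    visited = 0
    for rev in range(tip, 0, -1):
        if not pending:
            break  # every line in the range is annotated: stop early
        visited += 1
        parent, child = revisions[rev - 1], revisions[rev]
        matcher = difflib.SequenceMatcher(None, parent, child, autojunk=False)
        carried = {}
        for a, b, size in matcher.get_matching_blocks():
            for offset in range(size):
                if b + offset in pending:
                    # Line is unchanged from the parent: keep following it.
                    carried[a + offset] = pending.pop(b + offset)
        # Lines left in pending have no counterpart in the parent,
        # so this revision introduced them.
        for tippos in pending.values():
            result[tippos] = rev
        pending = carried
    # Lines that survived down to the base revision.
    for tippos in pending.values():
        result[tippos] = 0
    return result, visited
```

With a range that only covers recently added lines, the loop ends after a single comparison, which is the saving the top-down direction is meant to give.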

From a UI point of view, the basic idea is to specify a (file name, line
range) pair and the simplest solution I could find is something like:

   hg log/annotate --line-range fromline,toline FILE

but this does not work well with several files. (Perhaps something like
hg log FILE:fromline:toline would be better.) I also thought about a


+1 for "FILE:fromline:toline". It is intuitive and makes sense. A new
boolean flag (like "--line-ranges") that enables the syntax explicitly
may be necessary. The flag can avoid conflicts with existing matcher syntax,
and make it clear that some commands like "add" do not support line ranges.


"FILE:fromline:toline" is also my favorite option. But I'm not sure I'd
like to be forced to specify an extra option to be able to use this
syntax. I'd much prefer if this could be avoided, though we'll indeed
have to handle conflicts with existing matcher syntax. Or use another
separator? Any other ideas are welcome!
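
To make the "FILE:fromline:toline" discussion concrete, here is a hedged sketch of how such an argument could be split. The function parse_linerange is hypothetical and is not Mercurial's matcher code. It splits from the right so that a leading pattern prefix such as "glob:" is left alone, and it falls back to treating the whole argument as a plain pattern when the last two fields are not integers.

```python
def parse_linerange(spec):
    """Split 'FILE:fromline:toline' into (pattern, (fromline, toline)).

    Returns (spec, None) when the argument does not end with two integer
    fields, so ordinary patterns such as 'glob:*.py' pass through untouched.
    """
    parts = spec.rsplit(":", 2)
    if (
        len(parts) == 3
        and parts[0]
        and parts[1].isdecimal()
        and parts[2].isdecimal()
    ):
        fromline, toline = int(parts[1]), int(parts[2])
        if fromline > toline:
            raise ValueError("line range out of order: %s" % spec)
        return parts[0], (fromline, toline)
    return spec, None
```

A file whose name itself ends in ":N:M" would still be ambiguous under this scheme, which is the kind of conflict an explicit flag or a different separator would avoid.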

"changes(filename, fromline, toline)" revset (or an extension of the
existing "modifies()" revset), but it seems to me that this would not
fit well for both log and annotate. Suggestions welcome.


I've implemented this part. If you want to give it a try:

  hg pull https://hg.logilab.org/users/dlaxalde/hg -r f19e3327c438

There's only the revset part so far, the diff output is not filtered.


From a technical point of view, my idea is to rely on
mdiff.allblocks(<file content at rev1>, <file content at rev 2>) (used
in both annotate and log, when --patch option is given) to:

1. filter blocks depending on whether they are relevant to the specified
line range (e.g., for the log command, that there is some "!" block in it), and,

2. track the evolution of the line range across revisions (namely, given
the line range at rev2, find the line range at rev1 in the example above).
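
The two steps above can be sketched as follows. This is an illustration under assumptions, not the implementation mentioned in this thread: difflib stands in for mdiff.allblocks (the stand-in yields ((a1, a2, b1, b2), kind) tuples with kind "=" or "!"), track_range is a hypothetical name, and ranges are 0-based and half-open.

```python
import difflib


def allblocks(lines1, lines2):
    """Stand-in for mdiff.allblocks built on difflib (an assumption).

    Yields ((a1, a2, b1, b2), kind) with kind '=' (equal) or '!' (changed).
    """
    matcher = difflib.SequenceMatcher(None, lines1, lines2, autojunk=False)
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        yield (a1, a2, b1, b2), "=" if tag == "equal" else "!"


def track_range(blocks, rangeb):
    """Map a line range of rev2 back to rev1.

    Returns (changed, rangea): the '!' blocks touching rangeb, and the
    corresponding range in rev1 (None if no block overlaps rangeb).
    """
    lbb, ubb = rangeb
    lowers, uppers, changed = [], [], []
    for (a1, a2, b1, b2), kind in blocks:
        overlaps = b1 < ubb and b2 > lbb
        # A pure deletion has an empty rev2 side; count it only when it
        # falls strictly inside the range.
        deletion_inside = b1 == b2 and lbb < b1 < ubb
        if not (overlaps or deletion_inside):
            continue
        if kind == "=":
            # Equal blocks map line for line.
            lowers.append(a1 + max(lbb, b1) - b1)
            uppers.append(a1 + min(ubb, b2) - b1)
        else:
            # Changed blocks cannot be mapped precisely: take all of side a.
            lowers.append(a1)
            uppers.append(a2)
            changed.append((a1, a2, b1, b2))
    if not lowers:
        return changed, None
    return changed, (min(lowers), max(uppers))
```

For the log use case, a revision would be kept when the returned list of changed blocks is non-empty, and the returned rev1 range would become the range to look for in the next (older) revision.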

I have something working concerning this "low level" aspect, but I'm
somewhat stuck when it comes to plugging things into the log command
call. Namely, I need to pass the "line range" information from one
revision to another during iterations of the loop on revisions in
commands.log() [1] and pass this information down to the mdiff.unidiff()
call [2] which would then give me back another line range to push up to
the outer loop on revisions. Given the complexity of the call chain, I
actually think this is not a very good idea... So the best idea I could
come up with is to filter revisions beforehand (as a revset would)
but this would imply keeping track of files' line ranges per revision
(to avoid processing diff blocks twice when --patch option is specified
in particular). All in all, it's not clear to me which "tool" I may use
to achieve this (I thought about using the "filematcher" built along
with "revs" in commands.log(), but not really sure it's a good idea).
Maybe I just need a data structure that does not exist yet?
I'd appreciate any pointer to move forward.


I think "changeset_printer" and "diffordiffstat" are worth considering.
"diffordiffstat" is currently stateless. A possible direction is to add a
new stateful "diffordiffstat" that tracks the line ranges.

If revisions are filtered beforehand, the state could be passed to the new
"diffordiffstat" function to avoid unnecessary calculation.

It seems to me that a high-level diff function like "mdiff.unidiff" could take
an extra parameter "difffunc" which defaults to "allblocks". Then we can
have a "filteredallblocks" passed to "unidiff".
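
A toy sketch of that "difffunc" idea, with everything hypothetical: changedlines is a small stand-in for a high-level diff function (it is not mdiff.unidiff), allblocks is a difflib-based stand-in for mdiff.allblocks, and filteredallblocks is written here as a factory that binds a 0-based, half-open line range on the rev2 side.

```python
import difflib


def allblocks(lines1, lines2):
    """Stand-in block generator (an assumption, built on difflib)."""
    matcher = difflib.SequenceMatcher(None, lines1, lines2, autojunk=False)
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        yield (a1, a2, b1, b2), "=" if tag == "equal" else "!"


def filteredallblocks(fromline, toline):
    """Build a difffunc that drops changed blocks outside the range."""
    def difffunc(lines1, lines2):
        for (a1, a2, b1, b2), kind in allblocks(lines1, lines2):
            if kind == "!" and (b1 >= toline or b2 <= fromline):
                continue  # change lies outside the requested range
            yield (a1, a2, b1, b2), kind
    return difffunc


def changedlines(lines1, lines2, difffunc=allblocks):
    """Toy high-level diff: list removed and added lines.

    The block generator is pluggable through the difffunc parameter.
    """
    out = []
    for (a1, a2, b1, b2), kind in difffunc(lines1, lines2):
        if kind == "!":
            out.extend("-" + line for line in lines1[a1:a2])
            out.extend("+" + line for line in lines2[b1:b2])
    return out
```

The point of the design is that the caller of the high-level function never needs to know about line ranges; only the block generator it is handed does.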


Thanks for these hints, I'll dig in this direction as soon as I get back
to this topic.

--
Denis Laxalde
Logilab         http://www.logilab.fr
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
