[Mailman-Developers] Re: Code formatting tools for Mailman project

2022-10-31 Thread Stephen J. Turnbull
Abhilash Raj writes:
 > I don't see the tools actually supporting formatting only-diffs.

I don't know if git diff supports everything GNU diff does but
--ignore-all-space would likely help quite a bit.

It shouldn't be that hard to write a blue-diff command for git:

1.  create a temporary branches at the relevant commits
2.  for each temporary branch
2.1 checkout the branch
2.2 run blue on the relevant .py files and commit
3.  diff the temparary branches and save the output
4.  delete the temporary branches

I don't know if this would help in practice, but the theory seems
good.

Scripting that should be easy, and I can test it on the Postorius
code.  I'm not going to get to it before December, though, so if you
want to do it first :-D

 > It _can_ be done by piping all the updated files to `blue` command,
 > but then reverting style changes in parts of file where there
 > wasn't any real change made would be more work than just fixing the
 > flake8 errors manually.

Is that necessary?  Since the desired result is reformatted files
without impairing the ability to diff, why would we revert the style
changes?

 > I am trying to think if we can make that process better, after making a 
 > one-time code formatting change, through tooling or alternative commands 
 > to grok the diffs.

Additional tools to parse the diffs might be useful.  Another
possibility would be to diff the AST. :-O

 > Are you usually looking for just anything that looks odd or typically 
 > any kind of difference that looks suspicious or just trying to learn the 
 > differences?

Both of those are common.  Typically it's something where I can't come
up with a bisectable test, eg, a GUI regression.  Then it's wait for
rebuild, fire up the app, mess with the GUI, repeat ... so I don't
actually bisect, once I have a bracket I just take that diff.

 > Can just comparing the commits/commit messages?

git diff --stat often helps isolate to a file.

 > or Merge Requests merged since the feature branch would help since
 > we mandate all changes go through MR workflow?

I don't see how that helps if you're looking at a dozen commits
touching that file on main.

 > I don't know if I can speak much about comparing raw diffs, because I 
 > very rarely end up doing that.

I try to avoid it, but about even in the large code bases I know best
(XEmacs, Lisp and C) half the time I end up looking at diffs.

 > I am usually looking at commit level data and looking for MRs that
 > last changed the point I am interested in with `git-blame` for some
 > contextual data around the MR. Merge commits have the MR no to
 > easily track where the change came from.

I rarely can pinpoint the relevant code without a diff, though.  Most
of the time the bad data is several frames up the stack, and was
generated far from the visible error.

Also, you and I are likely to be working on code we just wrote or
reviewed.  We need to consider the drive-by contributor who is not
going to know our code well.

Steve
___
Mailman-Developers mailing list -- mailman-developers@python.org
To unsubscribe send an email to mailman-developers-le...@python.org
https://mail.python.org/mailman3/lists/mailman-developers.python.org/
Mailman FAQ: https://wiki.list.org/x/AgA3

Security Policy: https://wiki.list.org/x/QIA9


[Mailman-Developers] Re: Code formatting tools for Mailman project

2022-10-31 Thread Abhilash Raj

On 10/28/22 00:04, Stephen J. Turnbull wrote:

Abhilash Raj writes:
  > I am not sure I understand this part. Do you mean to say we'd be getting
  > too may PRs with "formatting-only" changes?

No, I mean the diffs we do to localize errors.  The day after you
merge the reformatting PR, anybody who wants to identify a regression
since Postorius 1.3.7 is going to get as many lines of " -> ' as they
do of code changes.  Half the time what appears to be a regression
from future 1.3.8 to HEAD is going to turn out to be older than the
reformat merge, same problem just less frequent.  It gets less
frequent over time, but it will take quite a while before the reformat
merge consistently contributes less than 10% of changes, I expect.


Ah, I get your point now. And, I agree with you that it would make it 
difficult to compare raw diffs.



  > Currently, my thought is to do a one-time format for the entire codebase
  > and then add "blue --check" as a part of the "qa" CI gate. Then, for
  > each subsequent MR that follows, people would need to run "tox -e
  > format" before they commit and create MR. The hope there is also

Yes, this is a great idea for new files, or files you're refactoring
to death anyway.  But it has a potentially large downside of turning
diffs that should be one-line changes into rafts of fixquote changes.


I don't see the tools actually supporting formatting only-diffs. It 
_can_ be done by piping all the updated files to `blue` command, but 
then reverting style changes in parts of file where there wasn't any 
real change made would be more work than just fixing the flake8 errors 
manually.


It sounds all-or-nothing kind of scenario when it comes to formatting 
tools, blue/black at least from what I can see.



  > My motivation here is to not have to push "oops, make flake8 gods happy"
  > commits on top of my PRs and not have to worry about it when writing code.
  >
  > While it isn't too huge of an issue, it is still manual effort that I
  > feel can be automated to reduce some work and save time.

Sure, but doing it globally also makes some work.  The question is
does it make more work (and annoyance) for everybody who does diffs
than is experienced by the developer who needs to make a flake8
commit?  I do a lot more diffs on Mailman than I do commits.


I am trying to think if we can make that process better, after making a 
one-time code formatting change, through tooling or alternative commands 
to grok the diffs.


Are you usually looking for just anything that looks odd or typically 
any kind of difference that looks suspicious or just trying to learn the 
differences? Can just comparing the commits/commit messages? or Merge 
Requests merged since the feature branch would help since we mandate all 
changes go through MR workflow?


I don't know if I can speak much about comparing raw diffs, because I 
very rarely end up doing that. I am usually looking at commit level data 
and looking for MRs that last changed the point I am interested in with 
`git-blame` for some contextual data around the MR. Merge commits have 
the MR no to easily track where the change came from.


  > > What is the purpose of this?  Anybody who runs it on a code base
  > > before the blue'd code gets merged is going to generate hard to read
  > > crufty PRs, no?
  >
  > How so? If the existing code base is already auto-formatted, then the
  > PRs would be just regular diffs, formatted with the same tool.

Maybe tox -e format isn't a problem because you would add that tox
target in the same commit (or later) you do the reformat, but if this
is going to be policy, some people will aggressively reformat their
code to that standard before the package is reformatted.  Every once
in a while you still see MRs in Python where somebody PEP8s a whole
stdlib module where the contribution is a 2-line doc improvement.
Those get rejected of course.  We should do that too.


My intent with `tox -e format` was to aid with the formatting of the 
changes folks would contribute, with the assumption that the entire 
codebase is already formatted.


But yeah, we shouldn't really entertain format-only changes.


  > I've tried to solve at least the "git-blame" issue by adding the rev of
  > the commit with the "mass-refactor" into a file in the main repo that
  > you can feed to "git blame --ignore-rev-file .git-blame-ignore-revs".
  >
  > For editors, there is a way to set git config for the blame-ignore-file
  > so it works across everything that uses git.

That's very helpful for some operations.  How to make it discoverable
is something of a headache though -- I didn't know there was a
blame-ignore-file!


Yeah, I agree. I just added to Postorius with the formatting changes, 
currently it doesn't exist for other repos since we haven't really done 
any of those formatting changes.



--
thanks,
Abhilash Raj (maxking)



OpenPGP_signature
Description: OpenPGP digital signature
___

[Mailman-Developers] Re: Code formatting tools for Mailman project

2022-10-27 Thread Stephen J. Turnbull
Abhilash Raj writes:

 > > I (speaking only for myself) have no objection going forward for new
 > > files and major refactorings.  But if you do this all at once, I fear
 > > that most interesting diffs for the next year or so will be full of
 > > reformatting cruft, 
 > 
 > I am not sure I understand this part. Do you mean to say we'd be getting 
 > too may PRs with "formatting-only" changes?

No, I mean the diffs we do to localize errors.  The day after you
merge the reformatting PR, anybody who wants to identify a regression
since Postorius 1.3.7 is going to get as many lines of " -> ' as they
do of code changes.  Half the time what appears to be a regression
from future 1.3.8 to HEAD is going to turn out to be older than the
reformat merge, same problem just less frequent.  It gets less
frequent over time, but it will take quite a while before the reformat
merge consistently contributes less than 10% of changes, I expect.

 > Currently, my thought is to do a one-time format for the entire codebase 
 > and then add "blue --check" as a part of the "qa" CI gate. Then, for 
 > each subsequent MR that follows, people would need to run "tox -e 
 > format" before they commit and create MR. The hope there is also

Yes, this is a great idea for new files, or files you're refactoring
to death anyway.  But it has a potentially large downside of turning
diffs that should be one-line changes into rafts of fixquote changes.

 > My motivation here is to not have to push "oops, make flake8 gods happy" 
 > commits on top of my PRs and not have to worry about it when writing code.
 > 
 > While it isn't too huge of an issue, it is still manual effort that I 
 > feel can be automated to reduce some work and save time.

Sure, but doing it globally also makes some work.  The question is
does it make more work (and annoyance) for everybody who does diffs
than is experienced by the developer who needs to make a flake8
commit?  I do a lot more diffs on Mailman than I do commits.

 > > What is the purpose of this?  Anybody who runs it on a code base
 > > before the blue'd code gets merged is going to generate hard to read
 > > crufty PRs, no?  
 > 
 > How so? If the existing code base is already auto-formatted, then the 
 > PRs would be just regular diffs, formatted with the same tool.

Maybe tox -e format isn't a problem because you would add that tox
target in the same commit (or later) you do the reformat, but if this
is going to be policy, some people will aggressively reformat their
code to that standard before the package is reformatted.  Every once
in a while you still see MRs in Python where somebody PEP8s a whole
stdlib module where the contribution is a 2-line doc improvement.
Those get rejected of course.  We should do that too.

 > I've tried to solve at least the "git-blame" issue by adding the rev of 
 > the commit with the "mass-refactor" into a file in the main repo that 
 > you can feed to "git blame --ignore-rev-file .git-blame-ignore-revs".
 > 
 > For editors, there is a way to set git config for the blame-ignore-file 
 > so it works across everything that uses git.

That's very helpful for some operations.  How to make it discoverable
is something of a headache though -- I didn't know there was a
blame-ignore-file!

Steve
___
Mailman-Developers mailing list -- mailman-developers@python.org
To unsubscribe send an email to mailman-developers-le...@python.org
https://mail.python.org/mailman3/lists/mailman-developers.python.org/
Mailman FAQ: https://wiki.list.org/x/AgA3

Security Policy: https://wiki.list.org/x/QIA9