Re: [DISCUSS] Autoformat python code with Black

Robert Bradshaw Tue, 21 Jan 2020 14:07:25 -0800

Huge +1 to autoformatters. Yapf solves my biggest complaint about
black, that it can't be customized (and I didn't want to take on
maintaining our own fork of black).


It'd be good if there was a way to only apply to violating (or at
least changed) lines.

On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <[email protected]> wrote:
>
> +1 to autoformatting
>
> Let me add some nuance to that.
>
> The way I see it there are 2 varieties of formatters:  those which take the 
> original formatting into consideration (autopep8) and those which disregard 
> it (yapf, black).
>
> I much prefer yapf to black, because you have plenty of options to tweak with 
> yapf (enough to make the output a pretty close match to the current Beam 
> style), and you can mark areas to preserve the original formatting, which 
> could be very useful with Pipeline building with pipe operators.  Please 
> don't pick black.
>
> autopep8 is more along the lines of spotless in Java -- it only corrects code 
> that breaks the project's style rules.  The big problem with Beam's current 
> style is that it is so esoteric that autopep8 can't enforce it -- and I'm not 
> just talking about 2-spaces, which I don't really have a problem with -- the 
> problem is the use of either 2 or 4 spaces depending on context (expression 
> start vs hanging indent, etc).  This is my *biggest* gripe about the current 
> style.  PyCharm doesn't have enough control either.  So, if we can choose a 
> style that can be expressed by flake8 or pycodestyle then we can use autopep8 
> to enforce it.
>
> I'd prefer autopep8 to yapf because I like having a little wiggle room to 
> influence the style, but on a big project like Beam all that wiggle room ends 
> up to minor but noticeable inconsistencies in style throughout the project.  
> yapf ensures completely consistent style, but the tradeoff is that it's 
> sometimes ugly, especially in scenarios with similar repeated entries like 
> argparse, where yapf might insert line breaks in visually inconsistent and 
> unappealing ways depending on the lengths of the keywords and expressions 
> involved.
>
> Either way (but especially if we choose yapf) I think it'd be a nice addition 
> to setup a pre-commit [1] config so that people can opt in to running 
> *lightweight* autofixers prior to commit.  This will not only reduce dev 
> frustration but will also reduce the amount of cpu cycles that Jenkins spends 
> pointing out lint errors.
>
> [1] https://pre-commit.com/
>
> -chad
>
>
>
>
> On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <[email protected]> wrote:
>>
>> Last time we discussed this there seems not to be much progress into 
>> autoformatting.
>> This tool looks more tweakable, so maybe it could be more appropriate for 
>> Beam's use case.
>> https://github.com/google/yapf/
>> WDYT?
>>
>>
>> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <[email protected]> wrote:
>>>
>>> +1 for any autoformatter for Python SDK that does the job. My experience is 
>>> that since spotless in Java SDK I would never start a new Java project 
>>> without it. So many great benefits not only for one person coding but for 
>>> all community.
>>>
>>> It is a GitHub UI issue that you cannot easily browse past the reformat. It 
>>> is not actually that hard, but does take a couple extra clicks to get 
>>> GitHub to display blame before a reformat. It is easier with the command 
>>> line. I do a lot of code history digging and the global Java reformat is 
>>> not really a problem.
>>>
>>> It's actually one more click on Github but I agree it's not the best way to 
>>> search the history. The most convenient and clear one I've found so far is 
>>> in Jetbrains IDEs (Intelij) where you can:
>>>
>>> right click on line number -> "annotate" -> click again -> "annotate 
>>> previous revision" -> ...
>>>
>>> You can also use "compare with" to see the diff between two revisions.
>>>
>>> Łukasz
>>>
>>>
>>>
>>>
>>>
>>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <[email protected]> napisał(a):
>>>>
>>>> +1 pending good enough tooling (I can't quite tell - seems there are some 
>>>> issues?)
>>>>
>>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk 
>>>> <[email protected]> wrote:
>>>>>
>>>>> What else actually we gain? My guess is faster PR review iteration. We 
>>>>> will skip some of conversations about code style.
>>>>
>>>> ...
>>>>>
>>>>> Last but not least, new contributor may be less discouraged. When I 
>>>>> started contribute I didn’t know how to format my code and I lost a lot 
>>>>> of time to add pylint and adjust IntelliJ. I eventually failed. Currently 
>>>>> I write code intuitively and when I don’t forget I rerun tox.
>>>>
>>>>
>>>> This is a huge benefit. This is why I supported it so much for Java. It is 
>>>> a community benefit. You do not have to be a contributor to the Python SDK 
>>>> to support this. That is why I am writing here. Just eliminate all 
>>>> discussion of formatting. It doesn't really matter what the resulting 
>>>> format is, if it is not crazy to read. I strongly oppose maintaining a 
>>>> non-default format.
>>>>
>>>> Reformating 20k lines or 200k is not hard. The Java global reformat 
>>>> touched 50k lines. It does not really matter how big it is. Definitely do 
>>>> it all at once if you think the tool is good enough. And you should pin a 
>>>> version, so churn is not a problem. You can upgrade the version and 
>>>> reformat in a PR later and that is also easy.
>>>>
>>>> It is a GitHub UI issue that you cannot easily browse past the reformat. 
>>>> It is not actually that hard, but does take a couple extra clicks to get 
>>>> GitHub to display blame before a reformat. It is easier with the command 
>>>> line. I do a lot of code history digging and the global Java reformat is 
>>>> not really a problem.
>>>>
>>>> Kenn
>>>>
>>>>
>>>>>
>>>>> Also everything will be formatted in a same way, so eventually it would 
>>>>> be easier to read.
>>>>>
>>>>> Moreover, as it was mentioned in previous emails - a lot of Jenkins 
>>>>> failures won’t take place, so we save time and resources.
>>>>>
>>>>>
>>>>> One of disadvantages is that our pipelines has custom syntax and after 
>>>>> formatting they looks a little bit weird, but maybe extending the only 
>>>>> configurable option in Black - lines, from 88 to 110 would be solution.
>>>>>
>>>>> Second one is that Black requires Python 3 to be run. I don’t know how 
>>>>> big obstacle it would be.
>>>>>
>>>>>
>>>>> I believe there are two options how it would be possible to introduce 
>>>>> Black. First: just do it, it will hurt but then it would be ok (same as a 
>>>>> dentist appointment). Of course it may require some work to adjust 
>>>>> linters. On the other hand we can do it gradually and start including sdk 
>>>>> parts one by one - maybe it will be less painful?
>>>>>
>>>>>
>>>>> As an example I can share one of projects [2] I know that uses Black 
>>>>> (they use also other cool checkers and pre-commit [3]). This is how looks 
>>>>> their build with all checks [4].
>>>>>
>>>>>
>>>>> To sum up I believe that if we want improve our coding experience, we 
>>>>> should improve our toolset. Black seems be recent and quite popular tool 
>>>>> what makes think they won’t stop developing it.
>>>>>
>>>>>
>>>>> [1] 
>>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>>
>>>>>
>>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>>
>>>>>
>>>>> [3] https://pre-commit.com
>>>>>
>>>>>
>>>>> [4] 
>>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <[email protected]> 
>>>>> wrote:
>>>>>>
>>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would change 
>>>>>> nearly every single line in the codebase (and the loss of all context as 
>>>>>> well as that particular line).
>>>>>>
>>>>>> This is probably why the 2-space fork exists. However, we don't conform 
>>>>>> to that either--we use 2 spaces for indentation, but 4 for continuation 
>>>>>> indentation. (As for the history of this, this goes back to Google's 
>>>>>> internal style guide, probably motivated by consistency with C++, Java, 
>>>>>> ... and the fact that with an indent level of 4 one ends up wrapping 
>>>>>> lines quite frequently (it's telling that black's default line length is 
>>>>>> 88)). This turns out to be an easy change to the codebase.
>>>>>>
>>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that this tool 
>>>>>> introduces a huge amount of vertical whitespace (e.g. closing 
>>>>>> parentheses on their own line), e.g.
>>>>>>
>>>>>> def foo(
>>>>>>     args
>>>>>> ):
>>>>>>   if (
>>>>>>       long expression)
>>>>>>   ):
>>>>>>     func(
>>>>>>         args
>>>>>>     )
>>>>>>
>>>>>> I wrote a simple post-processor to put closing parentheses on the same 
>>>>>> lines, as well as omit the newline after "if (", and disabling 
>>>>>> formatting of strings, which reduce the churn in our codebase to 15k 
>>>>>> lines (adding about 4k) out of 200k total.
>>>>>>
>>>>>> https://github.com/apache/beam/pull/8712/files
>>>>>>
>>>>>> It's still very opinionated, often in different ways then me, and 
>>>>>> doesn't understand the semantics of the code, but possibly something we 
>>>>>> could live with given the huge advantages of an autoformatter.
>>>>>>
>>>>>> An intermediate point would be to allow, but not require, autoformatting 
>>>>>> of changed lines.
>>>>>>
>>>>>> As for being beta quality, it looks like it's got a decent number of 
>>>>>> contributors and in my book being in the python github project is a 
>>>>>> strong positive signal. But, due to the above issues, I think we'd have 
>>>>>> to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space 
>>>>>> issue is a 2-line change, and the rest implemented as a post-processing 
>>>>>> step (for now, incomplete), so it'd be easy to stay in sync with 
>>>>>> upstream.)
>>>>>>
>>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <[email protected]> wrote:
>>>>>> >
>>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>>> > > current linter's style. I don't think it is feasible to reformat the
>>>>>> > > entire Python SDK.
>>>>>> >
>>>>>> > It cannot be configured to do what we actually do because Black is
>>>>>> > configurable only to support the standard python codestyle guidelines
>>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in the
>>>>>> > python world use.
>>>>>> >
>>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>>>>> > > that this might be less of a problem with Python because the linter 
>>>>>> > > has
>>>>>> > > more rules than Checkstyle had.
>>>>>> >
>>>>>> > Yes that’s the bad side effect but there are always tradeoffs we have
>>>>>> > to deal with.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <[email protected]> 
>>>>>> > wrote:
>>>>>> > >
>>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>>> > > current linter's style. I don't think it is feasible to reformat the
>>>>>> > > entire Python SDK.
>>>>>> > >
>>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>>>>> > > that this might be less of a problem with Python because the linter 
>>>>>> > > has
>>>>>> > > more rules than Checkstyle had.
>>>>>> > >
>>>>>> > > -Max
>>>>>> > >
>>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>>> > > >> My concerns are:
>>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>>> > > >> - It looks like mostly a single person project. For the same 
>>>>>> > > >> reason I also strongly prefer not using a fork for a specific 
>>>>>> > > >> setting. Fork will only have less people looking at it.
>>>>>> > > >
>>>>>> > > > I suppose the project is marked as beta because it is recent, it 
>>>>>> > > > was
>>>>>> > > > presented in 2018’s pycon, and because some things can change since
>>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in that 
>>>>>> > > > case is
>>>>>> > > > like our own ‘@Experimental’. If you look at the contribution page 
>>>>>> > > > [1]
>>>>>> > > > you can notice that it is less and less a single person project, 
>>>>>> > > > there
>>>>>> > > > have been 93 independent contributions since the project became
>>>>>> > > > public, and the fact that it is hosted in the python organization
>>>>>> > > > github [2] gives some confidence on the project continuity.
>>>>>> > > >
>>>>>> > > > You are right however about the fact that the main author seems to 
>>>>>> > > > be
>>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can seem
>>>>>> > > > arbitrary, but he is just following pep8 style guide 
>>>>>> > > > recommendations
>>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4 spaces
>>>>>> > > > recommendation of PEP-8 or even Google's own Python style guide 
>>>>>> > > > [4],
>>>>>> > > > So, probably it should be to us to reconsider the current policy to
>>>>>> > > > adapt to the standards (and the tool).
>>>>>> > > >
>>>>>> > > > I did a quick run of black with python 2.7 compatibility on
>>>>>> > > > sdks/python and got only 4 parsing errors which is positive given 
>>>>>> > > > the
>>>>>> > > > size of our code base.
>>>>>> > > >
>>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files failed to 
>>>>>> > > > reformat.
>>>>>> > > >
>>>>>> > > > error: cannot format
>>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>>>> > > > error: cannot format
>>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>>>> > > > error: cannot format
>>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>>> > > > Cannot parse: 160:34:       print(traceback_string, 
>>>>>> > > > file=sys.stderr)
>>>>>> > > > error: cannot format
>>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc else '    
>>>>>> > > > ',
>>>>>> > > > end=' ')
>>>>>> > > >
>>>>>> > > > I still think this can be positive for the project but well I am
>>>>>> > > > barely a contributor to the python code base so I let you the 
>>>>>> > > > python
>>>>>> > > > maintainers to reconsider this, in any case it seems like a good
>>>>>> > > > improvement for the project.
>>>>>> > > >
>>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>>>> > > > [2] https://github.com/python
>>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>>>>> > > > [4] 
>>>>>> > > > https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>>> > > >
>>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <[email protected]> 
>>>>>> > > > wrote:
>>>>>> > > >>
>>>>>> > > >> I am in the same boat with Robert, I am in favor of 
>>>>>> > > >> autoformatters but I am not familiar with this one. My concerns 
>>>>>> > > >> are:
>>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>>> > > >> - It looks like mostly a single person project. For the same 
>>>>>> > > >> reason I also strongly prefer not using a fork for a specific 
>>>>>> > > >> setting. Fork will only have less people looking at it.
>>>>>> > > >>
>>>>>> > > >> IMO, this is in an early stage for us. That said lint issues are 
>>>>>> > > >> real as pointed in the thread. If someone would like to give it a 
>>>>>> > > >> try and see how it would look like for us that would be 
>>>>>> > > >> interesting.
>>>>>> > > >>
>>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk 
>>>>>> > > >> <[email protected]> wrote:
>>>>>> > > >>>
>>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures are 
>>>>>> > > >>> caused by lint problems.
>>>>>> > > >>> I think it would be great to have something similar to Spotless 
>>>>>> > > >>> in Java SDK (I heard there is problem with configuring Black 
>>>>>> > > >>> with IntelliJ).
>>>>>> > > >>>
>>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw 
>>>>>> > > >>> <[email protected]> wrote:
>>>>>> > > >>>>
>>>>>> > > >>>> I'm generally in favor of autoformatters, though I haven't 
>>>>>> > > >>>> looked at
>>>>>> > > >>>> how well this particular one works. We might have to go with
>>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>>>> > > >>>>
>>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada 
>>>>>> > > >>>> <[email protected]> wrote:
>>>>>> > > >>>>>
>>>>>> > > >>>>> This looks pretty good:) I know at least a couple people 
>>>>>> > > >>>>> (myself included) who've been annoyed by having to take care 
>>>>>> > > >>>>> of lint issues that maybe a code formatter could save us.
>>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>>> > > >>>>> -P.
>>>>>> > > >>>>>
>>>>>> > > >>>>>
>>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía 
>>>>>> > > >>>>> <[email protected]> wrote:
>>>>>> > > >>>>>>
>>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto 
>>>>>> > > >>>>>> formatter that
>>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python, and 
>>>>>> > > >>>>>> wanted to
>>>>>> > > >>>>>> bring to the ML Is there interest from the python people to 
>>>>>> > > >>>>>> get this
>>>>>> > > >>>>>> into the build?
>>>>>> > > >>>>>>
>>>>>> > > >>>>>> The introduction of spotless for Java has been a good 
>>>>>> > > >>>>>> improvement and
>>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>>>>> > > >>>>>>
>>>>>> > > >>>>>> WDYT?
>>>>>> > > >>>>>>
>>>>>> > > >>>>>> [1] https://github.com/python/black

Re: [DISCUSS] Autoformat python code with Black

Reply via email to