Re: [DISCUSS] Autoformat python code with Black

Udi Meiri Wed, 22 Jan 2020 15:40:01 -0800

It sounds like there's a consensus for yapf. I volunteer to take this on

On Wed, Jan 22, 2020, 10:31 Udi Meiri <[email protected]> wrote:


> +1 to autoformatting
>
> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <[email protected]> wrote:
>
>> +1 to autoformatters. Also the Beam Java SDK went through a one time pass
>> to apply the spotless formatting.
>>
>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <[email protected]> wrote:
>>
>>> +1 to autoformatters and yapf. It appears to be a well maintained
>>> project. I do support making a one time pass to apply formatting the whole
>>> code base.
>>>
>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <[email protected]> wrote:
>>>
>>>>
>>>>> It'd be good if there was a way to only apply to violating (or at
>>>>> least changed) lines.
>>>>
>>>>
>>>> I assumed the first thing we’d do is convert all of the code in one go,
>>>> since it’s a very safe operation. Did you have something else in mind?
>>>>
>>>> -chad
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <[email protected]>
>>>>> wrote:
>>>>> >
>>>>> > +1 to autoformatting
>>>>> >
>>>>> > Let me add some nuance to that.
>>>>> >
>>>>> > The way I see it there are 2 varieties of formatters:  those which
>>>>> take the original formatting into consideration (autopep8) and those which
>>>>> disregard it (yapf, black).
>>>>> >
>>>>> > I much prefer yapf to black, because you have plenty of options to
>>>>> tweak with yapf (enough to make the output a pretty close match to the
>>>>> current Beam style), and you can mark areas to preserve the original
>>>>> formatting, which could be very useful with Pipeline building with pipe
>>>>> operators.  Please don't pick black.
>>>>> >
>>>>> > autopep8 is more along the lines of spotless in Java -- it only
>>>>> corrects code that breaks the project's style rules.  The big problem with
>>>>> Beam's current style is that it is so esoteric that autopep8 can't enforce
>>>>> it -- and I'm not just talking about 2-spaces, which I don't really have a
>>>>> problem with -- the problem is the use of either 2 or 4 spaces depending 
>>>>> on
>>>>> context (expression start vs hanging indent, etc).  This is my *biggest*
>>>>> gripe about the current style.  PyCharm doesn't have enough control
>>>>> either.  So, if we can choose a style that can be expressed by flake8 or
>>>>> pycodestyle then we can use autopep8 to enforce it.
>>>>> >
>>>>> > I'd prefer autopep8 to yapf because I like having a little wiggle
>>>>> room to influence the style, but on a big project like Beam all that 
>>>>> wiggle
>>>>> room ends up to minor but noticeable inconsistencies in style throughout
>>>>> the project.  yapf ensures completely consistent style, but the tradeoff 
>>>>> is
>>>>> that it's sometimes ugly, especially in scenarios with similar repeated
>>>>> entries like argparse, where yapf might insert line breaks in visually
>>>>> inconsistent and unappealing ways depending on the lengths of the keywords
>>>>> and expressions involved.
>>>>> >
>>>>> > Either way (but especially if we choose yapf) I think it'd be a nice
>>>>> addition to setup a pre-commit [1] config so that people can opt in to
>>>>> running *lightweight* autofixers prior to commit.  This will not only
>>>>> reduce dev frustration but will also reduce the amount of cpu cycles that
>>>>> Jenkins spends pointing out lint errors.
>>>>> >
>>>>> > [1] https://pre-commit.com/
>>>>> >
>>>>> > -chad
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <[email protected]>
>>>>> wrote:
>>>>> >>
>>>>> >> Last time we discussed this there seems not to be much progress
>>>>> into autoformatting.
>>>>> >> This tool looks more tweakable, so maybe it could be more
>>>>> appropriate for Beam's use case.
>>>>> >> https://github.com/google/yapf/
>>>>> >> WDYT?
>>>>> >>
>>>>> >>
>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <[email protected]>
>>>>> wrote:
>>>>> >>>
>>>>> >>> +1 for any autoformatter for Python SDK that does the job. My
>>>>> experience is that since spotless in Java SDK I would never start a new
>>>>> Java project without it. So many great benefits not only for one person
>>>>> coding but for all community.
>>>>> >>>
>>>>> >>> It is a GitHub UI issue that you cannot easily browse past the
>>>>> reformat. It is not actually that hard, but does take a couple extra 
>>>>> clicks
>>>>> to get GitHub to display blame before a reformat. It is easier with the
>>>>> command line. I do a lot of code history digging and the global Java
>>>>> reformat is not really a problem.
>>>>> >>>
>>>>> >>> It's actually one more click on Github but I agree it's not the
>>>>> best way to search the history. The most convenient and clear one I've
>>>>> found so far is in Jetbrains IDEs (Intelij) where you can:
>>>>> >>>
>>>>> >>> right click on line number -> "annotate" -> click again ->
>>>>> "annotate previous revision" -> ...
>>>>> >>>
>>>>> >>> You can also use "compare with" to see the diff between two
>>>>> revisions.
>>>>> >>>
>>>>> >>> Łukasz
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <[email protected]>
>>>>> napisał(a):
>>>>> >>>>
>>>>> >>>> +1 pending good enough tooling (I can't quite tell - seems there
>>>>> are some issues?)
>>>>> >>>>
>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>>>>> [email protected]> wrote:
>>>>> >>>>>
>>>>> >>>>> What else actually we gain? My guess is faster PR review
>>>>> iteration. We will skip some of conversations about code style.
>>>>> >>>>
>>>>> >>>> ...
>>>>> >>>>>
>>>>> >>>>> Last but not least, new contributor may be less discouraged.
>>>>> When I started contribute I didn’t know how to format my code and I lost a
>>>>> lot of time to add pylint and adjust IntelliJ. I eventually failed.
>>>>> Currently I write code intuitively and when I don’t forget I rerun tox.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> This is a huge benefit. This is why I supported it so much for
>>>>> Java. It is a community benefit. You do not have to be a contributor to 
>>>>> the
>>>>> Python SDK to support this. That is why I am writing here. Just eliminate
>>>>> all discussion of formatting. It doesn't really matter what the resulting
>>>>> format is, if it is not crazy to read. I strongly oppose maintaining a
>>>>> non-default format.
>>>>> >>>>
>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java global
>>>>> reformat touched 50k lines. It does not really matter how big it is.
>>>>> Definitely do it all at once if you think the tool is good enough. And you
>>>>> should pin a version, so churn is not a problem. You can upgrade the
>>>>> version and reformat in a PR later and that is also easy.
>>>>> >>>>
>>>>> >>>> It is a GitHub UI issue that you cannot easily browse past the
>>>>> reformat. It is not actually that hard, but does take a couple extra 
>>>>> clicks
>>>>> to get GitHub to display blame before a reformat. It is easier with the
>>>>> command line. I do a lot of code history digging and the global Java
>>>>> reformat is not really a problem.
>>>>> >>>>
>>>>> >>>> Kenn
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>>
>>>>> >>>>> Also everything will be formatted in a same way, so eventually
>>>>> it would be easier to read.
>>>>> >>>>>
>>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot of
>>>>> Jenkins failures won’t take place, so we save time and resources.
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> One of disadvantages is that our pipelines has custom syntax and
>>>>> after formatting they looks a little bit weird, but maybe extending the
>>>>> only configurable option in Black - lines, from 88 to 110 would be 
>>>>> solution.
>>>>> >>>>>
>>>>> >>>>> Second one is that Black requires Python 3 to be run. I don’t
>>>>> know how big obstacle it would be.
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> I believe there are two options how it would be possible to
>>>>> introduce Black. First: just do it, it will hurt but then it would be ok
>>>>> (same as a dentist appointment). Of course it may require some work to
>>>>> adjust linters. On the other hand we can do it gradually and start
>>>>> including sdk parts one by one - maybe it will be less painful?
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> As an example I can share one of projects [2] I know that uses
>>>>> Black (they use also other cool checkers and pre-commit [3]). This is how
>>>>> looks their build with all checks [4].
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> To sum up I believe that if we want improve our coding
>>>>> experience, we should improve our toolset. Black seems be recent and quite
>>>>> popular tool what makes think they won’t stop developing it.
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> [1]
>>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> [3] https://pre-commit.com
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> [4]
>>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <
>>>>> [email protected]> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would
>>>>> change nearly every single line in the codebase (and the loss of all
>>>>> context as well as that particular line).
>>>>> >>>>>>
>>>>> >>>>>> This is probably why the 2-space fork exists. However, we don't
>>>>> conform to that either--we use 2 spaces for indentation, but 4 for
>>>>> continuation indentation. (As for the history of this, this goes back to
>>>>> Google's internal style guide, probably motivated by consistency with C++,
>>>>> Java, ... and the fact that with an indent level of 4 one ends up wrapping
>>>>> lines quite frequently (it's telling that black's default line length is
>>>>> 88)). This turns out to be an easy change to the codebase.
>>>>> >>>>>>
>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that
>>>>> this tool introduces a huge amount of vertical whitespace (e.g. closing
>>>>> parentheses on their own line), e.g.
>>>>> >>>>>>
>>>>> >>>>>> def foo(
>>>>> >>>>>>     args
>>>>> >>>>>> ):
>>>>> >>>>>>   if (
>>>>> >>>>>>       long expression)
>>>>> >>>>>>   ):
>>>>> >>>>>>     func(
>>>>> >>>>>>         args
>>>>> >>>>>>     )
>>>>> >>>>>>
>>>>> >>>>>> I wrote a simple post-processor to put closing parentheses on
>>>>> the same lines, as well as omit the newline after "if (", and disabling
>>>>> formatting of strings, which reduce the churn in our codebase to 15k lines
>>>>> (adding about 4k) out of 200k total.
>>>>> >>>>>>
>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>>>>> >>>>>>
>>>>> >>>>>> It's still very opinionated, often in different ways then me,
>>>>> and doesn't understand the semantics of the code, but possibly something 
>>>>> we
>>>>> could live with given the huge advantages of an autoformatter.
>>>>> >>>>>>
>>>>> >>>>>> An intermediate point would be to allow, but not require,
>>>>> autoformatting of changed lines.
>>>>> >>>>>>
>>>>> >>>>>> As for being beta quality, it looks like it's got a decent
>>>>> number of contributors and in my book being in the python github project 
>>>>> is
>>>>> a strong positive signal. But, due to the above issues, I think we'd have
>>>>> to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space
>>>>> issue is a 2-line change, and the rest implemented as a post-processing
>>>>> step (for now, incomplete), so it'd be easy to stay in sync with 
>>>>> upstream.)
>>>>> >>>>>>
>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <
>>>>> [email protected]> wrote:
>>>>> >>>>>> >
>>>>> >>>>>> > > I think the question is if it can be configured in a way to
>>>>> fit our
>>>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>>>> reformat the
>>>>> >>>>>> > > entire Python SDK.
>>>>> >>>>>> >
>>>>> >>>>>> > It cannot be configured to do what we actually do because
>>>>> Black is
>>>>> >>>>>> > configurable only to support the standard python codestyle
>>>>> guidelines
>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most projects
>>>>> in the
>>>>> >>>>>> > python world use.
>>>>> >>>>>> >
>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>>>> history. This
>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have
>>>>> the feeling
>>>>> >>>>>> > > that this might be less of a problem with Python because
>>>>> the linter has
>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>> >>>>>> >
>>>>> >>>>>> > Yes that’s the bad side effect but there are always tradeoffs
>>>>> we have
>>>>> >>>>>> > to deal with.
>>>>> >>>>>> >
>>>>> >>>>>> >
>>>>> >>>>>> >
>>>>> >>>>>> >
>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <
>>>>> [email protected]> wrote:
>>>>> >>>>>> > >
>>>>> >>>>>> > > I think the question is if it can be configured in a way to
>>>>> fit our
>>>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>>>> reformat the
>>>>> >>>>>> > > entire Python SDK.
>>>>> >>>>>> > >
>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>>>> history. This
>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have
>>>>> the feeling
>>>>> >>>>>> > > that this might be less of a problem with Python because
>>>>> the linter has
>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>> >>>>>> > >
>>>>> >>>>>> > > -Max
>>>>> >>>>>> > >
>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>> >>>>>> > > >> My concerns are:
>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
>>>>> warning.
>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the
>>>>> same reason I also strongly prefer not using a fork for a specific 
>>>>> setting.
>>>>> Fork will only have less people looking at it.
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > I suppose the project is marked as beta because it is
>>>>> recent, it was
>>>>> >>>>>> > > > presented in 2018’s pycon, and because some things can
>>>>> change since
>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in
>>>>> that case is
>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the
>>>>> contribution page [1]
>>>>> >>>>>> > > > you can notice that it is less and less a single person
>>>>> project, there
>>>>> >>>>>> > > > have been 93 independent contributions since the project
>>>>> became
>>>>> >>>>>> > > > public, and the fact that it is hosted in the python
>>>>> organization
>>>>> >>>>>> > > > github [2] gives some confidence on the project
>>>>> continuity.
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > You are right however about the fact that the main author
>>>>> seems to be
>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he
>>>>> can seem
>>>>> >>>>>> > > > arbitrary, but he is just following pep8 style guide
>>>>> recommendations
>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4
>>>>> spaces
>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python style
>>>>> guide [4],
>>>>> >>>>>> > > > So, probably it should be to us to reconsider the current
>>>>> policy to
>>>>> >>>>>> > > > adapt to the standards (and the tool).
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > I did a quick run of black with python 2.7 compatibility
>>>>> on
>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is
>>>>> positive given the
>>>>> >>>>>> > > > size of our code base.
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files
>>>>> failed to reformat.
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > error: cannot format
>>>>> >>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>>> >>>>>> > > > error: cannot format
>>>>> >>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>>> >>>>>> > > > error: cannot format
>>>>> >>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string,
>>>>> file=sys.stderr)
>>>>> >>>>>> > > > error: cannot format
>>>>> >>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc
>>>>> else '    ',
>>>>> >>>>>> > > > end=' ')
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > I still think this can be positive for the project but
>>>>> well I am
>>>>> >>>>>> > > > barely a contributor to the python code base so I let you
>>>>> the python
>>>>> >>>>>> > > > maintainers to reconsider this, in any case it seems like
>>>>> a good
>>>>> >>>>>> > > > improvement for the project.
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>>> >>>>>> > > > [2] https://github.com/python
>>>>> >>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>>>> >>>>>> > > > [4]
>>>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <
>>>>> [email protected]> wrote:
>>>>> >>>>>> > > >>
>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor of
>>>>> autoformatters but I am not familiar with this one. My concerns are:
>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
>>>>> warning.
>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the
>>>>> same reason I also strongly prefer not using a fork for a specific 
>>>>> setting.
>>>>> Fork will only have less people looking at it.
>>>>> >>>>>> > > >>
>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said lint
>>>>> issues are real as pointed in the thread. If someone would like to give it
>>>>> a try and see how it would look like for us that would be interesting.
>>>>> >>>>>> > > >>
>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
>>>>> [email protected]> wrote:
>>>>> >>>>>> > > >>>
>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures
>>>>> are caused by lint problems.
>>>>> >>>>>> > > >>> I think it would be great to have something similar to
>>>>> Spotless in Java SDK (I heard there is problem with configuring Black with
>>>>> IntelliJ).
>>>>> >>>>>> > > >>>
>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
>>>>> [email protected]> wrote:
>>>>> >>>>>> > > >>>>
>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I
>>>>> haven't looked at
>>>>> >>>>>> > > >>>> how well this particular one works. We might have to
>>>>> go with
>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>>> >>>>>> > > >>>>
>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>>>>> [email protected]> wrote:
>>>>> >>>>>> > > >>>>>
>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple
>>>>> people (myself included) who've been annoyed by having to take care of 
>>>>> lint
>>>>> issues that maybe a code formatter could save us.
>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>> >>>>>> > > >>>>> -P.
>>>>> >>>>>> > > >>>>>
>>>>> >>>>>> > > >>>>>
>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>>>>> [email protected]> wrote:
>>>>> >>>>>> > > >>>>>>
>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code
>>>>> auto formatter that
>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for
>>>>> python, and wanted to
>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the python
>>>>> people to get this
>>>>> >>>>>> > > >>>>>> into the build?
>>>>> >>>>>> > > >>>>>>
>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has been a
>>>>> good improvement and
>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>>>> >>>>>> > > >>>>>>
>>>>> >>>>>> > > >>>>>> WDYT?
>>>>> >>>>>> > > >>>>>>
>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>>>>>
>>>>

smime.p7s
Description: S/MIME Cryptographic Signature

Re: [DISCUSS] Autoformat python code with Black

Reply via email to