+1 to autoformatting

Let me add some nuance to that.

The way I see it there are 2 varieties of formatters:  those which take the
original formatting into consideration (autopep8) and those which disregard
it (yapf, black).

I much prefer yapf to black, because you have plenty of options to tweak
with yapf (enough to make the output a pretty close match to the current
Beam style), and you can mark areas to preserve the original formatting,
which could be very useful with Pipeline building with pipe operators.
Please don't pick black.

autopep8 is more along the lines of spotless in Java -- it only corrects
code that breaks the project's style rules.  The big problem with Beam's
current style is that it is so esoteric that autopep8 can't enforce it --
and I'm not just talking about 2-spaces, which I don't really have a
problem with -- the problem is the use of either 2 or 4 spaces depending on
context (expression start vs hanging indent, etc).  This is my *biggest*
gripe about the current style.  PyCharm doesn't have enough control
either.  So, if we can choose a style that can be expressed by flake8 or
pycodestyle then we can use autopep8 to enforce it.

I'd prefer autopep8 to yapf because I like having a little wiggle room to
influence the style, but on a big project like Beam all that wiggle room
ends up to minor but noticeable inconsistencies in style throughout the
project.  yapf ensures completely consistent style, but the tradeoff is
that it's sometimes ugly, especially in scenarios with similar repeated
entries like argparse, where yapf might insert line breaks in visually
inconsistent and unappealing ways depending on the lengths of the keywords
and expressions involved.

Either way (but especially if we choose yapf) I think it'd be a nice
addition to setup a pre-commit [1] config so that people can opt in to
running *lightweight* autofixers prior to commit.  This will not only
reduce dev frustration but will also reduce the amount of cpu cycles that
Jenkins spends pointing out lint errors.

[1] https://pre-commit.com/

-chad




On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <[email protected]> wrote:

> Last time we discussed this there seems not to be much progress into
> autoformatting.
> This tool looks more tweakable, so maybe it could be more appropriate for
> Beam's use case.
> https://github.com/google/yapf/
> WDYT?
>
>
> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <[email protected]> wrote:
>
>> +1 for any autoformatter for Python SDK that does the job. My experience
>> is that since spotless in Java SDK I would never start a new Java project
>> without it. So many great benefits not only for one person coding but for
>> all community.
>>
>> It is a GitHub UI issue that you cannot easily browse past the reformat.
>> It is not actually that hard, but does take a couple extra clicks to get
>> GitHub to display blame before a reformat. It is easier with the command
>> line. I do a lot of code history digging and the global Java reformat is
>> not really a problem.
>>
>> It's actually one more click on Github but I agree it's not the best way
>> to search the history. The most convenient and clear one I've found so far
>> is in Jetbrains IDEs (Intelij) where you can:
>>
>> right click on line number -> "annotate" -> click again -> "annotate
>> previous revision" -> ...
>>
>> You can also use "compare with" to see the diff between two revisions.
>>
>> Łukasz
>>
>>
>>
>>
>>
>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <[email protected]> napisał(a):
>>
>>> +1 pending good enough tooling (I can't quite tell - seems there are
>>> some issues?)
>>>
>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>>> [email protected]> wrote:
>>>
>>>> What else actually we gain? My guess is faster PR review iteration. We
>>>> will skip some of conversations about code style.
>>>>
>>> ...
>>>
>>>> Last but not least, new contributor may be less discouraged. When I
>>>> started contribute I didn’t know how to format my code and I lost a lot of
>>>> time to add pylint and adjust IntelliJ. I eventually failed. Currently I
>>>> write code intuitively and when I don’t forget I rerun tox.
>>>>
>>>
>>> This is a huge benefit. This is why I supported it so much for Java. It
>>> is a community benefit. You do not have to be a contributor to the Python
>>> SDK to support this. That is why I am writing here. Just eliminate all
>>> discussion of formatting. It doesn't really matter what the resulting
>>> format is, if it is not crazy to read. I strongly oppose maintaining a
>>> non-default format.
>>>
>>> Reformating 20k lines or 200k is not hard. The Java global reformat
>>> touched 50k lines. It does not really matter how big it is. Definitely do
>>> it all at once if you think the tool is good enough. And you should pin a
>>> version, so churn is not a problem. You can upgrade the version and
>>> reformat in a PR later and that is also easy.
>>>
>>> It is a GitHub UI issue that you cannot easily browse past the reformat.
>>> It is not actually that hard, but does take a couple extra clicks to get
>>> GitHub to display blame before a reformat. It is easier with the command
>>> line. I do a lot of code history digging and the global Java reformat is
>>> not really a problem.
>>>
>>> Kenn
>>>
>>>
>>>
>>>> Also everything will be formatted in a same way, so eventually it would
>>>> be easier to read.
>>>>
>>>> Moreover, as it was mentioned in previous emails - a lot of Jenkins
>>>> failures won’t take place, so we save time and resources.
>>>>
>>>>
>>>> One of disadvantages is that our pipelines has custom syntax and after
>>>> formatting they looks a little bit weird, but maybe extending the only
>>>> configurable option in Black - lines, from 88 to 110 would be solution.
>>>>
>>>> Second one is that Black requires Python 3 to be run. I don’t know how
>>>> big obstacle it would be.
>>>>
>>>> I believe there are two options how it would be possible to introduce
>>>> Black. First: just do it, it will hurt but then it would be ok (same as a
>>>> dentist appointment). Of course it may require some work to adjust linters.
>>>> On the other hand we can do it gradually and start including sdk parts one
>>>> by one - maybe it will be less painful?
>>>>
>>>> As an example I can share one of projects [2] I know that uses Black
>>>> (they use also other cool checkers and pre-commit [3]). This is how looks
>>>> their build with all checks [4].
>>>>
>>>> To sum up I believe that if we want improve our coding experience, we
>>>> should improve our toolset. Black seems be recent and quite popular tool
>>>> what makes think they won’t stop developing it.
>>>>
>>>> [1]
>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>
>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>
>>>> [3] https://pre-commit.com
>>>>
>>>> [4]
>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>
>>>>
>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <[email protected]>
>>>> wrote:
>>>>
>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would change
>>>>> nearly every single line in the codebase (and the loss of all context as
>>>>> well as that particular line).
>>>>>
>>>>> This is probably why the 2-space fork exists. However, we don't
>>>>> conform to that either--we use 2 spaces for indentation, but 4 for
>>>>> continuation indentation. (As for the history of this, this goes back to
>>>>> Google's internal style guide, probably motivated by consistency with C++,
>>>>> Java, ... and the fact that with an indent level of 4 one ends up wrapping
>>>>> lines quite frequently (it's telling that black's default line length is
>>>>> 88)). This turns out to be an easy change to the codebase.
>>>>>
>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that this
>>>>> tool introduces a huge amount of vertical whitespace (e.g. closing
>>>>> parentheses on their own line), e.g.
>>>>>
>>>>> def foo(
>>>>>     args
>>>>> ):
>>>>>   if (
>>>>>       long expression)
>>>>>   ):
>>>>>     func(
>>>>>         args
>>>>>     )
>>>>>
>>>>> I wrote a simple post-processor to put closing parentheses on the same
>>>>> lines, as well as omit the newline after "if (", and disabling formatting
>>>>> of strings, which reduce the churn in our codebase to 15k lines (adding
>>>>> about 4k) out of 200k total.
>>>>>
>>>>> https://github.com/apache/beam/pull/8712/files
>>>>>
>>>>> It's still very opinionated, often in different ways then me, and
>>>>> doesn't understand the semantics of the code, but possibly something we
>>>>> could live with given the huge advantages of an autoformatter.
>>>>>
>>>>> An intermediate point would be to allow, but not require,
>>>>> autoformatting of changed lines.
>>>>>
>>>>> As for being beta quality, it looks like it's got a decent number of
>>>>> contributors and in my book being in the python github project is a strong
>>>>> positive signal. But, due to the above issues, I think we'd have to
>>>>> maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space issue
>>>>> is a 2-line change, and the rest implemented as a post-processing step 
>>>>> (for
>>>>> now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>>>
>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <[email protected]>
>>>>> wrote:
>>>>> >
>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>> > > current linter's style. I don't think it is feasible to reformat
>>>>> the
>>>>> > > entire Python SDK.
>>>>> >
>>>>> > It cannot be configured to do what we actually do because Black is
>>>>> > configurable only to support the standard python codestyle guidelines
>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in the
>>>>> > python world use.
>>>>> >
>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>> > > effect is still visible in the Java SDK. However, I have the
>>>>> feeling
>>>>> > > that this might be less of a problem with Python because the
>>>>> linter has
>>>>> > > more rules than Checkstyle had.
>>>>> >
>>>>> > Yes that’s the bad side effect but there are always tradeoffs we have
>>>>> > to deal with.
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <[email protected]>
>>>>> wrote:
>>>>> > >
>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>> > > current linter's style. I don't think it is feasible to reformat
>>>>> the
>>>>> > > entire Python SDK.
>>>>> > >
>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>> > > effect is still visible in the Java SDK. However, I have the
>>>>> feeling
>>>>> > > that this might be less of a problem with Python because the
>>>>> linter has
>>>>> > > more rules than Checkstyle had.
>>>>> > >
>>>>> > > -Max
>>>>> > >
>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>> > > >> My concerns are:
>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>> > > >> - It looks like mostly a single person project. For the same
>>>>> reason I also strongly prefer not using a fork for a specific setting. 
>>>>> Fork
>>>>> will only have less people looking at it.
>>>>> > > >
>>>>> > > > I suppose the project is marked as beta because it is recent, it
>>>>> was
>>>>> > > > presented in 2018’s pycon, and because some things can change
>>>>> since
>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in that
>>>>> case is
>>>>> > > > like our own ‘@Experimental’. If you look at the contribution
>>>>> page [1]
>>>>> > > > you can notice that it is less and less a single person project,
>>>>> there
>>>>> > > > have been 93 independent contributions since the project became
>>>>> > > > public, and the fact that it is hosted in the python organization
>>>>> > > > github [2] gives some confidence on the project continuity.
>>>>> > > >
>>>>> > > > You are right however about the fact that the main author seems
>>>>> to be
>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can seem
>>>>> > > > arbitrary, but he is just following pep8 style guide
>>>>> recommendations
>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4 spaces
>>>>> > > > recommendation of PEP-8 or even Google's own Python style guide
>>>>> [4],
>>>>> > > > So, probably it should be to us to reconsider the current policy
>>>>> to
>>>>> > > > adapt to the standards (and the tool).
>>>>> > > >
>>>>> > > > I did a quick run of black with python 2.7 compatibility on
>>>>> > > > sdks/python and got only 4 parsing errors which is positive
>>>>> given the
>>>>> > > > size of our code base.
>>>>> > > >
>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files failed
>>>>> to reformat.
>>>>> > > >
>>>>> > > > error: cannot format
>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>>> > > > error: cannot format
>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>>> > > > error: cannot format
>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>> > > > Cannot parse: 160:34:       print(traceback_string,
>>>>> file=sys.stderr)
>>>>> > > > error: cannot format
>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc else '
>>>>>    ',
>>>>> > > > end=' ')
>>>>> > > >
>>>>> > > > I still think this can be positive for the project but well I am
>>>>> > > > barely a contributor to the python code base so I let you the
>>>>> python
>>>>> > > > maintainers to reconsider this, in any case it seems like a good
>>>>> > > > improvement for the project.
>>>>> > > >
>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>>> > > > [2] https://github.com/python
>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>>>> > > > [4]
>>>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>> > > >
>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <[email protected]>
>>>>> wrote:
>>>>> > > >>
>>>>> > > >> I am in the same boat with Robert, I am in favor of
>>>>> autoformatters but I am not familiar with this one. My concerns are:
>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>> > > >> - It looks like mostly a single person project. For the same
>>>>> reason I also strongly prefer not using a fork for a specific setting. 
>>>>> Fork
>>>>> will only have less people looking at it.
>>>>> > > >>
>>>>> > > >> IMO, this is in an early stage for us. That said lint issues
>>>>> are real as pointed in the thread. If someone would like to give it a try
>>>>> and see how it would look like for us that would be interesting.
>>>>> > > >>
>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
>>>>> [email protected]> wrote:
>>>>> > > >>>
>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures are
>>>>> caused by lint problems.
>>>>> > > >>> I think it would be great to have something similar to
>>>>> Spotless in Java SDK (I heard there is problem with configuring Black with
>>>>> IntelliJ).
>>>>> > > >>>
>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
>>>>> [email protected]> wrote:
>>>>> > > >>>>
>>>>> > > >>>> I'm generally in favor of autoformatters, though I haven't
>>>>> looked at
>>>>> > > >>>> how well this particular one works. We might have to go with
>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>>> > > >>>>
>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>>>>> [email protected]> wrote:
>>>>> > > >>>>>
>>>>> > > >>>>> This looks pretty good:) I know at least a couple people
>>>>> (myself included) who've been annoyed by having to take care of lint 
>>>>> issues
>>>>> that maybe a code formatter could save us.
>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>> > > >>>>> -P.
>>>>> > > >>>>>
>>>>> > > >>>>>
>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>>>>> [email protected]> wrote:
>>>>> > > >>>>>>
>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto
>>>>> formatter that
>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python, and
>>>>> wanted to
>>>>> > > >>>>>> bring to the ML Is there interest from the python people to
>>>>> get this
>>>>> > > >>>>>> into the build?
>>>>> > > >>>>>>
>>>>> > > >>>>>> The introduction of spotless for Java has been a good
>>>>> improvement and
>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>>>> > > >>>>>>
>>>>> > > >>>>>> WDYT?
>>>>> > > >>>>>>
>>>>> > > >>>>>> [1] https://github.com/python/black
>>>>>
>>>>

Reply via email to