Huge +1 to autoformatters. Yapf solves my biggest complaint about black, that it can't be customized (and I didn't want to take on maintaining our own fork of black).
It'd be good if there was a way to only apply it to violating (or at least changed) lines.

On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <[email protected]> wrote:
>
> +1 to autoformatting
>
> Let me add some nuance to that.
>
> The way I see it there are 2 varieties of formatters: those which take the
> original formatting into consideration (autopep8) and those which disregard
> it (yapf, black).
>
> I much prefer yapf to black, because you have plenty of options to tweak with
> yapf (enough to make the output a pretty close match to the current Beam
> style), and you can mark areas to preserve the original formatting, which
> could be very useful with Pipeline building with pipe operators. Please
> don't pick black.
>
> autopep8 is more along the lines of spotless in Java -- it only corrects code
> that breaks the project's style rules. The big problem with Beam's current
> style is that it is so esoteric that autopep8 can't enforce it -- and I'm not
> just talking about 2-spaces, which I don't really have a problem with -- the
> problem is the use of either 2 or 4 spaces depending on context (expression
> start vs hanging indent, etc). This is my *biggest* gripe about the current
> style. PyCharm doesn't have enough control either. So, if we can choose a
> style that can be expressed by flake8 or pycodestyle then we can use autopep8
> to enforce it.
>
> I'd prefer autopep8 to yapf because I like having a little wiggle room to
> influence the style, but on a big project like Beam all that wiggle room ends
> up leading to minor but noticeable inconsistencies in style throughout the
> project. yapf ensures completely consistent style, but the tradeoff is that
> it's sometimes ugly, especially in scenarios with similar repeated entries
> like argparse, where yapf might insert line breaks in visually inconsistent
> and unappealing ways depending on the lengths of the keywords and expressions
> involved.
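A rough illustration of the "mark areas to preserve the original formatting" point above: yapf documents `# yapf: disable` and `# yapf: enable` comments for leaving a region untouched. The sketch below is only an assumption-laden stand-in. `Step` is a hypothetical class (not a Beam API) that overloads `|` so the example runs without Beam installed, and the hand-aligned chain is just one possible layout someone might want to keep.

```python
class Step:
    """Hypothetical stand-in for a pipeline stage; not part of Beam."""

    def __init__(self, fn):
        self.fn = fn

    def __ror__(self, values):
        # `values | Step(fn)` applies fn to every element of the list.
        return [self.fn(v) for v in values]


# yapf: disable
# Everything up to the matching "enable" comment is left as written by yapf,
# so the one-stage-per-line alignment below would survive a formatting run.
result = ([1, 2, 3]
          | Step(lambda x: x + 1)
          | Step(lambda x: x * 10))
# yapf: enable
```

Outside the marked region the formatter is free to rewrap code as usual, so the markers only need to surround the few pipeline definitions where the hand-written layout matters.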
>
> Either way (but especially if we choose yapf) I think it'd be a nice addition
> to set up a pre-commit [1] config so that people can opt in to running
> *lightweight* autofixers prior to commit. This will not only reduce dev
> frustration but will also reduce the amount of cpu cycles that Jenkins spends
> pointing out lint errors.
>
> [1] https://pre-commit.com/
>
> -chad
>
>
> On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <[email protected]> wrote:
>>
>> Last time we discussed this there seemed not to be much progress on
>> autoformatting.
>> This tool looks more tweakable, so maybe it could be more appropriate for
>> Beam's use case.
>> https://github.com/google/yapf/
>> WDYT?
>>
>>
>> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <[email protected]> wrote:
>>>
>>> +1 for any autoformatter for Python SDK that does the job. My experience is
>>> that since spotless in Java SDK I would never start a new Java project
>>> without it. So many great benefits not only for one person coding but for
>>> the whole community.
>>>
>>> It is a GitHub UI issue that you cannot easily browse past the reformat. It
>>> is not actually that hard, but does take a couple extra clicks to get
>>> GitHub to display blame before a reformat. It is easier with the command
>>> line. I do a lot of code history digging and the global Java reformat is
>>> not really a problem.
>>>
>>> It's actually one more click on GitHub but I agree it's not the best way to
>>> search the history. The most convenient and clear one I've found so far is
>>> in JetBrains IDEs (IntelliJ) where you can:
>>>
>>> right click on line number -> "annotate" -> click again -> "annotate
>>> previous revision" -> ...
>>>
>>> You can also use "compare with" to see the diff between two revisions.
>>>
>>> Łukasz
>>>
>>>
>>> On Thu, May 30, 2019 at 06:15 Kenneth Knowles <[email protected]> wrote:
>>>>
>>>> +1 pending good enough tooling (I can't quite tell - seems there are some
>>>> issues?)
>>>>
>>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk
>>>> <[email protected]> wrote:
>>>>>
>>>>> What else do we actually gain? My guess is faster PR review iteration. We
>>>>> will skip some of the conversations about code style.
>>>>
>>>> ...
>>>>>
>>>>> Last but not least, new contributors may be less discouraged. When I
>>>>> started contributing I didn’t know how to format my code and I lost a lot
>>>>> of time trying to add pylint and adjust IntelliJ. I eventually failed.
>>>>> Currently I write code intuitively and when I don’t forget I rerun tox.
>>>>
>>>>
>>>> This is a huge benefit. This is why I supported it so much for Java. It is
>>>> a community benefit. You do not have to be a contributor to the Python SDK
>>>> to support this. That is why I am writing here. Just eliminate all
>>>> discussion of formatting. It doesn't really matter what the resulting
>>>> format is, if it is not crazy to read. I strongly oppose maintaining a
>>>> non-default format.
>>>>
>>>> Reformatting 20k lines or 200k is not hard. The Java global reformat
>>>> touched 50k lines. It does not really matter how big it is. Definitely do
>>>> it all at once if you think the tool is good enough. And you should pin a
>>>> version, so churn is not a problem. You can upgrade the version and
>>>> reformat in a PR later and that is also easy.
>>>>
>>>> It is a GitHub UI issue that you cannot easily browse past the reformat.
>>>> It is not actually that hard, but does take a couple extra clicks to get
>>>> GitHub to display blame before a reformat. It is easier with the command
>>>> line. I do a lot of code history digging and the global Java reformat is
>>>> not really a problem.
>>>>
>>>> Kenn
>>>>
>>>>
>>>>>
>>>>> Also everything will be formatted in the same way, so eventually it would
>>>>> be easier to read.
>>>>>
>>>>> Moreover, as it was mentioned in previous emails - a lot of Jenkins
>>>>> failures won’t take place, so we save time and resources.
>>>>>
>>>>>
>>>>> One of the disadvantages is that our pipelines have custom syntax and
>>>>> after formatting they look a little bit weird, but maybe extending the
>>>>> only configurable option in Black - line length, from 88 to 110 - would
>>>>> be a solution.
>>>>>
>>>>> The second one is that Black requires Python 3 to run. I don’t know how
>>>>> big an obstacle that would be.
>>>>>
>>>>>
>>>>> I believe there are two options for introducing Black. First: just do it,
>>>>> it will hurt but then it would be ok (same as a dentist appointment). Of
>>>>> course it may require some work to adjust linters. On the other hand we
>>>>> can do it gradually and start including sdk parts one by one - maybe it
>>>>> will be less painful?
>>>>>
>>>>>
>>>>> As an example I can share one of the projects [2] I know that uses Black
>>>>> (they also use other cool checkers and pre-commit [3]). This is how their
>>>>> build looks with all checks [4].
>>>>>
>>>>>
>>>>> To sum up I believe that if we want to improve our coding experience, we
>>>>> should improve our toolset. Black seems to be a recent and quite popular
>>>>> tool, which makes me think they won’t stop developing it.
>>>>>
>>>>>
>>>>> [1]
>>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>>
>>>>> [2] https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>>
>>>>> [3] https://pre-commit.com
>>>>>
>>>>> [4]
>>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>>
>>>>>
>>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would change
>>>>>> nearly every single line in the codebase (and the loss of all context as
>>>>>> well as that particular line).
>>>>>>
>>>>>> This is probably why the 2-space fork exists.
However, we don't conform
>>>>>> to that either--we use 2 spaces for indentation, but 4 for continuation
>>>>>> indentation. (As for the history of this, this goes back to Google's
>>>>>> internal style guide, probably motivated by consistency with C++, Java,
>>>>>> ... and the fact that with an indent level of 4 one ends up wrapping
>>>>>> lines quite frequently (it's telling that black's default line length is
>>>>>> 88)). This turns out to be an easy change to the codebase.
>>>>>>
>>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that this tool
>>>>>> introduces a huge amount of vertical whitespace (e.g. closing
>>>>>> parentheses on their own line), e.g.
>>>>>>
>>>>>> def foo(
>>>>>>     args
>>>>>> ):
>>>>>>     if (
>>>>>>         long expression
>>>>>>     ):
>>>>>>         func(
>>>>>>             args
>>>>>>         )
>>>>>>
>>>>>> I wrote a simple post-processor to put closing parentheses on the same
>>>>>> lines, as well as omit the newline after "if (", and disabling
>>>>>> formatting of strings, which reduce the churn in our codebase to 15k
>>>>>> lines (adding about 4k) out of 200k total.
>>>>>>
>>>>>> https://github.com/apache/beam/pull/8712/files
>>>>>>
>>>>>> It's still very opinionated, often in different ways than me, and
>>>>>> doesn't understand the semantics of the code, but possibly something we
>>>>>> could live with given the huge advantages of an autoformatter.
>>>>>>
>>>>>> An intermediate point would be to allow, but not require, autoformatting
>>>>>> of changed lines.
>>>>>>
>>>>>> As for being beta quality, it looks like it's got a decent number of
>>>>>> contributors and in my book being in the python github project is a
>>>>>> strong positive signal. But, due to the above issues, I think we'd have
>>>>>> to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space
>>>>>> issue is a 2-line change, and the rest implemented as a post-processing
>>>>>> step (for now, incomplete), so it'd be easy to stay in sync with
>>>>>> upstream.)
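The "autoformat only changed lines" idea above needs some way to find which lines a change touched. One possible approach, sketched here under assumptions, is to read the hunk headers of a unified diff. The helper name `changed_ranges` is hypothetical, and the sketch assumes the diff was produced with zero context lines (for example `git diff -U0`), since otherwise each hunk also covers unchanged context.

```python
import re

# Matches a unified-diff hunk header such as "@@ -10,4 +12,6 @@".
# Group 1 is the start line in the new file, group 2 the optional line count.
_HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def changed_ranges(diff_text):
    """Return 1-based inclusive (start, end) ranges in the new file.

    Hypothetical helper: assumes a zero-context unified diff, so every
    line covered by a hunk is an added or modified line.
    """
    ranges = []
    for line in diff_text.splitlines():
        match = _HUNK.match(line)
        if not match:
            continue
        start = int(match.group(1))
        # The count is omitted from the header when it equals 1.
        count = int(match.group(2)) if match.group(2) is not None else 1
        if count > 0:  # a count of 0 means the hunk only deletes lines
            ranges.append((start, start + count - 1))
    return ranges
```

The resulting ranges could then be handed to a formatter that accepts line ranges; yapf's command line, for instance, has a `--lines START-END` option. How well a formatter behaves when restricted to partial statements is a separate question this sketch does not address.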
>>>>>>
>>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <[email protected]> wrote:
>>>>>> >
>>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>>> > > current linter's style. I don't think it is feasible to reformat the
>>>>>> > > entire Python SDK.
>>>>>> >
>>>>>> > It cannot be configured to do what we actually do because Black is
>>>>>> > configurable only to support the standard python codestyle guidelines
>>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in the
>>>>>> > python world use.
>>>>>> >
>>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>>>>> > > that this might be less of a problem with Python because the linter
>>>>>> > > has more rules than Checkstyle had.
>>>>>> >
>>>>>> > Yes that’s the bad side effect but there are always tradeoffs we have
>>>>>> > to deal with.
>>>>>> >
>>>>>> >
>>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <[email protected]>
>>>>>> > wrote:
>>>>>> > >
>>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>>> > > current linter's style. I don't think it is feasible to reformat the
>>>>>> > > entire Python SDK.
>>>>>> > >
>>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>>>>> > > that this might be less of a problem with Python because the linter
>>>>>> > > has more rules than Checkstyle had.
>>>>>> > >
>>>>>> > > -Max
>>>>>> > >
>>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>>> > > >> My concerns are:
>>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>>> > > >> - It looks like mostly a single person project. For the same
>>>>>> > > >> reason I also strongly prefer not using a fork for a specific
>>>>>> > > >> setting.
Fork will only have less people looking at it.
>>>>>> > > >
>>>>>> > > > I suppose the project is marked as beta because it is recent (it
>>>>>> > > > was presented at PyCon 2018) and because some things can change,
>>>>>> > > > since auto-formatters are pretty tricky beasts. I think beta in
>>>>>> > > > that case is like our own ‘@Experimental’. If you look at the
>>>>>> > > > contribution page [1] you can notice that it is less and less a
>>>>>> > > > single person project, there have been 93 independent
>>>>>> > > > contributions since the project became public, and the fact that
>>>>>> > > > it is hosted in the python organization github [2] gives some
>>>>>> > > > confidence on the project continuity.
>>>>>> > > >
>>>>>> > > > You are right however about the fact that the main author seems to
>>>>>> > > > be the ‘benevolent’ dictator, and in the 2-spaces issue he can
>>>>>> > > > seem arbitrary, but he is just following pep8 style guide
>>>>>> > > > recommendations [3]. I am curious why we (Beam) do not follow the
>>>>>> > > > 4 spaces recommendation of PEP-8 or even Google's own Python style
>>>>>> > > > guide [4], so probably it should be up to us to reconsider the
>>>>>> > > > current policy to adapt to the standards (and the tool).
>>>>>> > > >
>>>>>> > > > I did a quick run of black with python 2.7 compatibility on
>>>>>> > > > sdks/python and got only 4 parsing errors which is positive given
>>>>>> > > > the size of our code base.
>>>>>> > > >
>>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files failed to
>>>>>> > > > reformat.
>>>>>> > > >
>>>>>> > > > error: cannot format
>>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>>> > > > Cannot parse: 47:22: _display_progress = print
>>>>>> > > > error: cannot format
>>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>>> > > > Cannot parse: 151:18: file=sys.stderr)
>>>>>> > > > error: cannot format
>>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>>> > > > Cannot parse: 160:34: print(traceback_string,
>>>>>> > > > file=sys.stderr)
>>>>>> > > > error: cannot format
>>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>>> > > > Cannot parse: 335:51: print('-->' if pc == last_pc else '
>>>>>> > > > ',
>>>>>> > > > end=' ')
>>>>>> > > >
>>>>>> > > > I still think this can be positive for the project but well I am
>>>>>> > > > barely a contributor to the python code base so I'll let you, the
>>>>>> > > > python maintainers, reconsider this. In any case it seems like a
>>>>>> > > > good improvement for the project.
>>>>>> > > >
>>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>>>> > > > [2] https://github.com/python
>>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>>>>> > > > [4]
>>>>>> > > > https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>>> > > >
>>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <[email protected]>
>>>>>> > > > wrote:
>>>>>> > > >>
>>>>>> > > >> I am in the same boat with Robert, I am in favor of
>>>>>> > > >> autoformatters but I am not familiar with this one. My concerns
>>>>>> > > >> are:
>>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>>> > > >> - It looks like mostly a single person project.
For the same
>>>>>> > > >> reason I also strongly prefer not using a fork for a specific
>>>>>> > > >> setting. Fork will only have less people looking at it.
>>>>>> > > >>
>>>>>> > > >> IMO, this is in an early stage for us. That said lint issues are
>>>>>> > > >> real as pointed out in the thread. If someone would like to give
>>>>>> > > >> it a try and see how it would look for us that would be
>>>>>> > > >> interesting.
>>>>>> > > >>
>>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk
>>>>>> > > >> <[email protected]> wrote:
>>>>>> > > >>>
>>>>>> > > >>> This sounds really good. A lot of Jenkins job failures are
>>>>>> > > >>> caused by lint problems.
>>>>>> > > >>> I think it would be great to have something similar to Spotless
>>>>>> > > >>> in Java SDK (I heard there is a problem with configuring Black
>>>>>> > > >>> with IntelliJ).
>>>>>> > > >>>
>>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw
>>>>>> > > >>> <[email protected]> wrote:
>>>>>> > > >>>>
>>>>>> > > >>>> I'm generally in favor of autoformatters, though I haven't
>>>>>> > > >>>> looked at how well this particular one works. We might have to
>>>>>> > > >>>> go with https://github.com/desbma/black-2spaces given
>>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>>>> > > >>>>
>>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada
>>>>>> > > >>>> <[email protected]> wrote:
>>>>>> > > >>>>>
>>>>>> > > >>>>> This looks pretty good:) I know at least a couple people
>>>>>> > > >>>>> (myself included) who've been annoyed by having to take care
>>>>>> > > >>>>> of lint issues that maybe a code formatter could save us.
>>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>>> > > >>>>> -P.
>>>>>> > > >>>>>
>>>>>> > > >>>>>
>>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía
>>>>>> > > >>>>> <[email protected]> wrote:
>>>>>> > > >>>>>>
>>>>>> > > >>>>>> I stumbled by chance into Black [1], a python code
>>>>>> > > >>>>>> auto-formatter that is becoming the 'de-facto'
>>>>>> > > >>>>>> auto-formatter for python, and wanted to bring it to the ML.
>>>>>> > > >>>>>> Is there interest from the python people to get this into
>>>>>> > > >>>>>> the build?
>>>>>> > > >>>>>>
>>>>>> > > >>>>>> The introduction of spotless for Java has been a good
>>>>>> > > >>>>>> improvement and maybe the python code base may benefit from
>>>>>> > > >>>>>> this too.
>>>>>> > > >>>>>>
>>>>>> > > >>>>>> WDYT?
>>>>>> > > >>>>>>
>>>>>> > > >>>>>> [1] https://github.com/python/black
