Probably worthwhile to really think about the long-term desired outcomes. Even a large, one-time, disruption of the codebase for the sake of future consistency could be quite valuable. Remember you'll also lose fidelity with history/blame. At some point a 'breaking' ( in a different sense ) change might be useful rather than anchoring on what exists.
On Fri, Jul 29, 2022 at 8:59 AM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote: > On a side note, having automatic formatting was really nice. I think it is > the right direction. If we only had a way to tweak a few rules like the max > line length, one-line Javadoc and maybe consider using the Google format > for 1.8 cause we are using the 1.7 format at the moment, which was not > designed for lambdas. That way we may have all the benefits of automatic > formatting with less disruption to the code. > > - Anton > > On Jul 28, 2022, at 7:35 PM, Anton Okolnychyi < > aokolnyc...@apple.com.INVALID> wrote: > > I had a chance to rebase one of my PRs. It is really hard to justify the > new max line length of 100 chars. I think the most unreadable code is > usually around places when single statements are split into multiple lines. > Since Java has explicit types and we prefer full and elaborate class/var > names, the new line limit alone triggered so many changes and requires > splitting a lot of statements that would previously fit on a single line. > > The max line length is one of the reasons why Palantir folks forked the > Google format. > > https://github.com/palantir/palantir-java-format > > - Anton > > > On Jul 28, 2022, at 7:42 AM, Eduard Tudenhoefner <edu...@tabular.io> > wrote: > > Thanks for your honest feedback Anton, which I really appreciate. I agree > with the points you brought up. > > I don't prefer the Google Java Format either, but it is at least something > that produces consistent results across tools. > If you ask me, I would love to customize the code format and have it as > close as possible to what the Iceberg project had. But one of the reasons > why the format isn't customizable is because it is difficult to support > such customizations across tools. > > The problem with code format is that there is no single format that can > satisfy the preferences of everybody. However, from my experience, once > people start to use *any* code format that produces consistent results > across *Eclipse/IntelliJ/cmd line*, people stop worrying about code > format details. > > This is also one of the reasons why the creators of Go decided to have a > code formatter built-in (https://go.dev/doc/effective_go#formatting). > > In the long run I think the decision to use Spotless + code formatting > helps, especially for new people that want to contribute and don't want to > feel frustrated due to review feedback around code style. > > Eduard > > On Thu, Jul 28, 2022 at 6:15 AM Anton Okolnychyi <aokolnyc...@apple.com> > wrote: > >> I am late to the discussion, but I would like to express my opinion. >> >> As someone who deeply cares about the code consistency and leaves a lot >> of style-related nits, I am definitely happy to see efforts to simplify the >> life of the reviewers and automate the formatting process. Explaining the >> code style to new contributors consumes a lot of time and results in more >> iterations on PRs. Automating this would be great! >> >> However, I am concerned regarding the changes merged in PR #5312. >> >> - Many changes that went in that PR were against the original code style >> of the project that we, committers and contributors, tried to maintain for >> all these years. I think we had a lot of well-structured and consistent >> places, which attracted devs to Iceberg. >> - I feel the code quality dropped as some places were written in the old >> style and now look really weird. If we followed the new style originally, >> maybe, it would look much better as we would structure the code differently. >> - As a someone whose team maintains an internal fork that is very close >> to master (I know we are not the only ones), I feel such a massive change >> will be a challenge to cherry-pick. I am not convinced introducing >> automatic formatting had to be so radical. >> >> The first two points concern me the most. Since I am late to the >> discussion, it would not be fair for me to request changes. However, I >> strongly suggest the community to reconsider the approach we took while we >> haven’t merged many changes on top of PR #5312. >> >> To sum up, I definitely support automating the formatting to simplify >> reviews and help new contributors but I would try to come with a way to >> make it less radical and follow whatever we had all these years. We did >> discuss some of the formatting guidelines a few years ago and it is all >> different now. >> >> - Anton >> >> On Jul 27, 2022, at 11:40 AM, Steven Wu <stevenz...@gmail.com> wrote: >> >> Eduard, thank you for driving this effort! Great work! >> >> On Wed, Jul 27, 2022 at 11:11 AM Eduard Tudenhoefner <edu...@tabular.io> >> wrote: >> >>> Hey everyone, >>> >>> Google Java Format + Spotless integration have been merged as part of >>> https://github.com/apache/iceberg/pull/5266. >>> >>> The big bang reformatting (aka *spotlessApply*) will happen as part of >>> https://github.com/apache/iceberg/pull/5312 and once this PR is merged, >>> please perform the following steps if you have open PRs: >>> >>> 1. Rebase your PR branch against the commit before the „Big Bang“ >>> 2. Run `*./gradlew spotlessApply*` >>> 3. Squash all commits >>> 4. Rebase against the latest *HEAD* of the master branch >>> 5. Force-push your branch >>> >>> This should hopefully produce a minimum of conflicts on PRs. >>> >>> Additionally, make sure to install the *google-java-format* plugin for >>> your IDE as outlined in https://github.com/apache/iceberg-docs/pull/125 >>> so that you can format code inside the IDE as well. >>> >>> Eduard >>> >>> On Fri, Jul 15, 2022 at 12:45 PM Eduard Tudenhoefner <edu...@tabular.io> >>> wrote: >>> >>>> That's correct Ryan. The Google style is what can be applied across >>>> tools and Spotless is the tool that we use to check and *apply* that >>>> style (and also auto-apply the license header). The *apply* part is >>>> the important piece that Checkstyle for example is missing (as it's tedious >>>> to manually apply what *checkstyle:check* complains about). >>>> >>>> Additionally, Spotless >>>> <https://github.com/diffplug/spotless/tree/main/plugin-gradle> supports >>>> a bunch of other languages, so it would even be possible to use it for >>>> auto-formatting Gradle files (or Scala files using scalafmt) if we wanted >>>> to, but for now I would probably focus on the Java codebase. >>>> >>>> Eduard >>>> >>>> On Thu, Jul 14, 2022 at 6:58 PM Ryan Blue <b...@tabular.io> wrote: >>>> >>>>> Okay, I understand. So it isn't that spotless is supported across >>>>> tools, it is that google style is supported and we can use Spotless to >>>>> both >>>>> check and apply that style. Is that correct? >>>>> >>>>> On Thu, Jul 14, 2022 at 9:56 AM Eduard Tudenhoefner <edu...@tabular.io> >>>>> wrote: >>>>> >>>>>> Yeah it's how Dmitri said, the Google style is most likely on purpose >>>>>> not configurable, because otherwise it's getting more difficult to >>>>>> achieve >>>>>> the same formatting results across IDEs & CMD line. >>>>>> >>>>>> >>>>>> On Wed, Jul 13, 2022 at 7:39 PM Dmitri Bourlatchkov < >>>>>> dmitri.bourlatch...@dremio.com> wrote: >>>>>> >>>>>>> Yeah, the Google style has very specific wrapping limits and javadoc >>>>>>> format. It is not configurable, if we go with it. >>>>>>> >>>>>>> From my POV, using another style would be fine. However, with a >>>>>>> custom style it can be hard and tricky to ensure that all IDEs enforce >>>>>>> it >>>>>>> the same way as spotless. >>>>>>> >>>>>>> The Google style has a plugin at least for IntelliJ (and likely for >>>>>>> Eclipse too) so it will behave the same way in IDE and cmd. line build. >>>>>>> >>>>>>> On Wed, Jul 13, 2022 at 1:28 PM Ryan Blue <b...@tabular.io> wrote: >>>>>>> >>>>>>>> Is there any value in trying to get Spotless to match the current >>>>>>>> format as closely as possible? It would be great to have fewer changes. >>>>>>>> >>>>>>>> For example, when I tested the apply, I see this: >>>>>>>> >>>>>>>> - /** >>>>>>>> - * Returns an initialized {@link AliyunProperties} >>>>>>>> - */ >>>>>>>> + /** Returns an initialized {@link AliyunProperties} */ >>>>>>>> >>>>>>>> Changes like that don’t seem very valuable to me. Similarly, it >>>>>>>> looks like a lot of text is wrapped at a different line length. If we >>>>>>>> can >>>>>>>> alter that, we could easily avoid a lot of changes. >>>>>>>> >>>>>>>> Ryan >>>>>>>> >>>>>>>> On Wed, Jul 13, 2022 at 7:41 AM Eduard Tudenhoefner < >>>>>>>> edu...@tabular.io> wrote: >>>>>>>> >>>>>>>>> As a first step, I created >>>>>>>>> https://github.com/apache/iceberg/pull/5266 which configures >>>>>>>>> Spotless to use the Google Java Format and also apply the correct >>>>>>>>> copyright >>>>>>>>> header for Java files. >>>>>>>>> >>>>>>>>> Once this PR is merged, the next steps would be: >>>>>>>>> >>>>>>>>> - removing conflicting Checkstyle rules that are not in line >>>>>>>>> with the Google format >>>>>>>>> - formatting the entire code base via `*./gradlew >>>>>>>>> spotlessApply*` >>>>>>>>> - setting `*enforceCheck*` to `*true*` in >>>>>>>>> >>>>>>>>> https://github.com/apache/iceberg/blob/80318d8cfbeb0d96d0afc27c84bc3dbddde35344/baseline.gradle#L48 >>>>>>>>> so that validation fails if code isn't properly formatted >>>>>>>>> - updating docs around the current Formatter usage and how to >>>>>>>>> configure Eclipse/IntelliJ >>>>>>>>> >>>>>>>>> The first 3 steps should be done together as part of the big bang. >>>>>>>>> >>>>>>>>> Eduard >>>>>>>>> >>>>>>>>> On Mon, Jul 11, 2022 at 10:54 PM Ryan Blue <b...@apache.org> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Okay, it sounds like there's mostly agreement for going with >>>>>>>>>> spotless. Let's try that out. We'll work on some changes to add >>>>>>>>>> spotless so >>>>>>>>>> that `spotlessApply` works. Then we can do the big bang migration >>>>>>>>>> (which I >>>>>>>>>> also agree is the best option) just before the 1.0. >>>>>>>>>> >>>>>>>>>> Thanks, everyone! >>>>>>>>>> >>>>>>>>>> On Mon, Jul 11, 2022 at 11:50 AM Dmitri Bourlatchkov < >>>>>>>>>> dmitri.bourlatch...@dremio.com> wrote: >>>>>>>>>> >>>>>>>>>>> My experience with the Google Code Style + Spotless was positive >>>>>>>>>>> too. >>>>>>>>>>> >>>>>>>>>>> I'd be fine with another code style as long as it is >>>>>>>>>>> "deterministic" (e.g. does not make changes on repeated execution) >>>>>>>>>>> and >>>>>>>>>>> works in IntelliJ IDEA / Eclipse / etc. >>>>>>>>>>> >>>>>>>>>>> Regarding cherry-picking into older branches, I think Robert's >>>>>>>>>>> suggestion can be tweaked slightly to be helpful there too: >>>>>>>>>>> >>>>>>>>>>> 1. Checkout old branch >>>>>>>>>>> 2. Apply the new style (run gradle ...) >>>>>>>>>>> 3. Cherry-pick without committing >>>>>>>>>>> 4. Manually revert to old style >>>>>>>>>>> 5. Commit >>>>>>>>>>> 6. Reset to original branch HEAD >>>>>>>>>>> 7. Cherry pick commit 5 again >>>>>>>>>>> >>>>>>>>>>> It's a bit lengthy and may be a tedious process, but it should >>>>>>>>>>> allow applying the git-level changes mostly automatically. >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> Dmitri. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Jul 8, 2022 at 2:53 AM Robert Stupp <sn...@snazy.de> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> From my experience, it’s a big win to have automatic code >>>>>>>>>>>> formatting. >>>>>>>>>>>> >>>>>>>>>>>> In projectnessie we use automatic code formatting for all >>>>>>>>>>>> languages and haven’t serious issues with Spotless. It is just >>>>>>>>>>>> nice to not >>>>>>>>>>>> have to bike shed about whitespaces, line breaks, brackets, etc. >>>>>>>>>>>> It was a >>>>>>>>>>>> bit of discussion, because people had bad memories from past >>>>>>>>>>>> experiences >>>>>>>>>>>> with automatic code formatting breaking code and introducing >>>>>>>>>>>> subtle bugs. >>>>>>>>>>>> >>>>>>>>>>>> I think that using code styles that „do not allow bike >>>>>>>>>>>> shedding“ (Google Code Style) are a very good option. >>>>>>>>>>>> >>>>>>>>>>>> So far none of us has seen issues with any of the Spotless code >>>>>>>>>>>> formatters that we use: XML, Kotlin/Gradle, Kotlin, Antlr4, Java, >>>>>>>>>>>> Scala - >>>>>>>>>>>> relying on the „standard“ settings w/o any customizations. We use >>>>>>>>>>>> this >>>>>>>>>>>> piece of code, externalized into an internal Gradle plugin: >>>>>>>>>>>> https://github.com/projectnessie/gradle-build-plugins/blob/main/spotless/src/main/kotlin/org/projectnessie/buildtools/spotless/SpotlessHelperPlugin.kt >>>>>>>>>>>> For >>>>>>>>>>>> Iceberg, it would probably be nice to have some Groovy code >>>>>>>>>>>> formatting for >>>>>>>>>>>> the build scripts as well. >>>>>>>>>>>> >>>>>>>>>>>> Sure, the migration will add some pain. IMHO the best option is >>>>>>>>>>>> a „big bang“ across the whole code base, because it happens only >>>>>>>>>>>> once. >>>>>>>>>>>> Migrating one module after another is a „repeated series of pains“. >>>>>>>>>>>> >>>>>>>>>>>> Since the result of a `./gradlew spotlessApply` is >>>>>>>>>>>> deterministic, people that have open PRs could: >>>>>>>>>>>> 1. Rebase their PR branch against the commit before the „Big >>>>>>>>>>>> Bang“ >>>>>>>>>>>> 2. Include a commit with the necessary Gradle build change (one >>>>>>>>>>>> the only contains the changes to add Spotless) >>>>>>>>>>>> 3. Do the `./gradlew spotlessApply` >>>>>>>>>>>> 4. Squash all commits in the PR-branch >>>>>>>>>>>> 5. Rebase again - against the HEAD of the master branch >>>>>>>>>>>> 6. Force-push PR-branch >>>>>>>>>>>> Because git is „clever enough“ to eliminate the >>>>>>>>>>>> „duplicated/unrelated changes“, the final result of the above >>>>>>>>>>>> steps is just >>>>>>>>>>>> the diff with the changes for the open PR. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Am 08.07.2022 um 00:59 schrieb Ryan Blue <b...@tabular.io>: >>>>>>>>>>>> >>>>>>>>>>>> We were just talking about this proposal internally. I think it >>>>>>>>>>>> would be great to have automatic code formatting, especially since >>>>>>>>>>>> we have >>>>>>>>>>>> to point out a lot of changes manually. The main question is how >>>>>>>>>>>> to get >>>>>>>>>>>> there without too much disruption. This came up in our discussions >>>>>>>>>>>> around >>>>>>>>>>>> the upcoming 1.0 release, since that may be a good opportunity to >>>>>>>>>>>> make all >>>>>>>>>>>> of the code changes. >>>>>>>>>>>> >>>>>>>>>>>> For background, the main concern about adding something like >>>>>>>>>>>> this is applying all of the changes needed to get the existing >>>>>>>>>>>> code to >>>>>>>>>>>> conform to the new style. That is really disruptive because it >>>>>>>>>>>> will cause >>>>>>>>>>>> all of the PRs to need to be rebased and makes it really difficult >>>>>>>>>>>> to >>>>>>>>>>>> cherry-pick changes from after the code formatting happens to >>>>>>>>>>>> branches that >>>>>>>>>>>> were created before code formatting. The 1.0 release makes a good >>>>>>>>>>>> opportunity because we are making other changes (removing >>>>>>>>>>>> deprecations) and >>>>>>>>>>>> will hopefully have people upgrading their branches to the new >>>>>>>>>>>> major >>>>>>>>>>>> version, rather than cherry picking. >>>>>>>>>>>> >>>>>>>>>>>> This is as good a time as any to add automatic code formatting, >>>>>>>>>>>> but it's up to the community: so should we refromat the project >>>>>>>>>>>> and apply >>>>>>>>>>>> spotless code formatting everywhere? I'm interested to hear >>>>>>>>>>>> opinions! >>>>>>>>>>>> >>>>>>>>>>>> Ryan >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Mar 10, 2022 at 3:00 AM Eduard Tudenhoefner < >>>>>>>>>>>> edu...@dremio.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hello everyone, >>>>>>>>>>>>> >>>>>>>>>>>>> I would like to get the discussion started around automatic >>>>>>>>>>>>> code formatting + enforcing and how we get there. >>>>>>>>>>>>> >>>>>>>>>>>>> Currently we use Checkstyle *check* to enforce formatting. >>>>>>>>>>>>> However, the problem with that is that you still have to manually >>>>>>>>>>>>> do the >>>>>>>>>>>>> actual formatting. >>>>>>>>>>>>> >>>>>>>>>>>>> What I would like to propose is the usage of *Spotless* ( >>>>>>>>>>>>> https://github.com/diffplug/spotless) for *checking* and >>>>>>>>>>>>> *enforcing* Java code style (it can also enforce code style >>>>>>>>>>>>> for Scala, Markdown, ... btw). Spotless is being used by many >>>>>>>>>>>>> projects ( >>>>>>>>>>>>> https://github.com/search?l=gradle&q=spotless&type=Code) and >>>>>>>>>>>>> comes essentially with two tasks: >>>>>>>>>>>>> * *spotlessCheck*: Checks that sourcecode satisfies >>>>>>>>>>>>> formatting steps >>>>>>>>>>>>> * *spotlessApply*: Applies code formatting steps to >>>>>>>>>>>>> sourcecode in-place >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> *Code format* >>>>>>>>>>>>> >>>>>>>>>>>>> The problem with code format is that there is no single format >>>>>>>>>>>>> that can satisfy the preferences of everybody. However, from my >>>>>>>>>>>>> experience, >>>>>>>>>>>>> once people start to use *any* code format that produces >>>>>>>>>>>>> consistent results across *Eclipse**/IntelliJ/cmd line*, >>>>>>>>>>>>> people stop worrying about code format details. >>>>>>>>>>>>> This is also one of the reasons why the creators of Go decided >>>>>>>>>>>>> to have a code formatter built-in ( >>>>>>>>>>>>> https://go.dev/doc/effective_go#formatting): >>>>>>>>>>>>> >>>>>>>>>>>>> *Formatting issues are the most contentious but the least >>>>>>>>>>>>>> consequential. People can adapt to different formatting styles >>>>>>>>>>>>>> but it's >>>>>>>>>>>>>> better if they don't have to, and less time is devoted to the >>>>>>>>>>>>>> topic if >>>>>>>>>>>>>> everyone adheres to the same style. The problem is how to >>>>>>>>>>>>>> approach this >>>>>>>>>>>>>> Utopia without a long prescriptive style guide.* >>>>>>>>>>>>>> *With Go we take an unusual approach and let the machine take >>>>>>>>>>>>>> care of most formatting issues. The gofmt program (also >>>>>>>>>>>>>> available as go >>>>>>>>>>>>>> fmt, which operates at the package level rather than source file >>>>>>>>>>>>>> level) >>>>>>>>>>>>>> reads a Go program and emits the source in a standard style of >>>>>>>>>>>>>> indentation >>>>>>>>>>>>>> and vertical alignment, retaining and if necessary reformatting >>>>>>>>>>>>>> comments.* >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I would like to propose using the Google Java Format with >>>>>>>>>>>>> Spotless. The reason for this format is essentially that this is a >>>>>>>>>>>>> widely-adopted code format that is designed specifically for code >>>>>>>>>>>>> reviews >>>>>>>>>>>>> (since we're spending more time reviewing code than writing it). >>>>>>>>>>>>> Additionally, it produces consistent formatting results across >>>>>>>>>>>>> *Eclipse**/IntelliJ/cmd line*, which I think is another very >>>>>>>>>>>>> important factor. >>>>>>>>>>>>> >>>>>>>>>>>>> Thus, our initial Gradle spotless configuration could look >>>>>>>>>>>>> similar to the above below: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> *pluginManager.withPlugin('com.diffplug.spotless') { >>>>>>>>>>>>> spotless { // don't run spotlessCheck during gradle check >>>>>>>>>>>>> task during >>>>>>>>>>>>> the transition phase enforceCheck = false java { >>>>>>>>>>>>> target >>>>>>>>>>>>> 'src/main/java/**/*.java', 'src/test/java/**/*.java', >>>>>>>>>>>>> 'src/jmh/java/**/*.java' googleJavaFormat() } }}* >>>>>>>>>>>>> >>>>>>>>>>>>> We don't have to use Google Java Format. Spotless also >>>>>>>>>>>>> supports formatting the code with other formats, but from previous >>>>>>>>>>>>> experience the Google Java Format seemed to be really the only >>>>>>>>>>>>> one to >>>>>>>>>>>>> produce consistent results across *Eclipse**/IntelliJ/cmd >>>>>>>>>>>>> line*. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> *How do we get to a point where the entire codebase is >>>>>>>>>>>>> properly formatted (and enforceCheck = false can be removed)?* >>>>>>>>>>>>> >>>>>>>>>>>>> Now this is a difficult question. Obviously we don't want to >>>>>>>>>>>>> have a single *format-everything* commit, as that would >>>>>>>>>>>>> affect lots of in-flight PRs. >>>>>>>>>>>>> >>>>>>>>>>>>> There would have to be some form of gradual formatting, for >>>>>>>>>>>>> example module by module. Spotless offers something called >>>>>>>>>>>>> Ratched ( >>>>>>>>>>>>> https://github.com/diffplug/spotless/tree/main/plugin-gradle#ratchet) >>>>>>>>>>>>> that allows to enforce code format gradually (but I'm not sure >>>>>>>>>>>>> this would >>>>>>>>>>>>> be a good thing either). >>>>>>>>>>>>> >>>>>>>>>>>>> How exactly we'd like to approach this transitioning phase >>>>>>>>>>>>> this is a completely separate discussion, but I feel like at >>>>>>>>>>>>> least we could >>>>>>>>>>>>> get the ball rolling so that we make it also easier for newcomers >>>>>>>>>>>>> to >>>>>>>>>>>>> contribute to the project, since it would be straightforward for >>>>>>>>>>>>> them to >>>>>>>>>>>>> make their PRs adhere to the code format and also save time >>>>>>>>>>>>> during PR >>>>>>>>>>>>> reviews. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Eduard >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Ryan Blue >>>>>>>>>>>> Tabular >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Ryan Blue >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Ryan Blue >>>>>>>> Tabular >>>>>>>> >>>>>>> >>>>> >>>>> -- >>>>> Ryan Blue >>>>> Tabular >>>>> >>>> >> > >