Thanks for your honest feedback Anton, which I really appreciate. I agree with the points you brought up.
I don't prefer the Google Java Format either, but it is at least something that produces consistent results across tools. If you ask me, I would love to customize the code format and have it as close as possible to what the Iceberg project had. But one of the reasons why the format isn't customizable is because it is difficult to support such customizations across tools. The problem with code format is that there is no single format that can satisfy the preferences of everybody. However, from my experience, once people start to use *any* code format that produces consistent results across *Eclipse/IntelliJ/cmd line*, people stop worrying about code format details. This is also one of the reasons why the creators of Go decided to have a code formatter built-in (https://go.dev/doc/effective_go#formatting). In the long run I think the decision to use Spotless + code formatting helps, especially for new people that want to contribute and don't want to feel frustrated due to review feedback around code style. Eduard On Thu, Jul 28, 2022 at 6:15 AM Anton Okolnychyi <aokolnyc...@apple.com> wrote: > I am late to the discussion, but I would like to express my opinion. > > As someone who deeply cares about the code consistency and leaves a lot of > style-related nits, I am definitely happy to see efforts to simplify the > life of the reviewers and automate the formatting process. Explaining the > code style to new contributors consumes a lot of time and results in more > iterations on PRs. Automating this would be great! > > However, I am concerned regarding the changes merged in PR #5312. > > - Many changes that went in that PR were against the original code style > of the project that we, committers and contributors, tried to maintain for > all these years. I think we had a lot of well-structured and consistent > places, which attracted devs to Iceberg. > - I feel the code quality dropped as some places were written in the old > style and now look really weird. If we followed the new style originally, > maybe, it would look much better as we would structure the code differently. > - As a someone whose team maintains an internal fork that is very close to > master (I know we are not the only ones), I feel such a massive change will > be a challenge to cherry-pick. I am not convinced introducing automatic > formatting had to be so radical. > > The first two points concern me the most. Since I am late to the > discussion, it would not be fair for me to request changes. However, I > strongly suggest the community to reconsider the approach we took while we > haven’t merged many changes on top of PR #5312. > > To sum up, I definitely support automating the formatting to simplify > reviews and help new contributors but I would try to come with a way to > make it less radical and follow whatever we had all these years. We did > discuss some of the formatting guidelines a few years ago and it is all > different now. > > - Anton > > On Jul 27, 2022, at 11:40 AM, Steven Wu <stevenz...@gmail.com> wrote: > > Eduard, thank you for driving this effort! Great work! > > On Wed, Jul 27, 2022 at 11:11 AM Eduard Tudenhoefner <edu...@tabular.io> > wrote: > >> Hey everyone, >> >> Google Java Format + Spotless integration have been merged as part of >> https://github.com/apache/iceberg/pull/5266. >> >> The big bang reformatting (aka *spotlessApply*) will happen as part of >> https://github.com/apache/iceberg/pull/5312 and once this PR is merged, >> please perform the following steps if you have open PRs: >> >> 1. Rebase your PR branch against the commit before the „Big Bang“ >> 2. Run `*./gradlew spotlessApply*` >> 3. Squash all commits >> 4. Rebase against the latest *HEAD* of the master branch >> 5. Force-push your branch >> >> This should hopefully produce a minimum of conflicts on PRs. >> >> Additionally, make sure to install the *google-java-format* plugin for >> your IDE as outlined in https://github.com/apache/iceberg-docs/pull/125 >> so that you can format code inside the IDE as well. >> >> Eduard >> >> On Fri, Jul 15, 2022 at 12:45 PM Eduard Tudenhoefner <edu...@tabular.io> >> wrote: >> >>> That's correct Ryan. The Google style is what can be applied across >>> tools and Spotless is the tool that we use to check and *apply* that >>> style (and also auto-apply the license header). The *apply* part is the >>> important piece that Checkstyle for example is missing (as it's tedious to >>> manually apply what *checkstyle:check* complains about). >>> >>> Additionally, Spotless >>> <https://github.com/diffplug/spotless/tree/main/plugin-gradle> supports >>> a bunch of other languages, so it would even be possible to use it for >>> auto-formatting Gradle files (or Scala files using scalafmt) if we wanted >>> to, but for now I would probably focus on the Java codebase. >>> >>> Eduard >>> >>> On Thu, Jul 14, 2022 at 6:58 PM Ryan Blue <b...@tabular.io> wrote: >>> >>>> Okay, I understand. So it isn't that spotless is supported across >>>> tools, it is that google style is supported and we can use Spotless to both >>>> check and apply that style. Is that correct? >>>> >>>> On Thu, Jul 14, 2022 at 9:56 AM Eduard Tudenhoefner <edu...@tabular.io> >>>> wrote: >>>> >>>>> Yeah it's how Dmitri said, the Google style is most likely on purpose >>>>> not configurable, because otherwise it's getting more difficult to achieve >>>>> the same formatting results across IDEs & CMD line. >>>>> >>>>> >>>>> On Wed, Jul 13, 2022 at 7:39 PM Dmitri Bourlatchkov < >>>>> dmitri.bourlatch...@dremio.com> wrote: >>>>> >>>>>> Yeah, the Google style has very specific wrapping limits and javadoc >>>>>> format. It is not configurable, if we go with it. >>>>>> >>>>>> From my POV, using another style would be fine. However, with a >>>>>> custom style it can be hard and tricky to ensure that all IDEs enforce it >>>>>> the same way as spotless. >>>>>> >>>>>> The Google style has a plugin at least for IntelliJ (and likely for >>>>>> Eclipse too) so it will behave the same way in IDE and cmd. line build. >>>>>> >>>>>> On Wed, Jul 13, 2022 at 1:28 PM Ryan Blue <b...@tabular.io> wrote: >>>>>> >>>>>>> Is there any value in trying to get Spotless to match the current >>>>>>> format as closely as possible? It would be great to have fewer changes. >>>>>>> >>>>>>> For example, when I tested the apply, I see this: >>>>>>> >>>>>>> - /** >>>>>>> - * Returns an initialized {@link AliyunProperties} >>>>>>> - */ >>>>>>> + /** Returns an initialized {@link AliyunProperties} */ >>>>>>> >>>>>>> Changes like that don’t seem very valuable to me. Similarly, it >>>>>>> looks like a lot of text is wrapped at a different line length. If we >>>>>>> can >>>>>>> alter that, we could easily avoid a lot of changes. >>>>>>> >>>>>>> Ryan >>>>>>> >>>>>>> On Wed, Jul 13, 2022 at 7:41 AM Eduard Tudenhoefner < >>>>>>> edu...@tabular.io> wrote: >>>>>>> >>>>>>>> As a first step, I created >>>>>>>> https://github.com/apache/iceberg/pull/5266 which configures >>>>>>>> Spotless to use the Google Java Format and also apply the correct >>>>>>>> copyright >>>>>>>> header for Java files. >>>>>>>> >>>>>>>> Once this PR is merged, the next steps would be: >>>>>>>> >>>>>>>> - removing conflicting Checkstyle rules that are not in line >>>>>>>> with the Google format >>>>>>>> - formatting the entire code base via `*./gradlew spotlessApply* >>>>>>>> ` >>>>>>>> - setting `*enforceCheck*` to `*true*` in >>>>>>>> >>>>>>>> https://github.com/apache/iceberg/blob/80318d8cfbeb0d96d0afc27c84bc3dbddde35344/baseline.gradle#L48 >>>>>>>> so that validation fails if code isn't properly formatted >>>>>>>> - updating docs around the current Formatter usage and how to >>>>>>>> configure Eclipse/IntelliJ >>>>>>>> >>>>>>>> The first 3 steps should be done together as part of the big bang. >>>>>>>> >>>>>>>> Eduard >>>>>>>> >>>>>>>> On Mon, Jul 11, 2022 at 10:54 PM Ryan Blue <b...@apache.org> wrote: >>>>>>>> >>>>>>>>> Okay, it sounds like there's mostly agreement for going with >>>>>>>>> spotless. Let's try that out. We'll work on some changes to add >>>>>>>>> spotless so >>>>>>>>> that `spotlessApply` works. Then we can do the big bang migration >>>>>>>>> (which I >>>>>>>>> also agree is the best option) just before the 1.0. >>>>>>>>> >>>>>>>>> Thanks, everyone! >>>>>>>>> >>>>>>>>> On Mon, Jul 11, 2022 at 11:50 AM Dmitri Bourlatchkov < >>>>>>>>> dmitri.bourlatch...@dremio.com> wrote: >>>>>>>>> >>>>>>>>>> My experience with the Google Code Style + Spotless was positive >>>>>>>>>> too. >>>>>>>>>> >>>>>>>>>> I'd be fine with another code style as long as it is >>>>>>>>>> "deterministic" (e.g. does not make changes on repeated execution) >>>>>>>>>> and >>>>>>>>>> works in IntelliJ IDEA / Eclipse / etc. >>>>>>>>>> >>>>>>>>>> Regarding cherry-picking into older branches, I think Robert's >>>>>>>>>> suggestion can be tweaked slightly to be helpful there too: >>>>>>>>>> >>>>>>>>>> 1. Checkout old branch >>>>>>>>>> 2. Apply the new style (run gradle ...) >>>>>>>>>> 3. Cherry-pick without committing >>>>>>>>>> 4. Manually revert to old style >>>>>>>>>> 5. Commit >>>>>>>>>> 6. Reset to original branch HEAD >>>>>>>>>> 7. Cherry pick commit 5 again >>>>>>>>>> >>>>>>>>>> It's a bit lengthy and may be a tedious process, but it should >>>>>>>>>> allow applying the git-level changes mostly automatically. >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Dmitri. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Jul 8, 2022 at 2:53 AM Robert Stupp <sn...@snazy.de> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> From my experience, it’s a big win to have automatic code >>>>>>>>>>> formatting. >>>>>>>>>>> >>>>>>>>>>> In projectnessie we use automatic code formatting for all >>>>>>>>>>> languages and haven’t serious issues with Spotless. It is just nice >>>>>>>>>>> to not >>>>>>>>>>> have to bike shed about whitespaces, line breaks, brackets, etc. It >>>>>>>>>>> was a >>>>>>>>>>> bit of discussion, because people had bad memories from past >>>>>>>>>>> experiences >>>>>>>>>>> with automatic code formatting breaking code and introducing subtle >>>>>>>>>>> bugs. >>>>>>>>>>> >>>>>>>>>>> I think that using code styles that „do not allow bike shedding“ >>>>>>>>>>> (Google Code Style) are a very good option. >>>>>>>>>>> >>>>>>>>>>> So far none of us has seen issues with any of the Spotless code >>>>>>>>>>> formatters that we use: XML, Kotlin/Gradle, Kotlin, Antlr4, Java, >>>>>>>>>>> Scala - >>>>>>>>>>> relying on the „standard“ settings w/o any customizations. We use >>>>>>>>>>> this >>>>>>>>>>> piece of code, externalized into an internal Gradle plugin: >>>>>>>>>>> https://github.com/projectnessie/gradle-build-plugins/blob/main/spotless/src/main/kotlin/org/projectnessie/buildtools/spotless/SpotlessHelperPlugin.kt >>>>>>>>>>> For >>>>>>>>>>> Iceberg, it would probably be nice to have some Groovy code >>>>>>>>>>> formatting for >>>>>>>>>>> the build scripts as well. >>>>>>>>>>> >>>>>>>>>>> Sure, the migration will add some pain. IMHO the best option is >>>>>>>>>>> a „big bang“ across the whole code base, because it happens only >>>>>>>>>>> once. >>>>>>>>>>> Migrating one module after another is a „repeated series of pains“. >>>>>>>>>>> >>>>>>>>>>> Since the result of a `./gradlew spotlessApply` is >>>>>>>>>>> deterministic, people that have open PRs could: >>>>>>>>>>> 1. Rebase their PR branch against the commit before the „Big >>>>>>>>>>> Bang“ >>>>>>>>>>> 2. Include a commit with the necessary Gradle build change (one >>>>>>>>>>> the only contains the changes to add Spotless) >>>>>>>>>>> 3. Do the `./gradlew spotlessApply` >>>>>>>>>>> 4. Squash all commits in the PR-branch >>>>>>>>>>> 5. Rebase again - against the HEAD of the master branch >>>>>>>>>>> 6. Force-push PR-branch >>>>>>>>>>> Because git is „clever enough“ to eliminate the >>>>>>>>>>> „duplicated/unrelated changes“, the final result of the above steps >>>>>>>>>>> is just >>>>>>>>>>> the diff with the changes for the open PR. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Am 08.07.2022 um 00:59 schrieb Ryan Blue <b...@tabular.io>: >>>>>>>>>>> >>>>>>>>>>> We were just talking about this proposal internally. I think it >>>>>>>>>>> would be great to have automatic code formatting, especially since >>>>>>>>>>> we have >>>>>>>>>>> to point out a lot of changes manually. The main question is how to >>>>>>>>>>> get >>>>>>>>>>> there without too much disruption. This came up in our discussions >>>>>>>>>>> around >>>>>>>>>>> the upcoming 1.0 release, since that may be a good opportunity to >>>>>>>>>>> make all >>>>>>>>>>> of the code changes. >>>>>>>>>>> >>>>>>>>>>> For background, the main concern about adding something like >>>>>>>>>>> this is applying all of the changes needed to get the existing code >>>>>>>>>>> to >>>>>>>>>>> conform to the new style. That is really disruptive because it will >>>>>>>>>>> cause >>>>>>>>>>> all of the PRs to need to be rebased and makes it really difficult >>>>>>>>>>> to >>>>>>>>>>> cherry-pick changes from after the code formatting happens to >>>>>>>>>>> branches that >>>>>>>>>>> were created before code formatting. The 1.0 release makes a good >>>>>>>>>>> opportunity because we are making other changes (removing >>>>>>>>>>> deprecations) and >>>>>>>>>>> will hopefully have people upgrading their branches to the new major >>>>>>>>>>> version, rather than cherry picking. >>>>>>>>>>> >>>>>>>>>>> This is as good a time as any to add automatic code formatting, >>>>>>>>>>> but it's up to the community: so should we refromat the project and >>>>>>>>>>> apply >>>>>>>>>>> spotless code formatting everywhere? I'm interested to hear >>>>>>>>>>> opinions! >>>>>>>>>>> >>>>>>>>>>> Ryan >>>>>>>>>>> >>>>>>>>>>> On Thu, Mar 10, 2022 at 3:00 AM Eduard Tudenhoefner < >>>>>>>>>>> edu...@dremio.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hello everyone, >>>>>>>>>>>> >>>>>>>>>>>> I would like to get the discussion started around automatic >>>>>>>>>>>> code formatting + enforcing and how we get there. >>>>>>>>>>>> >>>>>>>>>>>> Currently we use Checkstyle *check* to enforce formatting. >>>>>>>>>>>> However, the problem with that is that you still have to manually >>>>>>>>>>>> do the >>>>>>>>>>>> actual formatting. >>>>>>>>>>>> >>>>>>>>>>>> What I would like to propose is the usage of *Spotless* ( >>>>>>>>>>>> https://github.com/diffplug/spotless) for *checking* and >>>>>>>>>>>> *enforcing* Java code style (it can also enforce code style >>>>>>>>>>>> for Scala, Markdown, ... btw). Spotless is being used by many >>>>>>>>>>>> projects ( >>>>>>>>>>>> https://github.com/search?l=gradle&q=spotless&type=Code) and >>>>>>>>>>>> comes essentially with two tasks: >>>>>>>>>>>> * *spotlessCheck*: Checks that sourcecode satisfies formatting >>>>>>>>>>>> steps >>>>>>>>>>>> * *spotlessApply*: Applies code formatting steps to sourcecode >>>>>>>>>>>> in-place >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> *Code format* >>>>>>>>>>>> >>>>>>>>>>>> The problem with code format is that there is no single format >>>>>>>>>>>> that can satisfy the preferences of everybody. However, from my >>>>>>>>>>>> experience, >>>>>>>>>>>> once people start to use *any* code format that produces >>>>>>>>>>>> consistent results across *Eclipse**/IntelliJ/cmd line*, >>>>>>>>>>>> people stop worrying about code format details. >>>>>>>>>>>> This is also one of the reasons why the creators of Go decided >>>>>>>>>>>> to have a code formatter built-in ( >>>>>>>>>>>> https://go.dev/doc/effective_go#formatting): >>>>>>>>>>>> >>>>>>>>>>>> *Formatting issues are the most contentious but the least >>>>>>>>>>>>> consequential. People can adapt to different formatting styles >>>>>>>>>>>>> but it's >>>>>>>>>>>>> better if they don't have to, and less time is devoted to the >>>>>>>>>>>>> topic if >>>>>>>>>>>>> everyone adheres to the same style. The problem is how to >>>>>>>>>>>>> approach this >>>>>>>>>>>>> Utopia without a long prescriptive style guide.* >>>>>>>>>>>>> *With Go we take an unusual approach and let the machine take >>>>>>>>>>>>> care of most formatting issues. The gofmt program (also available >>>>>>>>>>>>> as go >>>>>>>>>>>>> fmt, which operates at the package level rather than source file >>>>>>>>>>>>> level) >>>>>>>>>>>>> reads a Go program and emits the source in a standard style of >>>>>>>>>>>>> indentation >>>>>>>>>>>>> and vertical alignment, retaining and if necessary reformatting >>>>>>>>>>>>> comments.* >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I would like to propose using the Google Java Format with >>>>>>>>>>>> Spotless. The reason for this format is essentially that this is a >>>>>>>>>>>> widely-adopted code format that is designed specifically for code >>>>>>>>>>>> reviews >>>>>>>>>>>> (since we're spending more time reviewing code than writing it). >>>>>>>>>>>> Additionally, it produces consistent formatting results across >>>>>>>>>>>> *Eclipse**/IntelliJ/cmd line*, which I think is another very >>>>>>>>>>>> important factor. >>>>>>>>>>>> >>>>>>>>>>>> Thus, our initial Gradle spotless configuration could look >>>>>>>>>>>> similar to the above below: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> *pluginManager.withPlugin('com.diffplug.spotless') { >>>>>>>>>>>> spotless { // don't run spotlessCheck during gradle check >>>>>>>>>>>> task during >>>>>>>>>>>> the transition phase enforceCheck = false java { >>>>>>>>>>>> target >>>>>>>>>>>> 'src/main/java/**/*.java', 'src/test/java/**/*.java', >>>>>>>>>>>> 'src/jmh/java/**/*.java' googleJavaFormat() } }}* >>>>>>>>>>>> >>>>>>>>>>>> We don't have to use Google Java Format. Spotless also supports >>>>>>>>>>>> formatting the code with other formats, but from previous >>>>>>>>>>>> experience the >>>>>>>>>>>> Google Java Format seemed to be really the only one to produce >>>>>>>>>>>> consistent >>>>>>>>>>>> results across *Eclipse**/IntelliJ/cmd line*. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> *How do we get to a point where the entire codebase is properly >>>>>>>>>>>> formatted (and enforceCheck = false can be removed)?* >>>>>>>>>>>> >>>>>>>>>>>> Now this is a difficult question. Obviously we don't want to >>>>>>>>>>>> have a single *format-everything* commit, as that would affect >>>>>>>>>>>> lots of in-flight PRs. >>>>>>>>>>>> >>>>>>>>>>>> There would have to be some form of gradual formatting, for >>>>>>>>>>>> example module by module. Spotless offers something called Ratched >>>>>>>>>>>> ( >>>>>>>>>>>> https://github.com/diffplug/spotless/tree/main/plugin-gradle#ratchet) >>>>>>>>>>>> that allows to enforce code format gradually (but I'm not sure >>>>>>>>>>>> this would >>>>>>>>>>>> be a good thing either). >>>>>>>>>>>> >>>>>>>>>>>> How exactly we'd like to approach this transitioning phase this >>>>>>>>>>>> is a completely separate discussion, but I feel like at least we >>>>>>>>>>>> could get >>>>>>>>>>>> the ball rolling so that we make it also easier for newcomers to >>>>>>>>>>>> contribute >>>>>>>>>>>> to the project, since it would be straightforward for them to make >>>>>>>>>>>> their >>>>>>>>>>>> PRs adhere to the code format and also save time during PR reviews. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Eduard >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Ryan Blue >>>>>>>>>>> Tabular >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Ryan Blue >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ryan Blue >>>>>>> Tabular >>>>>>> >>>>>> >>>> >>>> -- >>>> Ryan Blue >>>> Tabular >>>> >>> >