As a first step, I created https://github.com/apache/iceberg/pull/5266 which configures Spotless to use the Google Java Format and also apply the correct copyright header for Java files.
Once this PR is merged, the next steps would be: - removing conflicting Checkstyle rules that are not in line with the Google format - formatting the entire code base via `*./gradlew spotlessApply*` - setting `*enforceCheck*` to `*true*` in https://github.com/apache/iceberg/blob/80318d8cfbeb0d96d0afc27c84bc3dbddde35344/baseline.gradle#L48 so that validation fails if code isn't properly formatted - updating docs around the current Formatter usage and how to configure Eclipse/IntelliJ The first 3 steps should be done together as part of the big bang. Eduard On Mon, Jul 11, 2022 at 10:54 PM Ryan Blue <b...@apache.org> wrote: > Okay, it sounds like there's mostly agreement for going with spotless. > Let's try that out. We'll work on some changes to add spotless so that > `spotlessApply` works. Then we can do the big bang migration (which I also > agree is the best option) just before the 1.0. > > Thanks, everyone! > > On Mon, Jul 11, 2022 at 11:50 AM Dmitri Bourlatchkov < > dmitri.bourlatch...@dremio.com> wrote: > >> My experience with the Google Code Style + Spotless was positive too. >> >> I'd be fine with another code style as long as it is "deterministic" >> (e.g. does not make changes on repeated execution) and works in IntelliJ >> IDEA / Eclipse / etc. >> >> Regarding cherry-picking into older branches, I think Robert's suggestion >> can be tweaked slightly to be helpful there too: >> >> 1. Checkout old branch >> 2. Apply the new style (run gradle ...) >> 3. Cherry-pick without committing >> 4. Manually revert to old style >> 5. Commit >> 6. Reset to original branch HEAD >> 7. Cherry pick commit 5 again >> >> It's a bit lengthy and may be a tedious process, but it should allow >> applying the git-level changes mostly automatically. >> >> Cheers, >> Dmitri. >> >> >> On Fri, Jul 8, 2022 at 2:53 AM Robert Stupp <sn...@snazy.de> wrote: >> >>> From my experience, it’s a big win to have automatic code formatting. >>> >>> In projectnessie we use automatic code formatting for all languages and >>> haven’t serious issues with Spotless. It is just nice to not have to bike >>> shed about whitespaces, line breaks, brackets, etc. It was a bit of >>> discussion, because people had bad memories from past experiences with >>> automatic code formatting breaking code and introducing subtle bugs. >>> >>> I think that using code styles that „do not allow bike shedding“ (Google >>> Code Style) are a very good option. >>> >>> So far none of us has seen issues with any of the Spotless code >>> formatters that we use: XML, Kotlin/Gradle, Kotlin, Antlr4, Java, Scala - >>> relying on the „standard“ settings w/o any customizations. We use this >>> piece of code, externalized into an internal Gradle plugin: >>> https://github.com/projectnessie/gradle-build-plugins/blob/main/spotless/src/main/kotlin/org/projectnessie/buildtools/spotless/SpotlessHelperPlugin.kt >>> For >>> Iceberg, it would probably be nice to have some Groovy code formatting for >>> the build scripts as well. >>> >>> Sure, the migration will add some pain. IMHO the best option is a „big >>> bang“ across the whole code base, because it happens only once. Migrating >>> one module after another is a „repeated series of pains“. >>> >>> Since the result of a `./gradlew spotlessApply` is deterministic, people >>> that have open PRs could: >>> 1. Rebase their PR branch against the commit before the „Big Bang“ >>> 2. Include a commit with the necessary Gradle build change (one the only >>> contains the changes to add Spotless) >>> 3. Do the `./gradlew spotlessApply` >>> 4. Squash all commits in the PR-branch >>> 5. Rebase again - against the HEAD of the master branch >>> 6. Force-push PR-branch >>> Because git is „clever enough“ to eliminate the „duplicated/unrelated >>> changes“, the final result of the above steps is just the diff with the >>> changes for the open PR. >>> >>> >>> Am 08.07.2022 um 00:59 schrieb Ryan Blue <b...@tabular.io>: >>> >>> We were just talking about this proposal internally. I think it would be >>> great to have automatic code formatting, especially since we have to point >>> out a lot of changes manually. The main question is how to get there >>> without too much disruption. This came up in our discussions around the >>> upcoming 1.0 release, since that may be a good opportunity to make all of >>> the code changes. >>> >>> For background, the main concern about adding something like this is >>> applying all of the changes needed to get the existing code to conform to >>> the new style. That is really disruptive because it will cause all of the >>> PRs to need to be rebased and makes it really difficult to cherry-pick >>> changes from after the code formatting happens to branches that were >>> created before code formatting. The 1.0 release makes a good opportunity >>> because we are making other changes (removing deprecations) and will >>> hopefully have people upgrading their branches to the new major version, >>> rather than cherry picking. >>> >>> This is as good a time as any to add automatic code formatting, but it's >>> up to the community: so should we refromat the project and apply spotless >>> code formatting everywhere? I'm interested to hear opinions! >>> >>> Ryan >>> >>> On Thu, Mar 10, 2022 at 3:00 AM Eduard Tudenhoefner <edu...@dremio.com> >>> wrote: >>> >>>> Hello everyone, >>>> >>>> I would like to get the discussion started around automatic code >>>> formatting + enforcing and how we get there. >>>> >>>> Currently we use Checkstyle *check* to enforce formatting. However, >>>> the problem with that is that you still have to manually do the actual >>>> formatting. >>>> >>>> What I would like to propose is the usage of *Spotless* ( >>>> https://github.com/diffplug/spotless) for *checking* and *enforcing* >>>> Java code style (it can also enforce code style for Scala, Markdown, ... >>>> btw). Spotless is being used by many projects ( >>>> https://github.com/search?l=gradle&q=spotless&type=Code) and comes >>>> essentially with two tasks: >>>> * *spotlessCheck*: Checks that sourcecode satisfies formatting steps >>>> * *spotlessApply*: Applies code formatting steps to sourcecode in-place >>>> >>>> >>>> *Code format* >>>> >>>> The problem with code format is that there is no single format that can >>>> satisfy the preferences of everybody. However, from my experience, once >>>> people start to use *any* code format that produces consistent results >>>> across *Eclipse**/IntelliJ/cmd line*, people stop worrying about code >>>> format details. >>>> This is also one of the reasons why the creators of Go decided to have >>>> a code formatter built-in (https://go.dev/doc/effective_go#formatting): >>>> >>>> *Formatting issues are the most contentious but the least >>>>> consequential. People can adapt to different formatting styles but it's >>>>> better if they don't have to, and less time is devoted to the topic if >>>>> everyone adheres to the same style. The problem is how to approach this >>>>> Utopia without a long prescriptive style guide.* >>>>> *With Go we take an unusual approach and let the machine take care of >>>>> most formatting issues. The gofmt program (also available as go fmt, which >>>>> operates at the package level rather than source file level) reads a Go >>>>> program and emits the source in a standard style of indentation and >>>>> vertical alignment, retaining and if necessary reformatting comments.* >>>> >>>> >>>> >>>> I would like to propose using the Google Java Format with Spotless. The >>>> reason for this format is essentially that this is a widely-adopted code >>>> format that is designed specifically for code reviews (since we're spending >>>> more time reviewing code than writing it). >>>> Additionally, it produces consistent formatting results across >>>> *Eclipse**/IntelliJ/cmd line*, which I think is another very important >>>> factor. >>>> >>>> Thus, our initial Gradle spotless configuration could look similar to >>>> the above below: >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> *pluginManager.withPlugin('com.diffplug.spotless') { spotless { >>>> // don't run spotlessCheck during gradle check task during the transition >>>> phase enforceCheck = false java { target >>>> 'src/main/java/**/*.java', 'src/test/java/**/*.java', >>>> 'src/jmh/java/**/*.java' googleJavaFormat() } }}* >>>> >>>> We don't have to use Google Java Format. Spotless also supports >>>> formatting the code with other formats, but from previous experience the >>>> Google Java Format seemed to be really the only one to produce consistent >>>> results across *Eclipse**/IntelliJ/cmd line*. >>>> >>>> >>>> *How do we get to a point where the entire codebase is properly >>>> formatted (and enforceCheck = false can be removed)?* >>>> >>>> Now this is a difficult question. Obviously we don't want to have a >>>> single *format-everything* commit, as that would affect lots of >>>> in-flight PRs. >>>> >>>> There would have to be some form of gradual formatting, for example >>>> module by module. Spotless offers something called Ratched ( >>>> https://github.com/diffplug/spotless/tree/main/plugin-gradle#ratchet) >>>> that allows to enforce code format gradually (but I'm not sure this would >>>> be a good thing either). >>>> >>>> How exactly we'd like to approach this transitioning phase this is a >>>> completely separate discussion, but I feel like at least we could get the >>>> ball rolling so that we make it also easier for newcomers to contribute to >>>> the project, since it would be straightforward for them to make their PRs >>>> adhere to the code format and also save time during PR reviews. >>>> >>>> >>>> Eduard >>>> >>> >>> >>> -- >>> Ryan Blue >>> Tabular >>> >>> >>> > > -- > Ryan Blue >