As a first step, I created https://github.com/apache/iceberg/pull/5266 which
configures Spotless to use the Google Java Format and also apply the
correct copyright header for Java files.

Once this PR is merged, the next steps would be:

   - removing conflicting Checkstyle rules that are not in line with the
   Google format
   - formatting the entire code base via `*./gradlew spotlessApply*`
   - setting `*enforceCheck*` to `*true*` in
   
https://github.com/apache/iceberg/blob/80318d8cfbeb0d96d0afc27c84bc3dbddde35344/baseline.gradle#L48
   so that validation fails if code isn't properly formatted
   - updating docs around the current Formatter usage and how to configure
   Eclipse/IntelliJ

The first 3 steps should be done together as part of the big bang.

Eduard

On Mon, Jul 11, 2022 at 10:54 PM Ryan Blue <b...@apache.org> wrote:

> Okay, it sounds like there's mostly agreement for going with spotless.
> Let's try that out. We'll work on some changes to add spotless so that
> `spotlessApply` works. Then we can do the big bang migration (which I also
> agree is the best option) just before the 1.0.
>
> Thanks, everyone!
>
> On Mon, Jul 11, 2022 at 11:50 AM Dmitri Bourlatchkov <
> dmitri.bourlatch...@dremio.com> wrote:
>
>> My experience with the Google Code Style + Spotless was positive too.
>>
>> I'd be fine with another code style as long as it is "deterministic"
>> (e.g. does not make changes on repeated execution) and works in IntelliJ
>> IDEA / Eclipse / etc.
>>
>> Regarding cherry-picking into older branches, I think Robert's suggestion
>> can be tweaked slightly to be helpful there too:
>>
>> 1. Checkout old branch
>> 2. Apply the new style (run gradle ...)
>> 3. Cherry-pick without committing
>> 4. Manually revert to old style
>> 5. Commit
>> 6. Reset to original branch HEAD
>> 7. Cherry pick commit 5 again
>>
>> It's a bit lengthy and may be a tedious process, but it should allow
>> applying the git-level changes mostly automatically.
>>
>> Cheers,
>> Dmitri.
>>
>>
>> On Fri, Jul 8, 2022 at 2:53 AM Robert Stupp <sn...@snazy.de> wrote:
>>
>>> From my experience, it’s a big win to have automatic code formatting.
>>>
>>> In projectnessie we use automatic code formatting for all languages and
>>> haven’t serious issues with Spotless. It is just nice to not have to bike
>>> shed about whitespaces, line breaks, brackets, etc. It was a bit of
>>> discussion, because people had bad memories from past experiences with
>>> automatic code formatting breaking code and introducing subtle bugs.
>>>
>>> I think that using code styles that „do not allow bike shedding“ (Google
>>> Code Style) are a very good option.
>>>
>>> So far none of us has seen issues with any of the Spotless code
>>> formatters that we use: XML, Kotlin/Gradle, Kotlin, Antlr4, Java, Scala -
>>> relying on the „standard“ settings w/o any customizations. We use this
>>> piece of code, externalized into an internal Gradle plugin:
>>> https://github.com/projectnessie/gradle-build-plugins/blob/main/spotless/src/main/kotlin/org/projectnessie/buildtools/spotless/SpotlessHelperPlugin.kt
>>>  For
>>> Iceberg, it would probably be nice to have some Groovy code formatting for
>>> the build scripts as well.
>>>
>>> Sure, the migration will add some pain. IMHO the best option is a „big
>>> bang“ across the whole code base, because it happens only once. Migrating
>>> one module after another is a „repeated series of pains“.
>>>
>>> Since the result of a `./gradlew spotlessApply` is deterministic, people
>>> that have open PRs could:
>>> 1. Rebase their PR branch against the commit before the „Big Bang“
>>> 2. Include a commit with the necessary Gradle build change (one the only
>>> contains the changes to add Spotless)
>>> 3. Do the `./gradlew spotlessApply`
>>> 4. Squash all commits in the PR-branch
>>> 5. Rebase again - against the HEAD of the master branch
>>> 6. Force-push PR-branch
>>> Because git is „clever enough“ to eliminate the „duplicated/unrelated
>>> changes“, the final result of the above steps is just the diff with the
>>> changes for the open PR.
>>>
>>>
>>> Am 08.07.2022 um 00:59 schrieb Ryan Blue <b...@tabular.io>:
>>>
>>> We were just talking about this proposal internally. I think it would be
>>> great to have automatic code formatting, especially since we have to point
>>> out a lot of changes manually. The main question is how to get there
>>> without too much disruption. This came up in our discussions around the
>>> upcoming 1.0 release, since that may be a good opportunity to make all of
>>> the code changes.
>>>
>>> For background, the main concern about adding something like this is
>>> applying all of the changes needed to get the existing code to conform to
>>> the new style. That is really disruptive because it will cause all of the
>>> PRs to need to be rebased and makes it really difficult to cherry-pick
>>> changes from after the code formatting happens to branches that were
>>> created before code formatting. The 1.0 release makes a good opportunity
>>> because we are making other changes (removing deprecations) and will
>>> hopefully have people upgrading their branches to the new major version,
>>> rather than cherry picking.
>>>
>>> This is as good a time as any to add automatic code formatting, but it's
>>> up to the community: so should we refromat the project and apply spotless
>>> code formatting everywhere? I'm interested to hear opinions!
>>>
>>> Ryan
>>>
>>> On Thu, Mar 10, 2022 at 3:00 AM Eduard Tudenhoefner <edu...@dremio.com>
>>> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> I would like to get the discussion started around automatic code
>>>> formatting + enforcing and how we get there.
>>>>
>>>> Currently we use Checkstyle *check* to enforce formatting. However,
>>>> the problem with that is that you still have to manually do the actual
>>>> formatting.
>>>>
>>>> What I would like to propose is the usage of *Spotless* (
>>>> https://github.com/diffplug/spotless) for *checking* and *enforcing*
>>>> Java code style (it can also enforce code style for Scala, Markdown, ...
>>>> btw). Spotless is being used by many projects (
>>>> https://github.com/search?l=gradle&q=spotless&type=Code) and comes
>>>> essentially with two tasks:
>>>> * *spotlessCheck*: Checks that sourcecode satisfies formatting steps
>>>> * *spotlessApply*: Applies code formatting steps to sourcecode in-place
>>>>
>>>>
>>>> *Code format*
>>>>
>>>> The problem with code format is that there is no single format that can
>>>> satisfy the preferences of everybody. However, from my experience, once
>>>> people start to use *any* code format that produces consistent results
>>>> across *Eclipse**/IntelliJ/cmd line*, people stop worrying about code
>>>> format details.
>>>> This is also one of the reasons why the creators of Go decided to have
>>>> a code formatter built-in (https://go.dev/doc/effective_go#formatting):
>>>>
>>>> *Formatting issues are the most contentious but the least
>>>>> consequential. People can adapt to different formatting styles but it's
>>>>> better if they don't have to, and less time is devoted to the topic if
>>>>> everyone adheres to the same style. The problem is how to approach this
>>>>> Utopia without a long prescriptive style guide.*
>>>>> *With Go we take an unusual approach and let the machine take care of
>>>>> most formatting issues. The gofmt program (also available as go fmt, which
>>>>> operates at the package level rather than source file level) reads a Go
>>>>> program and emits the source in a standard style of indentation and
>>>>> vertical alignment, retaining and if necessary reformatting comments.*
>>>>
>>>>
>>>>
>>>> I would like to propose using the Google Java Format with Spotless. The
>>>> reason for this format is essentially that this is a widely-adopted code
>>>> format that is designed specifically for code reviews (since we're spending
>>>> more time reviewing code than writing it).
>>>> Additionally, it produces consistent formatting results across
>>>> *Eclipse**/IntelliJ/cmd line*, which I think is another very important
>>>> factor.
>>>>
>>>> Thus, our initial Gradle spotless configuration could look similar to
>>>> the above below:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *pluginManager.withPlugin('com.diffplug.spotless') {    spotless {
>>>> // don't run spotlessCheck during gradle check task during the transition
>>>> phase      enforceCheck = false      java {        target
>>>> 'src/main/java/**/*.java', 'src/test/java/**/*.java',
>>>> 'src/jmh/java/**/*.java'        googleJavaFormat()      }    }}*
>>>>
>>>> We don't have to use Google Java Format. Spotless also supports
>>>> formatting the code with other formats, but from previous experience the
>>>> Google Java Format seemed to be really the only one to produce consistent
>>>> results across *Eclipse**/IntelliJ/cmd line*.
>>>>
>>>>
>>>> *How do we get to a point where the entire codebase is properly
>>>> formatted (and enforceCheck = false can be removed)?*
>>>>
>>>> Now this is a difficult question. Obviously we don't want to have a
>>>> single *format-everything* commit, as that would affect lots of
>>>> in-flight PRs.
>>>>
>>>> There would have to be some form of gradual formatting, for example
>>>> module by module. Spotless offers something called Ratched (
>>>> https://github.com/diffplug/spotless/tree/main/plugin-gradle#ratchet)
>>>> that allows to enforce code format gradually (but I'm not sure this would
>>>> be a good thing either).
>>>>
>>>> How exactly we'd like to approach this transitioning phase this is a
>>>> completely separate discussion, but I feel like at least we could get the
>>>> ball rolling so that we make it also easier for newcomers to contribute to
>>>> the project, since it would be straightforward for them to make their PRs
>>>> adhere to the code format and also save time during PR reviews.
>>>>
>>>>
>>>> Eduard
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>>
>>>
>
> --
> Ryan Blue
>

Reply via email to