Eduard, thank you for driving this effort! Great work!

On Wed, Jul 27, 2022 at 11:11 AM Eduard Tudenhoefner <edu...@tabular.io>
wrote:

> Hey everyone,
>
> Google Java Format + Spotless integration have been merged as part of
> https://github.com/apache/iceberg/pull/5266.
>
> The big bang reformatting (aka *spotlessApply*) will happen as part of
> https://github.com/apache/iceberg/pull/5312 and once this PR is merged,
> please perform the following steps if you have open PRs:
>
> 1. Rebase your PR branch against the commit before the „Big Bang“
> 2. Run `*./gradlew spotlessApply*`
> 3. Squash all commits
> 4. Rebase against the latest *HEAD* of the master branch
> 5. Force-push your branch
>
> This should hopefully produce a minimum of conflicts on PRs.
>
> Additionally, make sure to install the *google-java-format* plugin for
> your IDE as outlined in https://github.com/apache/iceberg-docs/pull/125
> so that you can format code inside the IDE as well.
>
> Eduard
>
> On Fri, Jul 15, 2022 at 12:45 PM Eduard Tudenhoefner <edu...@tabular.io>
> wrote:
>
>> That's correct Ryan. The Google style is what can be applied across tools
>> and Spotless is the tool that we use to check and *apply* that style
>> (and also auto-apply the license header). The *apply* part is the
>> important piece that Checkstyle for example is missing (as it's tedious to
>> manually apply what *checkstyle:check* complains about).
>>
>> Additionally, Spotless
>> <https://github.com/diffplug/spotless/tree/main/plugin-gradle> supports
>> a bunch of other languages, so it would even be possible to use it for
>> auto-formatting Gradle files (or Scala files using scalafmt) if we wanted
>> to, but for now I would probably focus on the Java codebase.
>>
>> Eduard
>>
>> On Thu, Jul 14, 2022 at 6:58 PM Ryan Blue <b...@tabular.io> wrote:
>>
>>> Okay, I understand. So it isn't that spotless is supported across tools,
>>> it is that google style is supported and we can use Spotless to both check
>>> and apply that style. Is that correct?
>>>
>>> On Thu, Jul 14, 2022 at 9:56 AM Eduard Tudenhoefner <edu...@tabular.io>
>>> wrote:
>>>
>>>> Yeah it's how Dmitri said, the Google style is most likely on purpose
>>>> not configurable, because otherwise it's getting more difficult to achieve
>>>> the same formatting results across IDEs & CMD line.
>>>>
>>>>
>>>> On Wed, Jul 13, 2022 at 7:39 PM Dmitri Bourlatchkov <
>>>> dmitri.bourlatch...@dremio.com> wrote:
>>>>
>>>>> Yeah, the Google style has very specific wrapping limits and javadoc
>>>>> format. It is not configurable, if we go with it.
>>>>>
>>>>> From my POV, using another style would be fine. However, with a custom
>>>>> style it can be hard and tricky to ensure that all IDEs enforce it the 
>>>>> same
>>>>> way as spotless.
>>>>>
>>>>> The Google style has a plugin at least for IntelliJ (and likely for
>>>>> Eclipse too) so it will behave the same way in IDE and cmd. line build.
>>>>>
>>>>> On Wed, Jul 13, 2022 at 1:28 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>
>>>>>> Is there any value in trying to get Spotless to match the current
>>>>>> format as closely as possible? It would be great to have fewer changes.
>>>>>>
>>>>>> For example, when I tested the apply, I see this:
>>>>>>
>>>>>> -  /**
>>>>>> -   * Returns an initialized {@link AliyunProperties}
>>>>>> -   */
>>>>>> +  /** Returns an initialized {@link AliyunProperties} */
>>>>>>
>>>>>> Changes like that don’t seem very valuable to me. Similarly, it looks
>>>>>> like a lot of text is wrapped at a different line length. If we can alter
>>>>>> that, we could easily avoid a lot of changes.
>>>>>>
>>>>>> Ryan
>>>>>>
>>>>>> On Wed, Jul 13, 2022 at 7:41 AM Eduard Tudenhoefner <
>>>>>> edu...@tabular.io> wrote:
>>>>>>
>>>>>>> As a first step, I created
>>>>>>> https://github.com/apache/iceberg/pull/5266 which configures
>>>>>>> Spotless to use the Google Java Format and also apply the correct 
>>>>>>> copyright
>>>>>>> header for Java files.
>>>>>>>
>>>>>>> Once this PR is merged, the next steps would be:
>>>>>>>
>>>>>>>    - removing conflicting Checkstyle rules that are not in line
>>>>>>>    with the Google format
>>>>>>>    - formatting the entire code base via `*./gradlew spotlessApply*`
>>>>>>>    - setting `*enforceCheck*` to `*true*` in
>>>>>>>    
>>>>>>> https://github.com/apache/iceberg/blob/80318d8cfbeb0d96d0afc27c84bc3dbddde35344/baseline.gradle#L48
>>>>>>>    so that validation fails if code isn't properly formatted
>>>>>>>    - updating docs around the current Formatter usage and how to
>>>>>>>    configure Eclipse/IntelliJ
>>>>>>>
>>>>>>> The first 3 steps should be done together as part of the big bang.
>>>>>>>
>>>>>>> Eduard
>>>>>>>
>>>>>>> On Mon, Jul 11, 2022 at 10:54 PM Ryan Blue <b...@apache.org> wrote:
>>>>>>>
>>>>>>>> Okay, it sounds like there's mostly agreement for going with
>>>>>>>> spotless. Let's try that out. We'll work on some changes to add 
>>>>>>>> spotless so
>>>>>>>> that `spotlessApply` works. Then we can do the big bang migration 
>>>>>>>> (which I
>>>>>>>> also agree is the best option) just before the 1.0.
>>>>>>>>
>>>>>>>> Thanks, everyone!
>>>>>>>>
>>>>>>>> On Mon, Jul 11, 2022 at 11:50 AM Dmitri Bourlatchkov <
>>>>>>>> dmitri.bourlatch...@dremio.com> wrote:
>>>>>>>>
>>>>>>>>> My experience with the Google Code Style + Spotless was positive
>>>>>>>>> too.
>>>>>>>>>
>>>>>>>>> I'd be fine with another code style as long as it is
>>>>>>>>> "deterministic" (e.g. does not make changes on repeated execution) and
>>>>>>>>> works in IntelliJ IDEA / Eclipse / etc.
>>>>>>>>>
>>>>>>>>> Regarding cherry-picking into older branches, I think Robert's
>>>>>>>>> suggestion can be tweaked slightly to be helpful there too:
>>>>>>>>>
>>>>>>>>> 1. Checkout old branch
>>>>>>>>> 2. Apply the new style (run gradle ...)
>>>>>>>>> 3. Cherry-pick without committing
>>>>>>>>> 4. Manually revert to old style
>>>>>>>>> 5. Commit
>>>>>>>>> 6. Reset to original branch HEAD
>>>>>>>>> 7. Cherry pick commit 5 again
>>>>>>>>>
>>>>>>>>> It's a bit lengthy and may be a tedious process, but it should
>>>>>>>>> allow applying the git-level changes mostly automatically.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Dmitri.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jul 8, 2022 at 2:53 AM Robert Stupp <sn...@snazy.de>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> From my experience, it’s a big win to have automatic code
>>>>>>>>>> formatting.
>>>>>>>>>>
>>>>>>>>>> In projectnessie we use automatic code formatting for all
>>>>>>>>>> languages and haven’t serious issues with Spotless. It is just nice 
>>>>>>>>>> to not
>>>>>>>>>> have to bike shed about whitespaces, line breaks, brackets, etc. It 
>>>>>>>>>> was a
>>>>>>>>>> bit of discussion, because people had bad memories from past 
>>>>>>>>>> experiences
>>>>>>>>>> with automatic code formatting breaking code and introducing subtle 
>>>>>>>>>> bugs.
>>>>>>>>>>
>>>>>>>>>> I think that using code styles that „do not allow bike shedding“
>>>>>>>>>> (Google Code Style) are a very good option.
>>>>>>>>>>
>>>>>>>>>> So far none of us has seen issues with any of the Spotless code
>>>>>>>>>> formatters that we use: XML, Kotlin/Gradle, Kotlin, Antlr4, Java, 
>>>>>>>>>> Scala -
>>>>>>>>>> relying on the „standard“ settings w/o any customizations. We use 
>>>>>>>>>> this
>>>>>>>>>> piece of code, externalized into an internal Gradle plugin:
>>>>>>>>>> https://github.com/projectnessie/gradle-build-plugins/blob/main/spotless/src/main/kotlin/org/projectnessie/buildtools/spotless/SpotlessHelperPlugin.kt
>>>>>>>>>>  For
>>>>>>>>>> Iceberg, it would probably be nice to have some Groovy code 
>>>>>>>>>> formatting for
>>>>>>>>>> the build scripts as well.
>>>>>>>>>>
>>>>>>>>>> Sure, the migration will add some pain. IMHO the best option is a
>>>>>>>>>> „big bang“ across the whole code base, because it happens only once.
>>>>>>>>>> Migrating one module after another is a „repeated series of pains“.
>>>>>>>>>>
>>>>>>>>>> Since the result of a `./gradlew spotlessApply` is deterministic,
>>>>>>>>>> people that have open PRs could:
>>>>>>>>>> 1. Rebase their PR branch against the commit before the „Big
>>>>>>>>>> Bang“
>>>>>>>>>> 2. Include a commit with the necessary Gradle build change (one
>>>>>>>>>> the only contains the changes to add Spotless)
>>>>>>>>>> 3. Do the `./gradlew spotlessApply`
>>>>>>>>>> 4. Squash all commits in the PR-branch
>>>>>>>>>> 5. Rebase again - against the HEAD of the master branch
>>>>>>>>>> 6. Force-push PR-branch
>>>>>>>>>> Because git is „clever enough“ to eliminate the
>>>>>>>>>> „duplicated/unrelated changes“, the final result of the above steps 
>>>>>>>>>> is just
>>>>>>>>>> the diff with the changes for the open PR.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Am 08.07.2022 um 00:59 schrieb Ryan Blue <b...@tabular.io>:
>>>>>>>>>>
>>>>>>>>>> We were just talking about this proposal internally. I think it
>>>>>>>>>> would be great to have automatic code formatting, especially since 
>>>>>>>>>> we have
>>>>>>>>>> to point out a lot of changes manually. The main question is how to 
>>>>>>>>>> get
>>>>>>>>>> there without too much disruption. This came up in our discussions 
>>>>>>>>>> around
>>>>>>>>>> the upcoming 1.0 release, since that may be a good opportunity to 
>>>>>>>>>> make all
>>>>>>>>>> of the code changes.
>>>>>>>>>>
>>>>>>>>>> For background, the main concern about adding something like this
>>>>>>>>>> is applying all of the changes needed to get the existing code to 
>>>>>>>>>> conform
>>>>>>>>>> to the new style. That is really disruptive because it will cause 
>>>>>>>>>> all of
>>>>>>>>>> the PRs to need to be rebased and makes it really difficult to 
>>>>>>>>>> cherry-pick
>>>>>>>>>> changes from after the code formatting happens to branches that were
>>>>>>>>>> created before code formatting. The 1.0 release makes a good 
>>>>>>>>>> opportunity
>>>>>>>>>> because we are making other changes (removing deprecations) and will
>>>>>>>>>> hopefully have people upgrading their branches to the new major 
>>>>>>>>>> version,
>>>>>>>>>> rather than cherry picking.
>>>>>>>>>>
>>>>>>>>>> This is as good a time as any to add automatic code formatting,
>>>>>>>>>> but it's up to the community: so should we refromat the project and 
>>>>>>>>>> apply
>>>>>>>>>> spotless code formatting everywhere? I'm interested to hear opinions!
>>>>>>>>>>
>>>>>>>>>> Ryan
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 10, 2022 at 3:00 AM Eduard Tudenhoefner <
>>>>>>>>>> edu...@dremio.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>
>>>>>>>>>>> I would like to get the discussion started around automatic code
>>>>>>>>>>> formatting + enforcing and how we get there.
>>>>>>>>>>>
>>>>>>>>>>> Currently we use Checkstyle *check* to enforce formatting.
>>>>>>>>>>> However, the problem with that is that you still have to manually 
>>>>>>>>>>> do the
>>>>>>>>>>> actual formatting.
>>>>>>>>>>>
>>>>>>>>>>> What I would like to propose is the usage of *Spotless* (
>>>>>>>>>>> https://github.com/diffplug/spotless) for *checking* and
>>>>>>>>>>> *enforcing* Java code style (it can also enforce code style for
>>>>>>>>>>> Scala, Markdown, ... btw). Spotless is being used by many projects (
>>>>>>>>>>> https://github.com/search?l=gradle&q=spotless&type=Code) and
>>>>>>>>>>> comes essentially with two tasks:
>>>>>>>>>>> * *spotlessCheck*: Checks that sourcecode satisfies formatting
>>>>>>>>>>> steps
>>>>>>>>>>> * *spotlessApply*: Applies code formatting steps to sourcecode
>>>>>>>>>>> in-place
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *Code format*
>>>>>>>>>>>
>>>>>>>>>>> The problem with code format is that there is no single format
>>>>>>>>>>> that can satisfy the preferences of everybody. However, from my 
>>>>>>>>>>> experience,
>>>>>>>>>>> once people start to use *any* code format that produces
>>>>>>>>>>> consistent results across *Eclipse**/IntelliJ/cmd line*, people
>>>>>>>>>>> stop worrying about code format details.
>>>>>>>>>>> This is also one of the reasons why the creators of Go decided
>>>>>>>>>>> to have a code formatter built-in (
>>>>>>>>>>> https://go.dev/doc/effective_go#formatting):
>>>>>>>>>>>
>>>>>>>>>>> *Formatting issues are the most contentious but the least
>>>>>>>>>>>> consequential. People can adapt to different formatting styles but 
>>>>>>>>>>>> it's
>>>>>>>>>>>> better if they don't have to, and less time is devoted to the 
>>>>>>>>>>>> topic if
>>>>>>>>>>>> everyone adheres to the same style. The problem is how to approach 
>>>>>>>>>>>> this
>>>>>>>>>>>> Utopia without a long prescriptive style guide.*
>>>>>>>>>>>> *With Go we take an unusual approach and let the machine take
>>>>>>>>>>>> care of most formatting issues. The gofmt program (also available 
>>>>>>>>>>>> as go
>>>>>>>>>>>> fmt, which operates at the package level rather than source file 
>>>>>>>>>>>> level)
>>>>>>>>>>>> reads a Go program and emits the source in a standard style of 
>>>>>>>>>>>> indentation
>>>>>>>>>>>> and vertical alignment, retaining and if necessary reformatting 
>>>>>>>>>>>> comments.*
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I would like to propose using the Google Java Format with
>>>>>>>>>>> Spotless. The reason for this format is essentially that this is a
>>>>>>>>>>> widely-adopted code format that is designed specifically for code 
>>>>>>>>>>> reviews
>>>>>>>>>>> (since we're spending more time reviewing code than writing it).
>>>>>>>>>>> Additionally, it produces consistent formatting results across
>>>>>>>>>>> *Eclipse**/IntelliJ/cmd line*, which I think is another very
>>>>>>>>>>> important factor.
>>>>>>>>>>>
>>>>>>>>>>> Thus, our initial Gradle spotless configuration could look
>>>>>>>>>>> similar to the above below:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *pluginManager.withPlugin('com.diffplug.spotless') {    spotless
>>>>>>>>>>> {      // don't run spotlessCheck during gradle check task during 
>>>>>>>>>>> the
>>>>>>>>>>> transition phase      enforceCheck = false      java {        target
>>>>>>>>>>> 'src/main/java/**/*.java', 'src/test/java/**/*.java',
>>>>>>>>>>> 'src/jmh/java/**/*.java'        googleJavaFormat()      }    }}*
>>>>>>>>>>>
>>>>>>>>>>> We don't have to use Google Java Format. Spotless also supports
>>>>>>>>>>> formatting the code with other formats, but from previous 
>>>>>>>>>>> experience the
>>>>>>>>>>> Google Java Format seemed to be really the only one to produce 
>>>>>>>>>>> consistent
>>>>>>>>>>> results across *Eclipse**/IntelliJ/cmd line*.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *How do we get to a point where the entire codebase is properly
>>>>>>>>>>> formatted (and enforceCheck = false can be removed)?*
>>>>>>>>>>>
>>>>>>>>>>> Now this is a difficult question. Obviously we don't want to
>>>>>>>>>>> have a single *format-everything* commit, as that would affect
>>>>>>>>>>> lots of in-flight PRs.
>>>>>>>>>>>
>>>>>>>>>>> There would have to be some form of gradual formatting, for
>>>>>>>>>>> example module by module. Spotless offers something called Ratched (
>>>>>>>>>>> https://github.com/diffplug/spotless/tree/main/plugin-gradle#ratchet)
>>>>>>>>>>> that allows to enforce code format gradually (but I'm not sure this 
>>>>>>>>>>> would
>>>>>>>>>>> be a good thing either).
>>>>>>>>>>>
>>>>>>>>>>> How exactly we'd like to approach this transitioning phase this
>>>>>>>>>>> is a completely separate discussion, but I feel like at least we 
>>>>>>>>>>> could get
>>>>>>>>>>> the ball rolling so that we make it also easier for newcomers to 
>>>>>>>>>>> contribute
>>>>>>>>>>> to the project, since it would be straightforward for them to make 
>>>>>>>>>>> their
>>>>>>>>>>> PRs adhere to the code format and also save time during PR reviews.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Eduard
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ryan Blue
>>>>>>>>>> Tabular
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ryan Blue
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Tabular
>>>>>>
>>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>

Reply via email to