Eduard, thank you for driving this effort! Great work! On Wed, Jul 27, 2022 at 11:11 AM Eduard Tudenhoefner <edu...@tabular.io> wrote:
> Hey everyone, > > Google Java Format + Spotless integration have been merged as part of > https://github.com/apache/iceberg/pull/5266. > > The big bang reformatting (aka *spotlessApply*) will happen as part of > https://github.com/apache/iceberg/pull/5312 and once this PR is merged, > please perform the following steps if you have open PRs: > > 1. Rebase your PR branch against the commit before the „Big Bang“ > 2. Run `*./gradlew spotlessApply*` > 3. Squash all commits > 4. Rebase against the latest *HEAD* of the master branch > 5. Force-push your branch > > This should hopefully produce a minimum of conflicts on PRs. > > Additionally, make sure to install the *google-java-format* plugin for > your IDE as outlined in https://github.com/apache/iceberg-docs/pull/125 > so that you can format code inside the IDE as well. > > Eduard > > On Fri, Jul 15, 2022 at 12:45 PM Eduard Tudenhoefner <edu...@tabular.io> > wrote: > >> That's correct Ryan. The Google style is what can be applied across tools >> and Spotless is the tool that we use to check and *apply* that style >> (and also auto-apply the license header). The *apply* part is the >> important piece that Checkstyle for example is missing (as it's tedious to >> manually apply what *checkstyle:check* complains about). >> >> Additionally, Spotless >> <https://github.com/diffplug/spotless/tree/main/plugin-gradle> supports >> a bunch of other languages, so it would even be possible to use it for >> auto-formatting Gradle files (or Scala files using scalafmt) if we wanted >> to, but for now I would probably focus on the Java codebase. >> >> Eduard >> >> On Thu, Jul 14, 2022 at 6:58 PM Ryan Blue <b...@tabular.io> wrote: >> >>> Okay, I understand. So it isn't that spotless is supported across tools, >>> it is that google style is supported and we can use Spotless to both check >>> and apply that style. Is that correct? >>> >>> On Thu, Jul 14, 2022 at 9:56 AM Eduard Tudenhoefner <edu...@tabular.io> >>> wrote: >>> >>>> Yeah it's how Dmitri said, the Google style is most likely on purpose >>>> not configurable, because otherwise it's getting more difficult to achieve >>>> the same formatting results across IDEs & CMD line. >>>> >>>> >>>> On Wed, Jul 13, 2022 at 7:39 PM Dmitri Bourlatchkov < >>>> dmitri.bourlatch...@dremio.com> wrote: >>>> >>>>> Yeah, the Google style has very specific wrapping limits and javadoc >>>>> format. It is not configurable, if we go with it. >>>>> >>>>> From my POV, using another style would be fine. However, with a custom >>>>> style it can be hard and tricky to ensure that all IDEs enforce it the >>>>> same >>>>> way as spotless. >>>>> >>>>> The Google style has a plugin at least for IntelliJ (and likely for >>>>> Eclipse too) so it will behave the same way in IDE and cmd. line build. >>>>> >>>>> On Wed, Jul 13, 2022 at 1:28 PM Ryan Blue <b...@tabular.io> wrote: >>>>> >>>>>> Is there any value in trying to get Spotless to match the current >>>>>> format as closely as possible? It would be great to have fewer changes. >>>>>> >>>>>> For example, when I tested the apply, I see this: >>>>>> >>>>>> - /** >>>>>> - * Returns an initialized {@link AliyunProperties} >>>>>> - */ >>>>>> + /** Returns an initialized {@link AliyunProperties} */ >>>>>> >>>>>> Changes like that don’t seem very valuable to me. Similarly, it looks >>>>>> like a lot of text is wrapped at a different line length. If we can alter >>>>>> that, we could easily avoid a lot of changes. >>>>>> >>>>>> Ryan >>>>>> >>>>>> On Wed, Jul 13, 2022 at 7:41 AM Eduard Tudenhoefner < >>>>>> edu...@tabular.io> wrote: >>>>>> >>>>>>> As a first step, I created >>>>>>> https://github.com/apache/iceberg/pull/5266 which configures >>>>>>> Spotless to use the Google Java Format and also apply the correct >>>>>>> copyright >>>>>>> header for Java files. >>>>>>> >>>>>>> Once this PR is merged, the next steps would be: >>>>>>> >>>>>>> - removing conflicting Checkstyle rules that are not in line >>>>>>> with the Google format >>>>>>> - formatting the entire code base via `*./gradlew spotlessApply*` >>>>>>> - setting `*enforceCheck*` to `*true*` in >>>>>>> >>>>>>> https://github.com/apache/iceberg/blob/80318d8cfbeb0d96d0afc27c84bc3dbddde35344/baseline.gradle#L48 >>>>>>> so that validation fails if code isn't properly formatted >>>>>>> - updating docs around the current Formatter usage and how to >>>>>>> configure Eclipse/IntelliJ >>>>>>> >>>>>>> The first 3 steps should be done together as part of the big bang. >>>>>>> >>>>>>> Eduard >>>>>>> >>>>>>> On Mon, Jul 11, 2022 at 10:54 PM Ryan Blue <b...@apache.org> wrote: >>>>>>> >>>>>>>> Okay, it sounds like there's mostly agreement for going with >>>>>>>> spotless. Let's try that out. We'll work on some changes to add >>>>>>>> spotless so >>>>>>>> that `spotlessApply` works. Then we can do the big bang migration >>>>>>>> (which I >>>>>>>> also agree is the best option) just before the 1.0. >>>>>>>> >>>>>>>> Thanks, everyone! >>>>>>>> >>>>>>>> On Mon, Jul 11, 2022 at 11:50 AM Dmitri Bourlatchkov < >>>>>>>> dmitri.bourlatch...@dremio.com> wrote: >>>>>>>> >>>>>>>>> My experience with the Google Code Style + Spotless was positive >>>>>>>>> too. >>>>>>>>> >>>>>>>>> I'd be fine with another code style as long as it is >>>>>>>>> "deterministic" (e.g. does not make changes on repeated execution) and >>>>>>>>> works in IntelliJ IDEA / Eclipse / etc. >>>>>>>>> >>>>>>>>> Regarding cherry-picking into older branches, I think Robert's >>>>>>>>> suggestion can be tweaked slightly to be helpful there too: >>>>>>>>> >>>>>>>>> 1. Checkout old branch >>>>>>>>> 2. Apply the new style (run gradle ...) >>>>>>>>> 3. Cherry-pick without committing >>>>>>>>> 4. Manually revert to old style >>>>>>>>> 5. Commit >>>>>>>>> 6. Reset to original branch HEAD >>>>>>>>> 7. Cherry pick commit 5 again >>>>>>>>> >>>>>>>>> It's a bit lengthy and may be a tedious process, but it should >>>>>>>>> allow applying the git-level changes mostly automatically. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Dmitri. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Jul 8, 2022 at 2:53 AM Robert Stupp <sn...@snazy.de> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> From my experience, it’s a big win to have automatic code >>>>>>>>>> formatting. >>>>>>>>>> >>>>>>>>>> In projectnessie we use automatic code formatting for all >>>>>>>>>> languages and haven’t serious issues with Spotless. It is just nice >>>>>>>>>> to not >>>>>>>>>> have to bike shed about whitespaces, line breaks, brackets, etc. It >>>>>>>>>> was a >>>>>>>>>> bit of discussion, because people had bad memories from past >>>>>>>>>> experiences >>>>>>>>>> with automatic code formatting breaking code and introducing subtle >>>>>>>>>> bugs. >>>>>>>>>> >>>>>>>>>> I think that using code styles that „do not allow bike shedding“ >>>>>>>>>> (Google Code Style) are a very good option. >>>>>>>>>> >>>>>>>>>> So far none of us has seen issues with any of the Spotless code >>>>>>>>>> formatters that we use: XML, Kotlin/Gradle, Kotlin, Antlr4, Java, >>>>>>>>>> Scala - >>>>>>>>>> relying on the „standard“ settings w/o any customizations. We use >>>>>>>>>> this >>>>>>>>>> piece of code, externalized into an internal Gradle plugin: >>>>>>>>>> https://github.com/projectnessie/gradle-build-plugins/blob/main/spotless/src/main/kotlin/org/projectnessie/buildtools/spotless/SpotlessHelperPlugin.kt >>>>>>>>>> For >>>>>>>>>> Iceberg, it would probably be nice to have some Groovy code >>>>>>>>>> formatting for >>>>>>>>>> the build scripts as well. >>>>>>>>>> >>>>>>>>>> Sure, the migration will add some pain. IMHO the best option is a >>>>>>>>>> „big bang“ across the whole code base, because it happens only once. >>>>>>>>>> Migrating one module after another is a „repeated series of pains“. >>>>>>>>>> >>>>>>>>>> Since the result of a `./gradlew spotlessApply` is deterministic, >>>>>>>>>> people that have open PRs could: >>>>>>>>>> 1. Rebase their PR branch against the commit before the „Big >>>>>>>>>> Bang“ >>>>>>>>>> 2. Include a commit with the necessary Gradle build change (one >>>>>>>>>> the only contains the changes to add Spotless) >>>>>>>>>> 3. Do the `./gradlew spotlessApply` >>>>>>>>>> 4. Squash all commits in the PR-branch >>>>>>>>>> 5. Rebase again - against the HEAD of the master branch >>>>>>>>>> 6. Force-push PR-branch >>>>>>>>>> Because git is „clever enough“ to eliminate the >>>>>>>>>> „duplicated/unrelated changes“, the final result of the above steps >>>>>>>>>> is just >>>>>>>>>> the diff with the changes for the open PR. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Am 08.07.2022 um 00:59 schrieb Ryan Blue <b...@tabular.io>: >>>>>>>>>> >>>>>>>>>> We were just talking about this proposal internally. I think it >>>>>>>>>> would be great to have automatic code formatting, especially since >>>>>>>>>> we have >>>>>>>>>> to point out a lot of changes manually. The main question is how to >>>>>>>>>> get >>>>>>>>>> there without too much disruption. This came up in our discussions >>>>>>>>>> around >>>>>>>>>> the upcoming 1.0 release, since that may be a good opportunity to >>>>>>>>>> make all >>>>>>>>>> of the code changes. >>>>>>>>>> >>>>>>>>>> For background, the main concern about adding something like this >>>>>>>>>> is applying all of the changes needed to get the existing code to >>>>>>>>>> conform >>>>>>>>>> to the new style. That is really disruptive because it will cause >>>>>>>>>> all of >>>>>>>>>> the PRs to need to be rebased and makes it really difficult to >>>>>>>>>> cherry-pick >>>>>>>>>> changes from after the code formatting happens to branches that were >>>>>>>>>> created before code formatting. The 1.0 release makes a good >>>>>>>>>> opportunity >>>>>>>>>> because we are making other changes (removing deprecations) and will >>>>>>>>>> hopefully have people upgrading their branches to the new major >>>>>>>>>> version, >>>>>>>>>> rather than cherry picking. >>>>>>>>>> >>>>>>>>>> This is as good a time as any to add automatic code formatting, >>>>>>>>>> but it's up to the community: so should we refromat the project and >>>>>>>>>> apply >>>>>>>>>> spotless code formatting everywhere? I'm interested to hear opinions! >>>>>>>>>> >>>>>>>>>> Ryan >>>>>>>>>> >>>>>>>>>> On Thu, Mar 10, 2022 at 3:00 AM Eduard Tudenhoefner < >>>>>>>>>> edu...@dremio.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hello everyone, >>>>>>>>>>> >>>>>>>>>>> I would like to get the discussion started around automatic code >>>>>>>>>>> formatting + enforcing and how we get there. >>>>>>>>>>> >>>>>>>>>>> Currently we use Checkstyle *check* to enforce formatting. >>>>>>>>>>> However, the problem with that is that you still have to manually >>>>>>>>>>> do the >>>>>>>>>>> actual formatting. >>>>>>>>>>> >>>>>>>>>>> What I would like to propose is the usage of *Spotless* ( >>>>>>>>>>> https://github.com/diffplug/spotless) for *checking* and >>>>>>>>>>> *enforcing* Java code style (it can also enforce code style for >>>>>>>>>>> Scala, Markdown, ... btw). Spotless is being used by many projects ( >>>>>>>>>>> https://github.com/search?l=gradle&q=spotless&type=Code) and >>>>>>>>>>> comes essentially with two tasks: >>>>>>>>>>> * *spotlessCheck*: Checks that sourcecode satisfies formatting >>>>>>>>>>> steps >>>>>>>>>>> * *spotlessApply*: Applies code formatting steps to sourcecode >>>>>>>>>>> in-place >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> *Code format* >>>>>>>>>>> >>>>>>>>>>> The problem with code format is that there is no single format >>>>>>>>>>> that can satisfy the preferences of everybody. However, from my >>>>>>>>>>> experience, >>>>>>>>>>> once people start to use *any* code format that produces >>>>>>>>>>> consistent results across *Eclipse**/IntelliJ/cmd line*, people >>>>>>>>>>> stop worrying about code format details. >>>>>>>>>>> This is also one of the reasons why the creators of Go decided >>>>>>>>>>> to have a code formatter built-in ( >>>>>>>>>>> https://go.dev/doc/effective_go#formatting): >>>>>>>>>>> >>>>>>>>>>> *Formatting issues are the most contentious but the least >>>>>>>>>>>> consequential. People can adapt to different formatting styles but >>>>>>>>>>>> it's >>>>>>>>>>>> better if they don't have to, and less time is devoted to the >>>>>>>>>>>> topic if >>>>>>>>>>>> everyone adheres to the same style. The problem is how to approach >>>>>>>>>>>> this >>>>>>>>>>>> Utopia without a long prescriptive style guide.* >>>>>>>>>>>> *With Go we take an unusual approach and let the machine take >>>>>>>>>>>> care of most formatting issues. The gofmt program (also available >>>>>>>>>>>> as go >>>>>>>>>>>> fmt, which operates at the package level rather than source file >>>>>>>>>>>> level) >>>>>>>>>>>> reads a Go program and emits the source in a standard style of >>>>>>>>>>>> indentation >>>>>>>>>>>> and vertical alignment, retaining and if necessary reformatting >>>>>>>>>>>> comments.* >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I would like to propose using the Google Java Format with >>>>>>>>>>> Spotless. The reason for this format is essentially that this is a >>>>>>>>>>> widely-adopted code format that is designed specifically for code >>>>>>>>>>> reviews >>>>>>>>>>> (since we're spending more time reviewing code than writing it). >>>>>>>>>>> Additionally, it produces consistent formatting results across >>>>>>>>>>> *Eclipse**/IntelliJ/cmd line*, which I think is another very >>>>>>>>>>> important factor. >>>>>>>>>>> >>>>>>>>>>> Thus, our initial Gradle spotless configuration could look >>>>>>>>>>> similar to the above below: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> *pluginManager.withPlugin('com.diffplug.spotless') { spotless >>>>>>>>>>> { // don't run spotlessCheck during gradle check task during >>>>>>>>>>> the >>>>>>>>>>> transition phase enforceCheck = false java { target >>>>>>>>>>> 'src/main/java/**/*.java', 'src/test/java/**/*.java', >>>>>>>>>>> 'src/jmh/java/**/*.java' googleJavaFormat() } }}* >>>>>>>>>>> >>>>>>>>>>> We don't have to use Google Java Format. Spotless also supports >>>>>>>>>>> formatting the code with other formats, but from previous >>>>>>>>>>> experience the >>>>>>>>>>> Google Java Format seemed to be really the only one to produce >>>>>>>>>>> consistent >>>>>>>>>>> results across *Eclipse**/IntelliJ/cmd line*. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> *How do we get to a point where the entire codebase is properly >>>>>>>>>>> formatted (and enforceCheck = false can be removed)?* >>>>>>>>>>> >>>>>>>>>>> Now this is a difficult question. Obviously we don't want to >>>>>>>>>>> have a single *format-everything* commit, as that would affect >>>>>>>>>>> lots of in-flight PRs. >>>>>>>>>>> >>>>>>>>>>> There would have to be some form of gradual formatting, for >>>>>>>>>>> example module by module. Spotless offers something called Ratched ( >>>>>>>>>>> https://github.com/diffplug/spotless/tree/main/plugin-gradle#ratchet) >>>>>>>>>>> that allows to enforce code format gradually (but I'm not sure this >>>>>>>>>>> would >>>>>>>>>>> be a good thing either). >>>>>>>>>>> >>>>>>>>>>> How exactly we'd like to approach this transitioning phase this >>>>>>>>>>> is a completely separate discussion, but I feel like at least we >>>>>>>>>>> could get >>>>>>>>>>> the ball rolling so that we make it also easier for newcomers to >>>>>>>>>>> contribute >>>>>>>>>>> to the project, since it would be straightforward for them to make >>>>>>>>>>> their >>>>>>>>>>> PRs adhere to the code format and also save time during PR reviews. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Eduard >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Ryan Blue >>>>>>>>>> Tabular >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Ryan Blue >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Ryan Blue >>>>>> Tabular >>>>>> >>>>> >>> >>> -- >>> Ryan Blue >>> Tabular >>> >>