On a side note, having automatic formatting was really nice. I think it is the right direction. If we only had a way to tweak a few rules like the max line length, one-line Javadoc and maybe consider using the Google format for 1.8 cause we are using the 1.7 format at the moment, which was not designed for lambdas. That way we may have all the benefits of automatic formatting with less disruption to the code.
- Anton > On Jul 28, 2022, at 7:35 PM, Anton Okolnychyi <aokolnyc...@apple.com.INVALID> > wrote: > > I had a chance to rebase one of my PRs. It is really hard to justify the new > max line length of 100 chars. I think the most unreadable code is usually > around places when single statements are split into multiple lines. Since > Java has explicit types and we prefer full and elaborate class/var names, the > new line limit alone triggered so many changes and requires splitting a lot > of statements that would previously fit on a single line. > > The max line length is one of the reasons why Palantir folks forked the > Google format. > > https://github.com/palantir/palantir-java-format > <https://github.com/palantir/palantir-java-format> > > - Anton > > >> On Jul 28, 2022, at 7:42 AM, Eduard Tudenhoefner <edu...@tabular.io >> <mailto:edu...@tabular.io>> wrote: >> >> Thanks for your honest feedback Anton, which I really appreciate. I agree >> with the points you brought up. >> >> I don't prefer the Google Java Format either, but it is at least something >> that produces consistent results across tools. >> If you ask me, I would love to customize the code format and have it as >> close as possible to what the Iceberg project had. But one of the reasons >> why the format isn't customizable is because it is difficult to support such >> customizations across tools. >> >> The problem with code format is that there is no single format that can >> satisfy the preferences of everybody. However, from my experience, once >> people start to use any code format that produces consistent results >> across Eclipse/IntelliJ/cmd line, people stop worrying about code format >> details. >> >> This is also one of the reasons why the creators of Go decided to have a >> code formatter built-in (https://go.dev/doc/effective_go#formatting >> <https://go.dev/doc/effective_go#formatting>). >> >> In the long run I think the decision to use Spotless + code formatting >> helps, especially for new people that want to contribute and don't want to >> feel frustrated due to review feedback around code style. >> >> Eduard >> >> On Thu, Jul 28, 2022 at 6:15 AM Anton Okolnychyi <aokolnyc...@apple.com >> <mailto:aokolnyc...@apple.com>> wrote: >> I am late to the discussion, but I would like to express my opinion. >> >> As someone who deeply cares about the code consistency and leaves a lot of >> style-related nits, I am definitely happy to see efforts to simplify the >> life of the reviewers and automate the formatting process. Explaining the >> code style to new contributors consumes a lot of time and results in more >> iterations on PRs. Automating this would be great! >> >> However, I am concerned regarding the changes merged in PR #5312. >> >> - Many changes that went in that PR were against the original code style of >> the project that we, committers and contributors, tried to maintain for all >> these years. I think we had a lot of well-structured and consistent places, >> which attracted devs to Iceberg. >> - I feel the code quality dropped as some places were written in the old >> style and now look really weird. If we followed the new style originally, >> maybe, it would look much better as we would structure the code differently. >> - As a someone whose team maintains an internal fork that is very close to >> master (I know we are not the only ones), I feel such a massive change will >> be a challenge to cherry-pick. I am not convinced introducing automatic >> formatting had to be so radical. >> >> The first two points concern me the most. Since I am late to the discussion, >> it would not be fair for me to request changes. However, I strongly suggest >> the community to reconsider the approach we took while we haven’t merged >> many changes on top of PR #5312. >> >> To sum up, I definitely support automating the formatting to simplify >> reviews and help new contributors but I would try to come with a way to make >> it less radical and follow whatever we had all these years. We did discuss >> some of the formatting guidelines a few years ago and it is all different >> now. >> >> - Anton >> >>> On Jul 27, 2022, at 11:40 AM, Steven Wu <stevenz...@gmail.com >>> <mailto:stevenz...@gmail.com>> wrote: >>> >>> Eduard, thank you for driving this effort! Great work! >>> >>> On Wed, Jul 27, 2022 at 11:11 AM Eduard Tudenhoefner <edu...@tabular.io >>> <mailto:edu...@tabular.io>> wrote: >>> Hey everyone, >>> >>> Google Java Format + Spotless integration have been merged as part of >>> https://github.com/apache/iceberg/pull/5266 >>> <https://github.com/apache/iceberg/pull/5266>. >>> >>> The big bang reformatting (aka spotlessApply) will happen as part of >>> https://github.com/apache/iceberg/pull/5312 >>> <https://github.com/apache/iceberg/pull/5312> and once this PR is merged, >>> please perform the following steps if you have open PRs: >>> >>> 1. Rebase your PR branch against the commit before the „Big Bang“ >>> 2. Run `./gradlew spotlessApply` >>> 3. Squash all commits >>> 4. Rebase against the latest HEAD of the master branch >>> 5. Force-push your branch >>> >>> This should hopefully produce a minimum of conflicts on PRs. >>> >>> Additionally, make sure to install the google-java-format plugin for your >>> IDE as outlined in https://github.com/apache/iceberg-docs/pull/125 >>> <https://github.com/apache/iceberg-docs/pull/125> so that you can format >>> code inside the IDE as well. >>> >>> Eduard >>> >>> On Fri, Jul 15, 2022 at 12:45 PM Eduard Tudenhoefner <edu...@tabular.io >>> <mailto:edu...@tabular.io>> wrote: >>> That's correct Ryan. The Google style is what can be applied across tools >>> and Spotless is the tool that we use to check and apply that style (and >>> also auto-apply the license header). The apply part is the important piece >>> that Checkstyle for example is missing (as it's tedious to manually apply >>> what checkstyle:check complains about). >>> >>> Additionally, Spotless >>> <https://github.com/diffplug/spotless/tree/main/plugin-gradle> supports a >>> bunch of other languages, so it would even be possible to use it for >>> auto-formatting Gradle files (or Scala files using scalafmt) if we wanted >>> to, but for now I would probably focus on the Java codebase. >>> >>> Eduard >>> >>> On Thu, Jul 14, 2022 at 6:58 PM Ryan Blue <b...@tabular.io >>> <mailto:b...@tabular.io>> wrote: >>> Okay, I understand. So it isn't that spotless is supported across tools, it >>> is that google style is supported and we can use Spotless to both check and >>> apply that style. Is that correct? >>> >>> On Thu, Jul 14, 2022 at 9:56 AM Eduard Tudenhoefner <edu...@tabular.io >>> <mailto:edu...@tabular.io>> wrote: >>> Yeah it's how Dmitri said, the Google style is most likely on purpose not >>> configurable, because otherwise it's getting more difficult to achieve the >>> same formatting results across IDEs & CMD line. >>> >>> >>> On Wed, Jul 13, 2022 at 7:39 PM Dmitri Bourlatchkov >>> <dmitri.bourlatch...@dremio.com <mailto:dmitri.bourlatch...@dremio.com>> >>> wrote: >>> Yeah, the Google style has very specific wrapping limits and javadoc >>> format. It is not configurable, if we go with it. >>> >>> From my POV, using another style would be fine. However, with a custom >>> style it can be hard and tricky to ensure that all IDEs enforce it the same >>> way as spotless. >>> >>> The Google style has a plugin at least for IntelliJ (and likely for Eclipse >>> too) so it will behave the same way in IDE and cmd. line build. >>> >>> On Wed, Jul 13, 2022 at 1:28 PM Ryan Blue <b...@tabular.io >>> <mailto:b...@tabular.io>> wrote: >>> Is there any value in trying to get Spotless to match the current format as >>> closely as possible? It would be great to have fewer changes. >>> >>> For example, when I tested the apply, I see this: >>> >>> - /** >>> - * Returns an initialized {@link AliyunProperties} >>> - */ >>> + /** Returns an initialized {@link AliyunProperties} */ >>> Changes like that don’t seem very valuable to me. Similarly, it looks like >>> a lot of text is wrapped at a different line length. If we can alter that, >>> we could easily avoid a lot of changes. >>> >>> Ryan >>> >>> >>> On Wed, Jul 13, 2022 at 7:41 AM Eduard Tudenhoefner <edu...@tabular.io >>> <mailto:edu...@tabular.io>> wrote: >>> As a first step, I created https://github.com/apache/iceberg/pull/5266 >>> <https://github.com/apache/iceberg/pull/5266> which configures Spotless to >>> use the Google Java Format and also apply the correct copyright header for >>> Java files. >>> >>> Once this PR is merged, the next steps would be: >>> removing conflicting Checkstyle rules that are not in line with the Google >>> format >>> formatting the entire code base via `./gradlew spotlessApply` >>> setting `enforceCheck` to `true` in >>> https://github.com/apache/iceberg/blob/80318d8cfbeb0d96d0afc27c84bc3dbddde35344/baseline.gradle#L48 >>> >>> <https://github.com/apache/iceberg/blob/80318d8cfbeb0d96d0afc27c84bc3dbddde35344/baseline.gradle#L48> >>> so that validation fails if code isn't properly formatted >>> updating docs around the current Formatter usage and how to configure >>> Eclipse/IntelliJ >>> The first 3 steps should be done together as part of the big bang. >>> >>> Eduard >>> >>> On Mon, Jul 11, 2022 at 10:54 PM Ryan Blue <b...@apache.org >>> <mailto:b...@apache.org>> wrote: >>> Okay, it sounds like there's mostly agreement for going with spotless. >>> Let's try that out. We'll work on some changes to add spotless so that >>> `spotlessApply` works. Then we can do the big bang migration (which I also >>> agree is the best option) just before the 1.0. >>> >>> Thanks, everyone! >>> >>> On Mon, Jul 11, 2022 at 11:50 AM Dmitri Bourlatchkov >>> <dmitri.bourlatch...@dremio.com <mailto:dmitri.bourlatch...@dremio.com>> >>> wrote: >>> My experience with the Google Code Style + Spotless was positive too. >>> >>> I'd be fine with another code style as long as it is "deterministic" (e.g. >>> does not make changes on repeated execution) and works in IntelliJ IDEA / >>> Eclipse / etc. >>> >>> Regarding cherry-picking into older branches, I think Robert's suggestion >>> can be tweaked slightly to be helpful there too: >>> >>> 1. Checkout old branch >>> 2. Apply the new style (run gradle ...) >>> 3. Cherry-pick without committing >>> 4. Manually revert to old style >>> 5. Commit >>> 6. Reset to original branch HEAD >>> 7. Cherry pick commit 5 again >>> >>> It's a bit lengthy and may be a tedious process, but it should allow >>> applying the git-level changes mostly automatically. >>> >>> Cheers, >>> Dmitri. >>> >>> >>> On Fri, Jul 8, 2022 at 2:53 AM Robert Stupp <sn...@snazy.de >>> <mailto:sn...@snazy.de>> wrote: >>> From my experience, it’s a big win to have automatic code formatting. >>> >>> In projectnessie we use automatic code formatting for all languages and >>> haven’t serious issues with Spotless. It is just nice to not have to bike >>> shed about whitespaces, line breaks, brackets, etc. It was a bit of >>> discussion, because people had bad memories from past experiences with >>> automatic code formatting breaking code and introducing subtle bugs. >>> >>> I think that using code styles that „do not allow bike shedding“ (Google >>> Code Style) are a very good option. >>> >>> So far none of us has seen issues with any of the Spotless code formatters >>> that we use: XML, Kotlin/Gradle, Kotlin, Antlr4, Java, Scala - relying on >>> the „standard“ settings w/o any customizations. We use this piece of code, >>> externalized into an internal Gradle plugin: >>> https://github.com/projectnessie/gradle-build-plugins/blob/main/spotless/src/main/kotlin/org/projectnessie/buildtools/spotless/SpotlessHelperPlugin.kt >>> >>> <https://github.com/projectnessie/gradle-build-plugins/blob/main/spotless/src/main/kotlin/org/projectnessie/buildtools/spotless/SpotlessHelperPlugin.kt> >>> For Iceberg, it would probably be nice to have some Groovy code formatting >>> for the build scripts as well. >>> >>> Sure, the migration will add some pain. IMHO the best option is a „big >>> bang“ across the whole code base, because it happens only once. Migrating >>> one module after another is a „repeated series of pains“. >>> >>> Since the result of a `./gradlew spotlessApply` is deterministic, people >>> that have open PRs could: >>> 1. Rebase their PR branch against the commit before the „Big Bang“ >>> 2. Include a commit with the necessary Gradle build change (one the only >>> contains the changes to add Spotless) >>> 3. Do the `./gradlew spotlessApply` >>> 4. Squash all commits in the PR-branch >>> 5. Rebase again - against the HEAD of the master branch >>> 6. Force-push PR-branch >>> Because git is „clever enough“ to eliminate the „duplicated/unrelated >>> changes“, the final result of the above steps is just the diff with the >>> changes for the open PR. >>> >>> >>>> Am 08.07.2022 um 00:59 schrieb Ryan Blue <b...@tabular.io >>>> <mailto:b...@tabular.io>>: >>>> >>>> We were just talking about this proposal internally. I think it would be >>>> great to have automatic code formatting, especially since we have to point >>>> out a lot of changes manually. The main question is how to get there >>>> without too much disruption. This came up in our discussions around the >>>> upcoming 1.0 release, since that may be a good opportunity to make all of >>>> the code changes. >>>> >>>> For background, the main concern about adding something like this is >>>> applying all of the changes needed to get the existing code to conform to >>>> the new style. That is really disruptive because it will cause all of the >>>> PRs to need to be rebased and makes it really difficult to cherry-pick >>>> changes from after the code formatting happens to branches that were >>>> created before code formatting. The 1.0 release makes a good opportunity >>>> because we are making other changes (removing deprecations) and will >>>> hopefully have people upgrading their branches to the new major version, >>>> rather than cherry picking. >>>> >>>> This is as good a time as any to add automatic code formatting, but it's >>>> up to the community: so should we refromat the project and apply spotless >>>> code formatting everywhere? I'm interested to hear opinions! >>>> >>>> Ryan >>>> >>>> On Thu, Mar 10, 2022 at 3:00 AM Eduard Tudenhoefner <edu...@dremio.com >>>> <mailto:edu...@dremio.com>> wrote: >>>> Hello everyone, >>>> >>>> I would like to get the discussion started around automatic code >>>> formatting + enforcing and how we get there. >>>> >>>> Currently we use Checkstyle check to enforce formatting. However, the >>>> problem with that is that you still have to manually do the actual >>>> formatting. >>>> >>>> What I would like to propose is the usage of Spotless >>>> (https://github.com/diffplug/spotless >>>> <https://github.com/diffplug/spotless>) for checking and enforcing Java >>>> code style (it can also enforce code style for Scala, Markdown, ... btw). >>>> Spotless is being used by many projects >>>> (https://github.com/search?l=gradle&q=spotless&type=Code >>>> <https://github.com/search?l=gradle&q=spotless&type=Code>) and comes >>>> essentially with two tasks: >>>> * spotlessCheck: Checks that sourcecode satisfies formatting steps >>>> * spotlessApply: Applies code formatting steps to sourcecode in-place >>>> >>>> >>>> Code format >>>> >>>> The problem with code format is that there is no single format that can >>>> satisfy the preferences of everybody. However, from my experience, once >>>> people start to use any code format that produces consistent results >>>> across Eclipse/IntelliJ/cmd line, people stop worrying about code format >>>> details. >>>> This is also one of the reasons why the creators of Go decided to have a >>>> code formatter built-in (https://go.dev/doc/effective_go#formatting >>>> <https://go.dev/doc/effective_go#formatting>): >>>> >>>> Formatting issues are the most contentious but the least consequential. >>>> People can adapt to different formatting styles but it's better if they >>>> don't have to, and less time is devoted to the topic if everyone adheres >>>> to the same style. The problem is how to approach this Utopia without a >>>> long prescriptive style guide. >>>> With Go we take an unusual approach and let the machine take care of most >>>> formatting issues. The gofmt program (also available as go fmt, which >>>> operates at the package level rather than source file level) reads a Go >>>> program and emits the source in a standard style of indentation and >>>> vertical alignment, retaining and if necessary reformatting comments. >>>> >>>> >>>> I would like to propose using the Google Java Format with Spotless. The >>>> reason for this format is essentially that this is a widely-adopted code >>>> format that is designed specifically for code reviews (since we're >>>> spending more time reviewing code than writing it). >>>> Additionally, it produces consistent formatting results across >>>> Eclipse/IntelliJ/cmd line, which I think is another very important factor. >>>> >>>> Thus, our initial Gradle spotless configuration could look similar to the >>>> above below: >>>> >>>> >>>> pluginManager.withPlugin('com.diffplug.spotless') { >>>> spotless { >>>> // don't run spotlessCheck during gradle check task during the >>>> transition phase >>>> enforceCheck = false >>>> java { >>>> target 'src/main/java/**/*.java', 'src/test/java/**/*.java', >>>> 'src/jmh/java/**/*.java' >>>> googleJavaFormat() >>>> } >>>> } >>>> } >>>> >>>> We don't have to use Google Java Format. Spotless also supports formatting >>>> the code with other formats, but from previous experience the Google Java >>>> Format seemed to be really the only one to produce consistent results >>>> across Eclipse/IntelliJ/cmd line. >>>> >>>> >>>> How do we get to a point where the entire codebase is properly formatted >>>> (and enforceCheck = false can be removed)? >>>> >>>> Now this is a difficult question. Obviously we don't want to have a single >>>> format-everything commit, as that would affect lots of in-flight PRs. >>>> >>>> There would have to be some form of gradual formatting, for example module >>>> by module. Spotless offers something called Ratched >>>> (https://github.com/diffplug/spotless/tree/main/plugin-gradle#ratchet >>>> <https://github.com/diffplug/spotless/tree/main/plugin-gradle#ratchet>) >>>> that allows to enforce code format gradually (but I'm not sure this would >>>> be a good thing either). >>>> >>>> How exactly we'd like to approach this transitioning phase this is a >>>> completely separate discussion, but I feel like at least we could get the >>>> ball rolling so that we make it also easier for newcomers to contribute to >>>> the project, since it would be straightforward for them to make their PRs >>>> adhere to the code format and also save time during PR reviews. >>>> >>>> >>>> Eduard >>>> >>>> >>>> -- >>>> Ryan Blue >>>> Tabular >>> >>> >>> >>> -- >>> Ryan Blue >>> >>> >>> -- >>> Ryan Blue >>> Tabular >>> >>> >>> -- >>> Ryan Blue >>> Tabular >> >