[https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040705#comment-18040705]
Tim Allison edited comment on TIKA-4251 at 11/25/25 8:44 PM:
-------------------------------------------------------------
[~tilman], I tried to modify the Eclipse formatter settings so that they made no
changes, in the hope that we could then change them one line at a time to get
incremental, format-specific changes.
Claude thought that'd work... but it was a disaster, which shocked absolutely
no one involved. :D
It is just not possible (at least with this tooling, which I'm not wedded to).
It eventually came up with:
{noformat}
Option 1: Accept One Big Format Commit (Controversial but clean)
Just pick your target style (Google) and do it all at once:
- One commit: "chore: apply Google Java Style formatting"
- Everyone rebases/merges after
- Git blame issue: use git blame --ignore-rev <commit-sha> or .git-blame-ignore-revs
Pro: Clean break, done once
Con: Big merge conflicts, disrupts in-flight work

Option 2: Format Only New/Modified Code (Gradual, least disruptive)
<plugin>
  <groupId>com.diffplug.spotless</groupId>
  <artifactId>spotless-maven-plugin</artifactId>
  <configuration>
    <ratchetFrom>origin/main</ratchetFrom> <!-- Only format files changed since origin/main -->
    <java>
      <googleJavaFormat/>
    </java>
  </configuration>
</plugin>
Pro: Zero disruption, gradual migration
Con: Codebase stays mixed for a long time

Option 3: Minimal Spotless - Just License + Imports (Pragmatic)
Don't format at all, just enforce:
<java>
  <licenseHeader>...</licenseHeader>
  <removeUnusedImports/>
  <!-- NO FORMATTING -->
</java>
Keep checkstyle for everything else. Add formatting later when you're ready
for the big commit.
Pro: Immediate value, no disruption
Con: Doesn't solve the formatting consistency problem
{noformat}
A fourth option is to go module by module.
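A rough sketch of what module by module could look like in Maven, assuming Spotless (as in Option 2) is the tool: declare the plugin once in the parent POM's {{<pluginManagement>}} without activating it, then opt each module in as it gets reformatted. The {{spotless.version}} property and the choice of {{tika-core}} as the first module are placeholders, not a worked-out plan.
{noformat}
<!-- Parent pom.xml: configure Spotless once; pluginManagement alone runs nothing -->
<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>com.diffplug.spotless</groupId>
        <artifactId>spotless-maven-plugin</artifactId>
        <version>${spotless.version}</version> <!-- placeholder property -->
        <configuration>
          <java>
            <googleJavaFormat/>
          </java>
        </configuration>
      </plugin>
    </plugins>
  </pluginManagement>
</build>

<!-- tika-core/pom.xml: opt this module in once it has been reformatted -->
<build>
  <plugins>
    <plugin>
      <groupId>com.diffplug.spotless</groupId>
      <artifactId>spotless-maven-plugin</artifactId>
      <executions>
        <execution>
          <goals>
            <!-- spotless:check binds to the verify phase by default -->
            <goal>check</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
{noformat}
Modules that have not been migrated simply don't list the plugin, so their PRs see no formatting checks until someone opts them in.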
I wasn't aware of {{ratchetFrom}}, but that sounds like a pretty good
option because it would solve my personal frustration with checkstyle
toe-stubbing on every PR, and, theoretically, it would eventually cover the
codebase, or at least the parts we care about and modify often?
On the other hand, I think that would pollute PRs with formatting changes,
because it would reformat entire files, not just the regions that were modified.
What makes sense?
> [DISCUSS] move to cosium's git-code-format-maven-plugin with
> google-java-format
> -------------------------------------------------------------------------------
>
> Key: TIKA-4251
> URL: https://issues.apache.org/jira/browse/TIKA-4251
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> I was recently working a bit on incubator-stormcrawler, and I noticed that
> they are using cosium's git-code-format-maven-plugin:
> https://github.com/Cosium/git-code-format-maven-plugin
> I was initially annoyed that I couldn't quickly figure out what I had to fix
> to make the linter happy, but then I realized there was a magic command:
> {{mvn git-code-format:format-code}} which just fixed the code so that the
> linter passed.
> The one drawback I found is that it neither fixes nor alerts on
> wildcard imports. We could still use checkstyle for that, with just that
> one rule.
> The other drawback is that there is not a lot of room for variation from
> google's style. This may actually be a benefit, too, of course.
> I just ran this on {{tika-core}} here:
> https://github.com/apache/tika/tree/google-java-format
> What would you think about making this change for 3.x?
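For the wildcard-import gap mentioned in the description, a minimal sketch of a Checkstyle configuration that enables only the standard {{AvoidStarImport}} check. The file name (e.g. a hypothetical {{checkstyle-imports.xml}}) and how it is wired into maven-checkstyle-plugin are assumptions left open here.
{noformat}
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Checkstyle//DTD Checkstyle Configuration 1.3//EN"
    "https://checkstyle.org/dtds/configuration_1_3.dtd">
<module name="Checker">
  <module name="TreeWalker">
    <!-- The single remaining rule: flag wildcard imports such as "import java.util.*;" -->
    <module name="AvoidStarImport"/>
  </module>
</module>
{noformat}
Everything else (indentation, line length, brace placement) would then be left to the formatter.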