Hi all,

Thanks a lot, Claude, for the proposal! I'm not that familiar with the
project yet, but from a "newbie" perspective it all makes a lot of
sense.

Keeping 0.x (0.17/0.18) alive until 1.0.0, together with the
"1.0.0-feature" branch idea, is a good approach.

*Code cleanup*

> RAT-478 – Drop Java8 build support. (1.0.0)

I _think_ it's okay to go directly to Java 17.


*Pluggability*

> RAT-352 – define a .ratignore file and processor (0.18)

That would be an awesome feature and would help keep the root build file clean.

> RAT-163 – Add Gradle Plugin (0.18)

The PR against the master branch (1.0.0) is pretty close, but somewhat
depends on RAT-478 (drop Java 8). It's not strictly necessary to build
the Gradle plugin for Java 17; that's rather a personal habit. I need to
look into a variant for 0.18.

> Pluggable scanners

A couple of months ago I started looking into everything around the
maintenance of LICENSE/NOTICE files, focusing on source code (code
that's been copied with a different license/(C), and/or that requires a
mention in NOTICE) and dependencies. As part of that I had to detect
licenses from arbitrary text files, and got it working so that it
quickly produces correct "full blown" SPDX license expressions from any
license file (other tools & implementations were slow or couldn't
detect all SPDX licenses and exceptions). The detection part is pretty
much isolated, and I could extend it to also detect licenses from file
headers, if that would be of use for RAT.
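
To make the file-header case a bit more concrete: the simplest variant
is reading an explicit "SPDX-License-Identifier:" tag, roughly as
sketched below. This is not the detection code described above (which
works on full license texts), just a minimal illustration. The function
name and the 30-line limit are assumptions made up for this example.

```python
import re
from typing import Optional

# Matches e.g. "// SPDX-License-Identifier: Apache-2.0 OR MIT"
_SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def spdx_expression_from_header(text: str, max_lines: int = 30) -> Optional[str]:
    """Return the SPDX expression declared in the first max_lines lines, if any.

    Hypothetical helper: only looks for an explicit tag and does not
    validate the expression against the SPDX license list.
    """
    for line in text.splitlines()[:max_lines]:
        match = _SPDX_TAG.search(line)
        if match:
            expr = match.group(1).strip()
            # Drop trailing comment terminators such as "*/" or "-->".
            for closer in ("*/", "-->"):
                if expr.endswith(closer):
                    expr = expr[: -len(closer)].strip()
            return expr
    return None
```

Files without such a tag would still need real text matching against
the known license headers, which is the harder part.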

With the whole approach I tried to detect and infer the "right"
LICENSE & NOTICE content, in order to generate those files for
individual project modules and binary distributables (think:
tarballs/zips and container images). Generating the files wasn't the
issue. The issue was that it's pretty hard to correctly detect the
right license and the right NOTICE. Information in Maven poms is very
often incomplete or plain wrong (license and especially scm
information). And inferring that information from source code repos is
difficult, as there are many projects with "interesting" relationships
between their published dependencies and their code repos.

While SBOMs (SPDX + CycloneDX) have dedicated metadata for the
license, neither supports the NOTICE file, at least I could not
figure out how. I think if there were a machine-readable way to
determine the license, including the (C) attribution that is mandatory
for some licenses, and the NOTICE, it would be pretty straightforward
to generate those files for all use cases.

Honestly, I think it's worth aiming for a more "standardized" way of
generating LICENSE and NOTICE files, relying on structured input,
maybe some SBOM. NOTICE files and the additions to LICENSE files are
all free text, so parsing them might be difficult. But that's a big
project on its own.



*General*

> RAT-25 – add checksum and similar checks on final build products.

I think this relates a lot to how Apache Trusted Releases changes the
release process. At least currently, verifying checksums and signatures
is quite a manual and error-prone process (typos/PEBKAC). For the
Polaris project, I've created a script [1] to verify the signatures
and checksums of the release artifacts, plus some project-specific
things.
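
For illustration, the checksum half of such a check fits in a few
lines. This is only a minimal sketch and not the Polaris script from
[1]: it assumes every artifact has a sibling "<artifact>.sha512" file
whose first whitespace-separated token is the hex digest (other layouts
exist), and the function names are made up for this example. Signature
verification is deliberately left out.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(artifact: Path) -> bool:
    """Compare an artifact against its sibling .sha512 file.

    Assumption: the checksum file is named <artifact>.sha512 and its
    first whitespace-separated token is the expected hex digest.
    """
    checksum_file = artifact.with_name(artifact.name + ".sha512")
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

Running verify_checksum over every file in the staging directory and
failing on the first False would remove most of the copy/paste typos.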


Best,
Robert

[1] https://github.com/apache/polaris/pull/2824

On Sat, Nov 1, 2025 at 10:54 AM Claude Warren <[email protected]> wrote:
>
> *Design proposal*
>
>
> *Introduction*
>
>
> RAT 0.17 was just released and we have several issues newly opened. In
> addition, there are several older issues that will require large scale
> changes. This proposal is intended to open discussion concerning the
> direction of development for RAT, and how to address the general categories
> of issues.
>
>
> The changes cover a couple of broad areas. Some changes are fairly simple
> and can be easily implemented against the 0.17 codebase. Others require a
> much deeper change.
>
>
> I believe that we should maintain two working branches for a short time:
> “0.18-SNAPSHOT” so that bug fixes can be released, and 1.0.0-FEATURE on
> which to do the longer term development. This support of two branches will
> only be maintained until 1.0.0 is released.
>
>
> I have listed the general broad areas below as well as what version (0.18,
> 1.0.0, 1.0.0+) I think the changes should be made on. 1.0.0+ indicates
> 1.0.0 or later.
>
>
> Since access to Jira is restricted, I suggest that discussions about these
> components be carried out on the mailing list, with final decisions posted
> into Jira. If anyone has a better solution to allow people without Jira
> access to contribute please speak up.
>
>
> *Code Cleanup*
>
>
> The code base contains a number of bad practices that have been identified
> but were deemed not suitable for 0.17 release. These include
>
>
>
>    - RAT-465 – remove deprecated code (1.0.0)
>    - RAT-498 – Dependency Updates (1.0.0)
>    - RAT-442 – migrate to Commons-cli 1.10.0 (0.18)
>    - RAT-452 – remove ExtendedIterator when Commons-collection 4.5.0 is
>      available. (0.18)
>    - RAT-449 – remove Plexus code when 4.0.3 or upstream contributions are
>      added. (0.18)
>    - RAT-350 – remove javadoc errors (1.0.0)
>    - RAT-478 – Drop Java8 build support. (1.0.0)
>    - RAT-443 – remove package cycles (1.0.0)
>    - RAT-404 – create SAXON Xslt processor. (0.18)
>    - RAT-403 – remove AbstractReport (1.0.0)
>    - RAT-459 – cleanup report generation (1.0.0)
>    - RAT-401 – rework IdocumentAnalyzer and RatReport into single class
>      (1.0.0)
>    - RAT-407 – move HeaderCheckerWorker code into documentHeaderAnalyzer.
>      (1.0.0)
>
>
> *Pluggable Architecture*
>
>
> We have moved to a mostly pluggable architecture. There are some additional
> components for and fixes to the current architecture.
>
>
>
>    - Pluggable Include/Exclude processors
>       - RAT-352 – define a .ratignore file and processor (0.18)
>    - Pluggable UI
>       - RAT-163 – Add Gradle Plugin (0.18)
>       - Clean up Maven UI
>          - RAT-508 – RAT logs too much at INFO level in Maven. (0.18)
>    - Pluggable scanners: Currently RAT scans for license headers. This is a
>      scanner capability. The addition of new scanners will mean that the
>      include/exclude processing will need to be executed before each processor
>      and then the processor executed. The processor will need to be able to
>      report into a common metadata structure that can be used to generate a
>      common XML. This common processing framework will need to be developed.
>      However, there are points in the current RAT code where new processors
>      could be added. The first new processor will bring with it a lot of newish
>      code. Many of the processors specified below could be implemented as
>      adapters to other libraries.
>       - RAT-461 – create a notice processor (1.0.0+)
>          - RAT-388 – replace hardcoded NOTICE category (1.0.0+)
>       - RAT-51 – check for crypto jar inclusion (1.0.0+)
>       - RAT-45 – copy and replace detection tool (1.0.0+)
>       - RAT-25 – add checksum and similar checks on final build products.
>         (1.0.0+)
>       - RAT-389 – move Add header functionality into a separate tool (1.0.0)
>       - RAT-4 – check known license requirements for code against notice
>         files. (1.0.0+)
>          - RAT-334 – option to verify notice file. (1.0.0+)
>       - RAT-5 – check copyright dates. (1.0.0+)
>    - Pluggable editors: RAT currently has one editor; it can add headers to
>      some file types. This change is to move editors into their own category of
>      pluggable components. This will allow users to easily prohibit updating any
>      files within the RAT run, by turning off all editor components at once.
>      Otherwise, these tools will probably use much of the pluggable scanners
>      framework.
>       - RAT-463 – rework header editing. (1.0.0)
>       - RAT-351 – automatic code formatters (1.0.0+)
>
>
> *General*
>
>
> These are items that we need to fix but did not fit into one of the other
> categories. Unless otherwise noted these are to be fixed in 0.18.
>
>
>
>    - RAT-462 – scan images for licensing information.
>    - RAT-508 – RAT logs too much at INFO level in Maven.
>    - RAT-476 – Include/Exclude precedence
>    - RAT-509 – gitignore rules mixed with RAT include/exclude rules.
>    - RAT-395 – Add HTML Report
>    - RAT-512 – PDF Files fail the check.
>    - RAT-511 – some tar.gz files not read.
>    - RAT-460 – Add additional license definitions
