Hi all,

Thanks a lot Claude for the proposal!
I'm not that familiar with the project yet, but from a "newbie" perspective it all makes a lot of sense.
Keeping 0.x (0.17/18) alive until 1.0.0 and the "1.0.0-feature" idea is a good approach.

*code cleanup*

> RAT-478 – Drop Java8 build support. (1.0.0)

I _think_ it's okay to go directly to Java 17.

*Pluggability*

> RAT-352 – define a .ratignore file and processor (0.18)

That would be an awesome feature and helps keep the root build file clean.

> RAT-163 – Add Gradle Plugin (0.18)

The PR against the master branch (1.0.0) is pretty close, but it somewhat depends on RAT-478 (drop Java 8). It's not strictly necessary to build the Gradle plugin for Java 17; that's more of a personal habit. I need to look into a variant for 0.18.

> Pluggable scanners

A couple of months ago I started to look into all-the-things and maintenance of LICENSE/NOTICE files, focusing on source code (code that's been copied with a different license/(C), and/or that requires a mention in NOTICE) and dependencies. As part of that I had to detect licenses from arbitrary text files, and I got it working to produce correct "full blown" SPDX license expressions from any license file in a fast way (other tools and implementations were slow or couldn't detect all SPDX licenses + exceptions). The detection part is pretty much isolated, and I could extend it to also detect licenses from file headers, if that could be of use for RAT.

With the whole approach I tried to detect and infer the "right" LICENSE and NOTICE content, in order to generate those files for individual project modules and binary distributables (think: tarball/zip + container images). Generating those files wasn't the issue. The issue was that it's pretty hard to correctly detect the right license and the right NOTICE. Information in Maven poms is very often incomplete and also wrong (license and especially scm information). And inferring that information from source code repos is difficult, as there are many projects with "interesting" relationships between their published dependencies and their code repos.
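
To make the file-header case mentioned above a bit more concrete, here is a minimal Python sketch. It is not the detection code described above (which matches full license texts); it only covers the simplest situation, where a file declares its license with the conventional `SPDX-License-Identifier:` tag. The function name `spdx_from_header` and the 20-line search window are assumptions made for illustration, and the expression is returned as written, without validating it against the SPDX license list.

```python
import re
from typing import Optional

# Matches the conventional "SPDX-License-Identifier: <expression>" header tag.
_SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")

# Comment terminators that may follow the expression on the same line.
_COMMENT_CLOSERS = ("*/", "-->")


def spdx_from_header(text: str, max_lines: int = 20) -> Optional[str]:
    """Return the SPDX expression tagged in the first max_lines lines, or None.

    Hypothetical helper: it does not recognise license texts or
    classic "Licensed to the ASF ..." headers, only the SPDX tag.
    """
    for line in text.splitlines()[:max_lines]:
        match = _SPDX_TAG.search(line)
        if match:
            expr = match.group(1).strip()
            # Drop a trailing block-comment terminator, if any.
            for closer in _COMMENT_CLOSERS:
                if expr.endswith(closer):
                    expr = expr[: -len(closer)].strip()
            return expr
    return None


if __name__ == "__main__":
    sample = "/*\n * SPDX-License-Identifier: Apache-2.0 OR MIT\n */\nclass A {}\n"
    print(spdx_from_header(sample))
```

Files without the tag return `None`, which is exactly where text-based detection would have to take over.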

While SBOMs (SPDX + CycloneDX) have dedicated metadata for the license, neither supports the NOTICE file, at least I could not figure out how. I think that if there were a machine-readable way to determine the license, including the (C) attribution that is mandatory for some licenses, and the NOTICE, it would be pretty straightforward to generate those files for all use cases.

Honestly, I think it's worth aiming for a more "standardized" way of generating LICENSE and NOTICE files, relying on a structured input, maybe some SBOM. NOTICE files and the additions to LICENSE files are all free text, so parsing might be difficult. But it's a big project on its own.

*General*

> RAT-25 – add checksum and similar checks on final build products.

I think this relates a lot to how Apache Trusted Releases changes the release process. At least currently, verifying checksums + signatures is quite a manual and error-prone process (typos/PEBKAC). For the Polaris project, I've created a script [1] to verify the signatures and checksums of the release artifacts, plus some project-specific things.

Best,
Robert

[1] https://github.com/apache/polaris/pull/2824

On Sat, Nov 1, 2025 at 10:54 AM Claude Warren <[email protected]> wrote:

> *Design proposal*
>
> *Introduction*
>
> RAT 0.17 was just released and we have several issues newly opened. In
> addition, there are several older issues that will require large scale
> changes. This proposal is intended to open discussion concerning the
> direction of development for RAT, and how to address the general categories
> of issues.
>
> The changes cover a couple of broad areas. Some changes are fairly simple
> and can be easily implemented against the 0.17 codebase. Others require a
> much deeper change.
>
> I believe that we should maintain two working branches for a short time:
> “0.18-SNAPSHOT” so that bug fixes can be released, and 1.0.0-FEATURE on
> which to do the longer term development.
> This support of two branches will only be maintained until 1.0.0 is
> released.
>
> I have listed the general broad areas below as well as what version (0.18,
> 1.0.0, 1.0.0+) I think the changes should be made on. 1.0.0+ indicates
> 1.0.0 or later.
>
> Since access to Jira is restricted, I suggest that discussions about these
> components be carried out on the mailing list, with final decisions posted
> into Jira. If anyone has a better solution to allow people without Jira
> access to contribute please speak up.
>
> *Code Cleanup*
>
> The code base contains a number of bad practices that have been identified
> but were deemed not suitable for 0.17 release. These include
>
> - RAT-465 – remove deprecated code (1.0.0)
> - RAT-498 – Dependency Updates (1.0.0)
> - RAT-442 – migrate to Commons-cli 1.10.0 (0.18)
> - RAT-452 – remove ExtendedIterator when Commons-collection 4.5.0 is
>   available. (0.18)
> - RAT-449 – remove Plexus code when 4.0.3 or upstream contributions are
>   added. (0.18)
> - RAT-350 – remove javadoc errors (1.0.0)
> - RAT-478 – Drop Java8 build support. (1.0.0)
> - RAT-443 – remove package cycles (1.0.0)
> - RAT-404 – create SAXON Xslt processor. (0.18)
> - RAT-403 – remove AbstractReport (1.0.0)
> - RAT-459 – cleanup report generation (1.0.0)
> - RAT-401 – rework IdocumentAnalyzer and RatReport into single class
>   (1.0.0)
> - RAT-407 – move HeaderCheckerWorker code into documentHeaderAnalyzer.
>   (1.0.0)
>
> *Pluggable Architecture*
>
> We have moved to a mostly pluggable architecture. There are some additional
> components for and fixes to the current architecture.
>
> - Pluggable Include/Exclude processors
>   - RAT-352 – define a .ratignore file and processor (0.18)
> - Pluggable UI
>   - RAT-163 – Add Gradle Plugin (0.18)
> - Clean up Maven UI
>   - RAT-508 – RAT logs too much at INFO level in Maven. (0.18)
>
> - Pluggable scanners: Currently RAT scans for license headers. This is a
>   scanner capability. The addition of new scanners will mean that the
>   include/exclude processing will need to be executed before each processor
>   and then the processor executed. The processor will need to be able to
>   report into a common metadata structure that can be used to generate a
>   common XML. This common processing framework will need to be developed.
>   However, there are points in the current RAT code where new processors
>   could be added. The first new processor will bring with it a lot of newish
>   code. Many of the processors specified below could be implemented as
>   adapters to other libraries.
>   - RAT-461 – create a notice processor (1.0.0+)
>   - RAT-388 – replace hardcoded NOTICE category (1.0.0+)
>   - RAT-51 – check for crypto jar inclusion (1.0.0+)
>   - RAT-45 – copy and replace detection tool (1.0.0+)
>   - RAT-25 – add checksum and similar checks on final build products.
>     (1.0.0+)
>   - RAT-389 – move Add header functionality into a separate tool (1.0.0)
>   - RAT-4 – check known license requirements for code against notice
>     files. (1.0.0+)
>   - RAT-334 – option to verify notice file. (1.0.0+)
>   - RAT-5 – check copyright dates. (1.0.0+)
>
> - Pluggable editors: RAT currently has one editor; it can add headers to
>   some file types. This change is to move editors into their own category of
>   pluggable components. This will allow users to easily prohibit updating any
>   files within the RAT run, by turning off all editor components at once.
>   Otherwise, these tools will probably use much of the pluggable scanners
>   framework.
>   - RAT-463 – rework header editing. (1.0.0)
>   - RAT-351 – automatic code formatters (1.0.0+)
>
> *General*
>
> These are items that we need to fix but did not fit into one of the other
> categories. Unless otherwise noted these are to be fixed in 0.18.
>
> - RAT-462 – scan images for licensing information.
> - RAT-508 – RAT logs too much at INFO level in Maven.
> - RAT-476 – Include/Exclude precedence
> - RAT-509 – gitignore rules mixed with RAT include/exclude rules.
> - RAT-395 – Add HTML Report
> - RAT-512 – PDF Files fail the check.
> - RAT-511 – some tar.gz files not read.
> - RAT-460 – Add additional license definitions
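
PS, on the RAT-25 / checksum point in the reply above: a minimal Python sketch of the checksum half of such a verification, for illustration only. It is not the Polaris script referenced as [1] and does not cover signature checking. The function names are hypothetical, and it assumes the `.sha512` file starts with the hex digest (the `sha512sum` layout: digest, then file name); checksum files in other layouts would need different parsing.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(artifact: Path, checksum_file: Path) -> bool:
    """Compare the artifact's digest with the first token of checksum_file.

    Assumption: the checksum file looks like "<hex digest>  <file name>"
    or contains only the hex digest.
    """
    tokens = checksum_file.read_text().split()
    if not tokens:
        return False  # an empty checksum file can never match
    return sha512_of(artifact) == tokens[0].lower()
```

Comparing digests in code rather than by eye removes the typo-prone part of the manual process; a real release check would still need to verify the signature as well.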
