Hi!

Thanks a lot for the detailed explanation. I really appreciate you taking the time to provide the historical context after all these years, especially around how dependency management was handled back then. It definitely helps to understand the reasoning behind the script’s introduction.

These days, we could have an automated process in place (similar to TomEE, StormCrawler, …):
Our CI pipeline could generate the updated license files during the build and automatically open a PR against the main branch via GitHub Actions if they differ. This would ensure that the license information is always current.

I’m guessing such tooling wasn’t available at the time, so the current approach made perfect sense back then.

Best,
Richard

> On 24.10.2025 at 18:22, Stig Rohde Døssing <[email protected]> wrote:
>
> Hi Richard,
>
> Just want to provide a bit of context for why this script was added back in
> the day (6 years ago or so), in case it helps you make a decision on what
> to do about it today.
>
> Based on the advice at https://infra.apache.org/licensing-howto.html#binary,
> and looking at a few other ASF projects (Kafka, Hadoop), the project needed
> to maintain at least 4 files:
>
> LICENSE/NOTICE for Storm's source distribution
> LICENSE/NOTICE-binary for Storm's binary distribution
>
> In addition, Storm at the time included some category B dependencies (
> https://www.apache.org/legal/resolved.html#category-b), and those are
> required to be listed in a particular way that users are likely to notice (
> https://www.apache.org/legal/resolved.html#appropriately-labelled-condition).
> Rather than make a listing of only category B dependencies, we added the
> DEPENDENCY-LICENSES file listing all dependencies plus licenses, and added
> a link to that file to the README.
>
> It was a bit of a pain to ensure that these files were up to date when
> doing a release, it was very easy to forget to update the files when
> adding/updating/removing a dependency, so I added the
> validate-license-files.py script to ensure that PRs that updated
> dependencies also kept these files up to date. At the time, dependency
> bumps were done manually and infrequently.
>
> So it wasn't really about keeping category X licenses out (we'd catch that
> in PR reviews even without these scripts), it was just about ensuring that
> these files accurately reflected the dependencies we were actually
> including in the distributions. Since dependency bumps were (comparatively)
> rare and not automated, it was less effort at the time to ask PRs to keep
> these files up to date as part of changing the dependencies, rather than
> ask the people doing releases to validate the files later.
>
> On Thu, 23 Oct 2025 at 10:23, Richard Zowalla <[email protected]> wrote:
>
>> Hi,
>> After reviewing validate-license-files.py, it seems we already generate
>> the two license files, compare them with the existing ones, and fail the
>> check if any differences are found.
>>
>> Currently, most of our PRs involve dependency updates, and each time we
>> spend several cycles manually updating these files.
>>
>> I was wondering if we could adopt a similar approach to what we do in
>> StormCrawler (see here):
>> https://github.com/apache/stormcrawler/blob/main/.github/workflows/main.yml#L46
>> automatically generate the license files and open a PR whenever
>> differences are detected.
>>
>> I assume the current license check was introduced to prevent accidentally
>> introducing a category X license or similar issue.
>>
>> However, I think the time saved by automating these updates outweighs the
>> minor additional review effort required during release preparation, since a
>> full license review happens at that stage anyway.
>>
>> This goes in the direction of https://github.com/apache/storm/issues/7751
>>
>> What do you think?
>>
>> Regards,
>> Richard
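
For illustration, the "generate, compare, open a PR if they differ" idea from this thread could hinge on a comparison step roughly like the sketch below. This is only a sketch under assumptions, not how validate-license-files.py works today: the function name and the two-directory layout are hypothetical, and the file names are simply the ones mentioned above. Generating the files and opening the PR would be left to the build and the CI workflow.

```python
from pathlib import Path


def files_needing_update(committed_dir, generated_dir, names):
    """Return the names whose freshly generated content differs from the committed copy.

    Hypothetical helper: 'committed_dir' holds the files checked into the repo,
    'generated_dir' holds the files produced by the build.
    """
    stale = []
    for name in names:
        committed = Path(committed_dir) / name
        generated = Path(generated_dir) / name
        # The generated file must exist; read_text raises FileNotFoundError otherwise.
        new_text = generated.read_text(encoding="utf-8")
        # A file that was never committed is treated as needing an update.
        old_text = committed.read_text(encoding="utf-8") if committed.exists() else None
        if old_text != new_text:
            stale.append(name)
    return stale
```

A CI job could call this with something like `["DEPENDENCY-LICENSES", "LICENSE-binary"]` and treat a non-empty result as the signal to copy the generated files over and open the PR, instead of failing the check.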
