I changed the standard report slightly. It now looks like:
s src/test/resources/elements/ILoggerFactory.java
MIT MIT The MIT License
b src/test/resources/elements/Image.png
n src/test/resources/elements/LICENSE
n src/test/resources/elements/NOTICE
!s src/test/resources/elements/Source.java
????? ????? Unknown license
s src/test/resources/elements/Text.txt
AL AL Apache License Version 2.0
s src/test/resources/elements/TextHttps.txt
AL AL Apache License Version 2.0
s src/test/resources/elements/Xml.xml
AL AL Apache License Version 2.0
s src/test/resources/elements/buildr.rb
AL AL Apache License Version 2.0
a src/test/resources/elements/dummy.jar
g src/test/resources/elements/generated.txt
GEN GEN Generated Files
b src/test/resources/elements/plain.json
s src/test/resources/elements/tri.txt
AL AL Apache License Version 2.0
BSD-3 BSD-3 BSD 3 clause
BSD-3 TMF The Telemanagement Forum License
!s src/test/resources/elements/sub/Empty.txt
????? ????? Unknown license
*****************************************************
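
For anyone who still reads the text listing directly, here is a minimal Python sketch of how the per-file section above could be parsed. It rests on assumptions taken from the sample rather than from any documented contract: a document line is an optional '!' followed by one type character and the path, and each following line has three columns read as family, license id, and name. The names `DOC_LINE` and `parse_listing` are hypothetical.

```python
import re

# Assumed shape of a document line: optional '!', one type character, then the path.
DOC_LINE = re.compile(r"^(!?)([abgnsu])\s+(\S.*)$")


def parse_listing(text):
    """Return one dict per document, with its license lines attached."""
    documents = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"*"}:
            continue  # skip blank lines and the row-of-asterisks separators
        match = DOC_LINE.match(line)
        if match:
            flag, doc_type, path = match.groups()
            documents.append({
                "path": path,
                "type": doc_type,
                "unapproved": flag == "!",
                "licenses": [],
            })
            continue
        parts = line.split(None, 2)
        if documents and len(parts) == 3:
            # Assumed column order: family, license id, free-text name.
            family, license_id, name = parts
            documents[-1]["licenses"].append(
                {"family": family, "id": license_id, "name": name}
            )
    return documents


sample = """\
s src/test/resources/elements/tri.txt
AL AL Apache License Version 2.0
BSD-3 BSD-3 BSD 3 clause
BSD-3 TMF The Telemanagement Forum License
!s src/test/resources/elements/sub/Empty.txt
????? ????? Unknown license
"""

for doc in parse_listing(sample):
    ids = ", ".join(lic["id"] for lic in doc["licenses"])
    print(doc["path"], doc["type"], doc["unapproved"], ids)
```

The sketch only covers the listing itself, not the summary block, and it would misread a license whose family is a single type character. The XML report remains the more robust thing to consume.
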
On Fri, Mar 29, 2024 at 3:50 PM Claude Warren <[email protected]> wrote:
> I have a proposed change. See
> https://github.com/Claudenw/creadur-rat/pull/6/files
> Note that this pull request is the difference between multiple targets and
> the change to move to RAT-366 (Move to single match call)
>
> Example output in
> https://github.com/Claudenw/creadur-rat/tree/Multiple_license_report/apache-rat/src/site/examples
>
> I reworked the MetaData class and removed all the funky naming. All we
> really needed to capture for a document is what licenses matched and which
> of those are approved licenses.
>
> The new rat report (in examples) has a "resource" element for each file
> that was checked. The resource still has a name attribute and I added a
> type attribute that specifies the type of file that it is (e.g. archive,
> standard, binary). It has two possible child elements: "license" and
> "sample".
>
> The license element has several attributes: approval, family, id, and name.
> A license can have a notes child element that contains the notes for the
> license. These are not usually displayed but are included for the
> generated files license.
>
> The sample element contains text from the license. It is only included
> when the license type is unknown.
>
> The sample and notes text is enclosed in a CDATA block.
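
A rough Python sketch of reading that structure with the standard library. Only the element and attribute names (resource, license, sample, notes, name, type, approval, family, id) come from the description above. The root element name, the attribute values such as approval="true" and type="generated", and the sample and notes text are assumptions made up for illustration, as is the function name `summarize`.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: root element name, attribute values and the CDATA
# text are assumptions; only the element/attribute names follow the description.
REPORT = """\
<rat-report>
  <resource name="src/test/resources/elements/tri.txt" type="standard">
    <license approval="true" family="AL" id="AL" name="Apache License Version 2.0"/>
    <license approval="true" family="BSD-3" id="TMF" name="The Telemanagement Forum License"/>
  </resource>
  <resource name="src/test/resources/elements/generated.txt" type="generated">
    <license approval="true" family="GEN" id="GEN" name="Generated Files">
      <notes><![CDATA[Generated files do not require license headers.]]></notes>
    </license>
  </resource>
  <resource name="src/test/resources/elements/Source.java" type="standard">
    <license approval="false" family="?????" id="?????" name="Unknown license"/>
    <sample><![CDATA[placeholder text from the unmatched file]]></sample>
  </resource>
</rat-report>
"""


def summarize(xml_text):
    """Map each resource name to its type, license ids, notes and sample flag."""
    root = ET.fromstring(xml_text)
    summary = {}
    for resource in root.iter("resource"):
        licenses = resource.findall("license")
        summary[resource.get("name")] = {
            "type": resource.get("type"),
            "license_ids": [lic.get("id") for lic in licenses],
            # Assumes approval is the literal string "true" for approved licenses.
            "unapproved": [lic.get("id") for lic in licenses
                           if lic.get("approval") != "true"],
            # CDATA content is returned as ordinary element text by the parser.
            "notes": [note.text for note in resource.iter("notes")],
            "has_sample": resource.find("sample") is not None,
        }
    return summary


for name, info in summarize(REPORT).items():
    print(name, info["type"], info["license_ids"], info["unapproved"])
```
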
>
> I reworked the standard report. This is probably a breaking change for
> anyone who is parsing the text, but then they should be using a custom XSLT
> to extract the info they want.
>
> The new report looks like:
>
>
> *****************************************************
> Summary
> -------
> Generated at: 2024-03-29T15:01:24+01:00
>
> Notes: 2
> Binaries: 2
> Archives: 1
> Standards: 8
>
> Apache Licensed: 5
> Generated Documents: 1
>
> JavaDocs are generated, thus a license header is optional.
> Generated files do not require license headers.
>
> 2 Unknown Licenses
>
> *****************************************************
>
> Files with unapproved licenses:
>
> src/test/resources/elements/Source.java
> src/test/resources/elements/sub/Empty.txt
>
> *****************************************************
>
> *****************************************************
> Documents with unapproved licenses will start with a '!'
> The next character identifies the document type.
>
> char type
> a Archive file
> b Binary file
> g Generated file
> n Notice file
> s Standard file
> u Unknown file.
>
> s src/test/resources/elements/ILoggerFactory.java
> MIT The MIT License
> b src/test/resources/elements/Image.png
> n src/test/resources/elements/LICENSE
> n src/test/resources/elements/NOTICE
> !s src/test/resources/elements/Source.java
> ????? Unknown license
> s src/test/resources/elements/Text.txt
> AL Apache License Version 2.0
> s src/test/resources/elements/TextHttps.txt
> AL Apache License Version 2.0
> s src/test/resources/elements/Xml.xml
> AL Apache License Version 2.0
> s src/test/resources/elements/buildr.rb
> AL Apache License Version 2.0
> a src/test/resources/elements/dummy.jar
> g src/test/resources/elements/generated.txt
> GEN Generated Files
> b src/test/resources/elements/plain.json
> s src/test/resources/elements/tri.txt
> AL Apache License Version 2.0
> BSD-3 BSD 3 clause
> TMF The Telemanagement Forum License
> !s src/test/resources/elements/sub/Empty.txt
> ????? Unknown license
>
> *****************************************************
>
> I think this solves the problem.
>
> Claude
>
> On Thu, Mar 28, 2024 at 10:17 AM Claude Warren <[email protected]> wrote:
>
>> There are a couple of things here that we will need to look at:
>>
>> 1. Metadata only stores one matching license.
>> 2. Can we modify the output XML to list multiple licenses for a file
>> without too much trouble? I don't think the existing XSLT will
>> have problems with it.
>> 3. SPDX [1] has an interesting format where they can report 2 (or
>> more?) licenses in one. Perhaps we should use their format for license
>> identification. This would allow us to report the SPDX tags that
>> reference multiple licenses.
>>
>> Also, every time I look at the LicenseFamily code I wonder why there is a
>> limit of 5 on the number of characters in the license family category. It
>> feels like a formatting issue was pushed into the internal code. Drives me
>> crazy.
>>
>> [1] https://spdx.dev/learn/handling-license-info/
>>
>> On Thu, Mar 28, 2024 at 10:01 AM P. Ottlinger <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> On 28.03.24 at 09:41, Claude Warren wrote:
>>> > I got back to looking at 366 and discovered a problem that I think has
>>> > been lurking in the system for some time. Basically, if a file has the
>>> > signatures for more than one license only one will be reported, and the
>>> > selection of which one is (I think) random.
>>>
>>> Thanks for analyzing this issue, which explains some random test
>>> failures ... :(
>>>
>>> <snip>
>>>
>>> > My suggestion is we report all license matches and let the user decide
>>> > what to do.
>>>
>>> I'm in favour of reporting as many licenses as possible, but I assume this
>>> will break the current report format, which is optimized for one license
>>> only.
>>>
>>> Not sure if downstream users have problems with that change?!
>>>
>>> Would we have a maximum number of licenses, or could this result in an
>>> "endless" list of reported licenses if a file containing "all"
>>> conceivable licenses is provided to RAT? Initially I thought of adding a
>>> new analyzer/reporting state "MULTIPLE" that is reported in the scan and a
>>> detailed report that lists up to x (maybe 3 or 5?) licenses per
>>> file - WDYT?
>>>
>>> >
>>> > My plan is to create a branch that reports multiple matching licenses
>>> > and then merge that into RAT-366 to resolve the problem. This should
>>> > give us all a chance to review the change before it gets added to the
>>> > already large RAT-366.
>>>
>>> +1
>>>
>>> Thanks for your deep dive into RAT!
>>>
>>> Cheers,
>>> Phil
>>>
>>
>>
>> --
>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>
>
>
> --
> LinkedIn: http://www.linkedin.com/in/claudewarren
>
--
LinkedIn: http://www.linkedin.com/in/claudewarren