I got back to looking at RAT-366 and discovered a problem that I think has been lurking in the system for some time. If a file contains the signatures for more than one license, only one is reported, and which one gets selected is (I think) random.
The CI build for RAT-366 shows a test failing because a POM file is reported with the wrong license: BSD rather than AS. The issue arises because the POM file declares a license that is in the BSD family. As a result, the file contains the signatures for both that license and the ASF license in the POM's header. On my system it is reported as AS, but on the CI it is reported as BSD.

I think the only reasonable thing to do is report both (there is a comment in the code somewhere saying that this should be accounted for). The only other reasonable option would be to try to work out which one appears first in the file. That gets very complex, though, because we have two versions of the file text: raw (as read from the file) and pruned (where anything that is not a number or letter is removed). In addition, we have aggregate matchers. My suggestion is that we report all license matches and let the user decide what to do.

My plan is to create a branch that reports multiple matching licenses and then merge it into RAT-366 to resolve the problem. This should give us all a chance to review the change before it gets added to the already large RAT-366.
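
To make the "report all matches" idea concrete, here is a rough sketch of what I have in mind. It is not the RAT code: the class name, the matchAll and prune helpers, and the two signature strings are all made up for illustration, and it ignores aggregate matchers entirely. It only shows collecting every family whose signature appears in the pruned text, and returning them in a sorted order so the result does not vary between machines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch; none of these names exist in RAT.
public class MultiLicenseSketch {

    // Assumed form of "pruned" text: drop everything that is not a letter or digit.
    static String prune(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns every license family whose signature occurs in the text,
    // instead of stopping at the first hit. TreeSet keeps the result sorted.
    static List<String> matchAll(String rawText, Map<String, String> signatures) {
        String pruned = prune(rawText);
        TreeSet<String> matches = new TreeSet<>();
        for (Map.Entry<String, String> entry : signatures.entrySet()) {
            if (pruned.contains(prune(entry.getValue()))) {
                matches.add(entry.getKey());
            }
        }
        return new ArrayList<>(matches);
    }

    public static void main(String[] args) {
        // Illustrative signatures only, not the patterns RAT actually uses.
        Map<String, String> signatures = new LinkedHashMap<>();
        signatures.put("BSD", "Redistribution and use in source and binary forms");
        signatures.put("AS", "Licensed to the Apache Software Foundation");

        // A POM-like text with an ASF header that also declares a BSD-family license.
        String pom = "<!-- Licensed to the Apache Software Foundation (ASF) -->\n"
                + "<license><comments>Redistribution and use in source and binary forms"
                + "</comments></license>";

        System.out.println(matchAll(pom, signatures)); // prints [AS, BSD]
    }
}
```

With something along these lines, the POM in the failing test would be reported as both AS and BSD on every machine, and the user decides which one matters.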
