[
https://issues.apache.org/jira/browse/MPIR-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843539#comment-16843539
]
Gary O'Neall commented on MPIR-382:
-----------------------------------
A couple of suggestions from a member of the SPDX team: SPDX license IDs give
you an unambiguous identifier for each license. There are a few ways to
determine whether a license matches a particular license ID. The name is one of
the less reliable means, so I would avoid using the license name from the POM
file.
By way of background, the SPDX organization maintains a [curated list of open
source licenses|[https://spdx.org/licenses/]] along with [matching
guidelines|[https://spdx.org/spdx-license-list/matching-guidelines]] and [tool
libraries|[https://github.com/spdx/tools]] to solve issues like normalizing the
license names as discussed in this issue.
The most reliable method would be to match against the actual text using the
matching guidelines. There is a [Java library
class|[https://github.com/spdx/tools/blob/master/src/org/spdx/compare/LicenseCompareHelper.java]]
which can help facilitate the matching, but it is not a high performance
utility and it requires the actual text (e.g. any HTML would need to be scraped
and the actual license text extracted).
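To illustrate the idea of comparing normalized text, here is a deliberately simplified sketch. It only lowercases the text and collapses runs of whitespace, which is a small subset of what the matching guidelines cover, and the class and method names are hypothetical. It is not the API of LicenseCompareHelper.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; NOT the SPDX tools API. */
final class SimpleLicenseTextComparer {

    /** Lowercases the text and collapses every run of whitespace to one space. */
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Returns true when both texts are equal after the simplified normalization. */
    static boolean sameText(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
```

With this sketch, `SimpleLicenseTextComparer.sameText(candidateText, referenceText)` treats two copies of a license as equal even if they differ in line wrapping or capitalization. Anything beyond that (punctuation variants, replaceable text, and so on) is left to the real library.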
The approach we took in the [SPDX Maven
Plugin|[https://github.com/spdx/spdx-maven-plugin]] was to use the license URLs
to match. The code for this can be found in the [LicenseManager
class|[https://github.com/spdx/spdx-maven-plugin/blob/master/src/main/java/org/spdx/maven/LicenseManager.java]].
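A minimal sketch of the URL-based idea follows. It is not the plugin's LicenseManager code: the class name, the method names, and the normalization rules (ignore the scheme, a leading "www.", letter case, and trailing slashes) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration; NOT the SPDX Maven Plugin's LicenseManager. */
final class LicenseUrlMatcher {

    // Maps a normalized license URL to an SPDX license ID.
    private final Map<String, String> urlToSpdxId = new HashMap<>();

    /** Registers a known license URL for an SPDX license ID. */
    void register(String url, String spdxId) {
        urlToSpdxId.put(normalize(url), spdxId);
    }

    /** Looks up the SPDX ID for a license URL taken from a POM, if one is known. */
    Optional<String> match(String pomLicenseUrl) {
        if (pomLicenseUrl == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(urlToSpdxId.get(normalize(pomLicenseUrl)));
    }

    /** Assumed normalization: lowercase, drop scheme, leading "www." and trailing slashes. */
    static String normalize(String url) {
        String s = url.trim().toLowerCase(Locale.ROOT);
        s = s.replaceFirst("^https?://", "");
        s = s.replaceFirst("^www\\.", "");
        s = s.replaceFirst("/+$", "");
        return s;
    }
}
```

For example, after `register("https://www.apache.org/licenses/LICENSE-2.0", "Apache-2.0")`, a POM that declares `http://apache.org/licenses/LICENSE-2.0/` resolves to the same ID, so the seven name variants in the issue description would collapse into one entry as long as their URLs agree.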
Feel free to leverage code from the libraries. The code itself uses the
Apache-2.0 license, and some dependencies use Apache-friendly
licenses.
You can also post questions to the [SPDX tech mailing
list|[https://lists.spdx.org/g/spdx-tech]] or submit issues against the license
list XML or tools libraries in the [SPDX git
repository|[https://github.com/spdx]].
> Group different license spellings together
> ------------------------------------------
>
> Key: MPIR-382
> URL: https://issues.apache.org/jira/browse/MPIR-382
> Project: Maven Project Info Reports Plugin
> Issue Type: Improvement
> Components: licenses
> Affects Versions: 3.0.0
> Reporter: Vincent Privat
> Priority: Major
>
> The licenses report is not aware of the different possible spellings we can
> find in all Maven dependencies published on Maven Central. This results in
> any given license (for example, Apache 2.0) being listed several times. For
> example, take a look at the seven variants listed in [maven-site-plugin
> dependencies|http://maven.apache.org/plugins/maven-site-plugin/dependencies.html#Licenses]:
> * Apache Public License 2.0
> * Apache License Version 2
> * Apache License, Version 2.0
> * Apache Software License - Version 2.0
> * Apache License 2.0
> * Apache License Version 2.0
> * The Apache Software License, Version 2.0
> On a large project, this makes the license report unreadable. The plugin
> should detect that all these variants denote the same license and group them
> together.