On Sun, Apr 9, 2017 at 9:20 PM, Luis Villa <l...@lu.is> wrote:
> What's the "right" level to scan at? Top-level project-declared LICENSE
> file? Or per-file throughout the tree? (Note that often those two measures
> don't agree with each other.)

My opinion is that the right level is to scan at both levels and, if needed, surface any inconsistencies or contradictions. Scanning only the simpler top-level project-declared LICENSE or COPYING file is not enough: in my experience at scale, it too often yields incomplete or inaccurate data.

That said, I am the maintainer of the open source ScanCode toolkit, a fresh take at building a better mousetrap for license scanning:
https://github.com/nexB/scancode-toolkit

My goal is simple: I want the licensing of all open source code to be a solved problem, not a question mark. That is, working towards 100% licensing clarity and eventually ensuring that no piece of existing open source code raises licensing questions for a user or aspiring user.

For that I would like to scan it **all**... and set up a community peer review site so we can help every open source project add, refine or clean up any missing, incomplete, inaccurate or contradictory licensing. Or at least make the data open and available for anyone to query.

The main drag is, as always, resource availability (human time, network bandwidth and computing power) to fetch and scan everything from every package manager and forge (SourceForge, GitHub, etc.), which represents a significant[sic] number of terabytes. This could become a lesser issue on the fetch side when softwareheritage.org is fully operational. But still.

If anyone is interested in this, please contact me!

--
Cordially
Philippe Ombredanne
_______________________________________________
License-discuss mailing list
License-discuss@opensource.org
https://lists.opensource.org/cgi-bin/mailman/listinfo/license-discuss