Hi Christian,
On Apr 3, 2014, at 2:24 AM, Christian Mueller <[email protected]>
wrote:
> Hi,
>
> I’ve got some short questions about the NOMOS scan engine of version 2.40,
> because I’m not quite sure if I did understand the general functionality
> correctly (I’m not a C expert):
>
> - The NOMOS engine uses a lot of if-then-else-statements to check
> whether license-texts / -references or similar are included in a file and
> marks findings accordingly, correct? -> seen in file parse.c
Yes, correct.
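To make the if-then-else approach concrete, here is a much simplified sketch.
It is not code from parse.c: the function name classify_license, the phrases,
and the labels are made up for illustration, and it uses plain substring
matching where a real scanner would need something more tolerant.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not a function from parse.c): returns a short
 * license label for the text, or NULL when no phrase matches. */
static const char *classify_license(const char *text)
{
    assert(text != NULL);

    if (strstr(text, "GNU General Public License") != NULL) {
        /* Nested checks refine the finding once the family is known. */
        if (strstr(text, "version 2") != NULL) {
            return "GPL-2.0";
        } else if (strstr(text, "version 3") != NULL) {
            return "GPL-3.0";
        }
        return "GPL";
    } else if (strstr(text, "Apache License") != NULL) {
        return "Apache";
    } else if (strstr(text, "Permission is hereby granted, free of charge") != NULL) {
        return "MIT-style";
    }
    return NULL;
}
```

Every new license means another branch in code like this, which is why adding
one requires a code change and a rebuild.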
> - These license-texts /-references are defined somewhere in the
> FOSSY database, correct? -> is it the license_ref-table?
No. The license_ref table exists only so that we can display a canonical
license text or license reference text. It plays no part in license scanning.
> - Is it possible to easily add new license (references) just by
> adding new search pattern texts to this table – WITHOUT adaption of the code?
Unfortunately, no. The scanner is completely implemented in C.
> - Does the NOMOS engine only scan for “standard OSS licenses” – are
> legal relevant phrases scanned by a different agent? (-> regexscan)?
Nomos will scan for standard OSS licenses (all the licenses from OSI and SPDX)
and much more. Nomos will even attempt to find licenses that it doesn’t know
about (reporting them as “Unclassified License”). The list of licenses nomos
can report is found on:
http://www.fossology.org/projects/fossology/wiki/Nomos_license_list
> - Is it possible to easily add new search phrases to Fossology (e.g.
> add new search pattern to a table in the database) – WITHOUT adaption of the
> code?
See above. Unfortunately not. One of these days I would like to create a new
generation of the license scanner that would be data-driven, but currently
this is not the case. However, we regularly add licenses that people
submit. So if you want a new license to be part of Nomos, you can send it to
us for inclusion. Frequently, people just send us patches, but since you
aren’t a C programmer, you can just send us:
1) URL to the canonical license. This is usually a web page from the people
that created the license.
2) A test file that uses the license.
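As a rough illustration of the data-driven idea mentioned above, the sketch
below keeps the search phrases in a table and leaves the matching code
generic. This is only a sketch of the concept, not existing Nomos code: the
struct license_rule, the function scan_with_rules, and the phrases are all
hypothetical, and the matching is plain substring search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule record: one search phrase and the label to report. */
struct license_rule {
    const char *phrase;
    const char *label;
};

/* In a data-driven design these rows would be loaded from a file or a
 * database table; they are hard-coded here to keep the sketch short. */
static const struct license_rule rules[] = {
    { "Apache License", "Apache" },
    { "Mozilla Public License", "MPL" },
    { "Permission is hereby granted, free of charge", "MIT-style" },
};

/* Returns the label of the first rule whose phrase occurs in text,
 * or NULL when no rule matches. */
static const char *scan_with_rules(const char *text,
                                   const struct license_rule *table,
                                   size_t count)
{
    assert(text != NULL && table != NULL);

    for (size_t i = 0; i < count; i++) {
        if (strstr(text, table[i].phrase) != NULL) {
            return table[i].label;
        }
    }
    return NULL;
}
```

With this layout, supporting a new license would mean adding a row to the
table rather than editing the matching code.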
> I’m currently researching the functionality of different OSS scan engines
> for a possible project solution.
If you are only looking at open source solutions you should look at Ninka from
our friend Daniel German:
http://ninka.turingmachine.org/
However, Ninka only does license scanning (no copyrights, no buckets, no UI,
no database).
A few years ago Daniel and I were talking and I told him about a sentence-based
machine learning algorithm we were working on. He liked the idea and ran with
the concept to produce Ninka. His (and Yuki Manabe’s) work gives good results
on source code. FWIW, our machine learning version sucked and we abandoned it.
In my ideal world, I’d combine Ninka with a data-driven Nomos for a new
license scanner.
Good luck,
Bob Gobeille