Hi Christian,

On Apr 3, 2014, at 2:24 AM, Christian Mueller <[email protected]> 
wrote:

> Hi,
>  
> I’ve got some short questions about the NOMOS scan engine of version 2.40, 
> because I’m not quite sure if I did understand the general functionality 
> correctly (I’m not a C expert):
>  
> -          The NOMOS engine uses a lot of if-then-else-statements to check 
> whether license-texts / -references or similar are included in a file and 
> marks findings accordingly, correct? -> seen in file parse.c

Yes, correct.

> -          These license-texts /-references are defined somewhere in the 
> FOSSY database, correct? -> is it the license_ref-table?

No.  The license_ref table exists only so that we can display a canonical 
license or license reference text.  It plays no part in the license scanning.

> -          Is it possible to easily add new license (references) just by 
> adding new search pattern texts to this table – WITHOUT adaption of the code?

Unfortunately, no.  The scanner is completely implemented in C.
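
To make the difference concrete, here is a minimal sketch.  It is hypothetical 
and is NOT the actual Nomos source from parse.c: the function names, search 
phrases, and labels are made up for illustration, and the real scanner is far 
more involved.  The first function hard-codes its checks in if-then-else 
statements, so a new phrase means editing and rebuilding the code.  The second 
walks a table, which is roughly what a data-driven scanner could look like:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hard-coded style: every check is part of the control flow. */
const char *scan_hardcoded(const char *text)
{
    if (strstr(text, "GNU General Public License") != NULL) {
        return "GPL";
    } else if (strstr(text, "Apache License") != NULL) {
        return "Apache";
    } else if (strstr(text, "Permission is hereby granted, free of charge") != NULL) {
        return "MIT";
    }
    return "none";
}

/* Hypothetical data-driven style: the checks are rows in a table. */
struct license_pattern {
    const char *phrase;  /* text to search for */
    const char *name;    /* label to report when the phrase is found */
};

const struct license_pattern demo_patterns[] = {
    { "GNU General Public License", "GPL" },
    { "Apache License", "Apache" },
    { "Permission is hereby granted, free of charge", "MIT" },
};
const size_t demo_count = sizeof demo_patterns / sizeof demo_patterns[0];

/* Return the label of the first table row whose phrase occurs in text. */
const char *scan_table(const char *text,
                       const struct license_pattern *table, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (strstr(text, table[i].phrase) != NULL) {
            return table[i].name;
        }
    }
    return "none";
}
```

In the table version the rows could just as well be loaded from a file or a 
database at startup, which is what would let you add patterns without touching 
the code.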

> -          Does the NOMOS engine only scan for “standard OSS licenses” – are 
> legal relevant phrases scanned by a different agent? (-> regexscan)?

Nomos will scan for standard OSS licenses (all the licenses from OSI and SPDX) 
and much more.  Nomos will even attempt to find licenses that it doesn’t know 
about (reporting them as “Unclassified License”).  The list of licenses Nomos 
can report is found at:

http://www.fossology.org/projects/fossology/wiki/Nomos_license_list


> -          Is it possible to easily add new search phrases to Fossology (e.g. 
> add new search pattern to a table in the database) – WITHOUT adaption of the 
> code?

See above.  Unfortunately not.  One of these days I would like to create a new 
generation of the license scanner that is data-driven, but that is not the 
case today.  However, we regularly add licenses that people submit.  So if you 
want a new license to be part of Nomos, you can send it to us for inclusion.  
People frequently just send us patches, but since you aren’t a C programmer, 
you can simply send us:

1) The URL of the canonical license text.  This is usually a web page from the 
people who created the license.
2) A test file that uses the license.


>  I’m currently researching the functionality of different OSS scan engines 
> for a possible project solution.


If you are only looking at open source solutions, you should look at Ninka from 
our friend Daniel German:

http://ninka.turingmachine.org/

However, Ninka only does license scanning (no copyrights, no buckets, no UI, 
no database).

A few years ago Daniel and I were talking, and I told him about a 
sentence-based machine learning algorithm we were working on.  He liked the 
idea and ran with the concept to produce Ninka.  His (and Yuki Manabe’s) work 
gives good results on source code.  FWIW, our machine learning version sucked 
and we abandoned it.  In my ideal world, I’d combine Ninka with a data-driven 
Nomos for a new license scanner.

Good luck,
Bob Gobeille
_______________________________________________
fossology mailing list
[email protected]
http://lists.fossology.org/mailman/listinfo/fossology
