On Tue, Jul 21, 2015 at 07:31:07PM +0200, Dominique Dumont wrote:
> Currently, licensecheck -r uses find to scan a directory and accepts files 
> based on their suffix (i.e. accepts .c .h .cxx ...)
> 
> This list of suffixes is a big regexp that must be updated regularly. Still 
> some files are missed like config.guess.
> 
> Maintaining this regexp is not efficient.

True.  The current mechanism is tedious: it requires either intervention
from the user or a request to update the regexp.

> I'd like to propose to use 'file' command to decide whether to scan a file or 
> not. All files of mime type 'text/*' and 'application/xml' would be scanned.
> (Note that file is already used to find the charset of each scanned file.)

If I understand correctly, you're proposing something like the attached,
untested patch.  Keep the -c switch so users can still filter by
extension if they like, but by default find all files and filter based
on file type.
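
For illustration only, the type-based filter described above could be
sketched in shell roughly as follows.  The function names
is_scannable_mime and list_scannable_files are made up for this example;
they are not part of licensecheck or of the attached patch, and the
sketch assumes a file(1) that supports --brief and --mime-type.

```shell
#!/bin/sh
# Hypothetical helper (not part of licensecheck): succeed when the MIME
# type in $1 is one that the proposal says should be scanned, i.e. any
# 'text/*' type or 'application/xml'.
is_scannable_mime() {
    case "$1" in
        text/*|application/xml) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print every regular file below directory $1 whose
# MIME type, as reported by file(1), passes is_scannable_mime.
# Assumption: file names contain no newlines, since the loop reads one
# path per line.
list_scannable_files() {
    find "$1" -type f | while IFS= read -r path; do
        mime=$(file --brief --mime-type -- "$path")
        if is_scannable_mime "$mime"; then
            printf '%s\n' "$path"
        fi
    done
}
```

Running list_scannable_files on a source tree would then list the
candidate files without relying on any suffix list, so a file such as
config.guess is picked up as long as file reports a text type for it.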

> This way, I think that licensecheck would miss fewer source files.

Sounds good to me.

Cheers,
-- 
James
GPG Key: 4096R/331BA3DB 2011-12-05 James McCoy <[email protected]>
diff --git i/scripts/licensecheck.pl w/scripts/licensecheck.pl
index bac3353..d92def9 100755
--- i/scripts/licensecheck.pl
+++ w/scripts/licensecheck.pl
@@ -168,9 +168,6 @@ my $default_ignore_regex = qr!
 \.shelf|_MTN|\.bzr(?:\.backup|tags)?)(?:$|/.*$)
 !x;
 
-my $default_check_regex = '\.(c(c|pp|xx)?|h(h|pp|xx)?|S|f(77|90)?|go|groovy|scala|clj|p(l|m)|xs|sh|php|py(|x)|rb|java|js|vala|el|sc(i|e)|cs|pas|inc|dtd|xsl|mod|m|tex|mli?|(c|l)?hs)$';
-
-
 # also used to cleanup
 my $copyright_indicator_regex
     = qr!
@@ -205,7 +202,7 @@ my %OPT=(
     lines          => '',
     noconf         => '',
     ignore         => '',
-    check          => '',
+    check          => '^',
     recursive      => 0,
     copyright      => 0,
     machine        => 0,
@@ -272,7 +269,6 @@ GetOptions(\%OPT,
 
 $OPT{'lines'} = $def_lines if $OPT{'lines'} !~ /^[1-9][0-9]*$/;
 my $ignore_regex = length($OPT{ignore}) ? qr/$OPT{ignore}/ : $default_ignore_regex;
-$OPT{'check'} = $default_check_regex if ! length $OPT{'check'};
 my $check_regex = qr/$OPT{check}/;
 
 if ($OPT{'noconf'}) {
@@ -459,7 +455,7 @@ Valid options are:
                             (Default: $def_lines)
    --check, -c            Specify a pattern indicating which files should
                              be checked
-                             (Default: '$default_check_regex')
+                             (Default: all text files)
    --machine, -m          Display in a machine readable way (good for awk)
    --recursive, -r        Add the contents of directories recursively
    --copyright            Also display the file's copyright
