Package: lintian
Version: 2.5.30+deb8u3
Severity: wishlist
Tags: patch

If a .html file is in a package then usually its <img> files should be
in the package too so it displays nicely.  I suggest the few lines below
to check this.

Without picking on any particular maintainers, missing images can be
found in, for example,

  * whizzytex, where /usr/share/doc/whizzytex/whizzytex.html is missing
    whizzytex001.png (and two others)

  * texlive-pictures-doc (very big), where
    /usr/share/doc/texlive-doc/latex/mathspic/sourcecode113.html is
    missing a fig1.jpg deep in its detailed description

I'm unsure if my code notices images supplied by dependent packages.
I put in a group bit like the manpages and symlinks checks, but I don't
really understand when packages are a group.  Eg. per the html.pm
comments, texlive-lang-french uses images from texlive-base and has a
correct declared dependency, but I couldn't make the right incantation
to have it recognised :-(.

Incidentally, HTML::Parser would be a more reliable html parse of
course.  But are lintian dependencies supposed to be kept down?  I see
another rough html parse in files.pm for privacy breaches.  A good parse
might help accuracy there against obscure quoting or escaping.

I thought a separate html.pm script would leave room for other checks
related to html parsing (whatever the method).  Maybe similar treatment
of css or javascript (though I don't rate those), or even some href
checking.  Not a full link checker, but something to detect document
parts apparently missing from a package.

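To illustrate the HTML::Parser remark above, here is a minimal sketch of how the <img>/<video> extraction might look with that module. It is not part of the proposed patch. It assumes HTML::Parser (libhtml-parser-perl, listed under Suggests in the system information of this report) is installed, and the function name media_sources and the sample HTML are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Return the src attribute of every <img> and <video> element in $html.
# (media_sources is a hypothetical name, not an existing lintian function.)
sub media_sources {
    my ($html) = @_;
    my @sources;
    my $parser = HTML::Parser->new(
        api_version => 3,
        # The start handler receives the tag name and a hash of the
        # attributes; by default both tag and attribute names are
        # lowercased by the parser.
        start_h => [
            sub {
                my ($tagname, $attr) = @_;
                return unless $tagname eq 'img' || $tagname eq 'video';
                push @sources, $attr->{src} if defined $attr->{src};
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;
    return @sources;
}

# Sample input: a script chunk containing tag-like text, an upper-case
# tag, and an unquoted attribute value.
my $html = <<'EOT';
<html><head>
<script>document.write("<img src='" + name + "'>");</script>
</head><body>
<IMG SRC="fig1.jpg" alt="figure">
<video src=clip.webm></video>
</body></html>
EOT

print "$_\n" for media_sources($html);
```

Since HTML::Parser treats the content of <script> as literal text, the tag-like string inside the javascript should not be reported, which is the kind of false positive the rough regexp is prone to.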
# html -- lintian check script

# Copyright 2015 Kevin Ryde
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# Software Foundation; either version 2 of the License, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program.  If not go to <http://www.gnu.org/licenses/>.

# ENHANCE-ME: snd-doc /usr/share/doc/snd-doc/HTML/manual/snd-contents.html
# has a javascript chunk in the <head> which tricks the rough regexp below
# into reporting src='".  HTML::Parser could likely do a better job.
#
# ENHANCE-ME: texlive-lang-french
# /usr/share/doc/texlive-doc/texlive/texlive-fr/texlive-fr.html has
# src="../texlive-common/install-lnx-main.png" which is in its declared
# dependency texlive-base but they're different source packages.  Will they
# show up in $ginfo->direct_dependencies($proc)?  (If so then amend the note
# in html.desc, if not then try something for arbitrary dependencies.)
#

package Lintian::html;
use 5.010;
use strict;
use warnings;

use Lintian::Tags qw(tag);
use Lintian::Util qw(normalize_pkg_path);
use File::Basename qw(fileparse);

sub run {
    my (undef, undef, $info, $proc, $group) = @_;

    # Read each HTML file in the package...
    foreach my $file ($info->sorted_index) {
        next unless $file =~ /\.html?$/i && $file->is_file;

        # Only the directory part is needed, to resolve relative targets.
        my (undef, $dirname) = fileparse($file);
        my $str = $file->file_contents;

        # Rough parse: look at the inside of each <...> tag in turn.
        while ($str =~ /<([^>]+)>/g) {
            my $body = $1;
            $body =~ /^(img|video)\b/i or next;

            # Quoted value in $2, unquoted value in $3.  Matched
            # case-insensitively so that SRC= is recognised too.
            #                       $1   $2       $3
            $body =~ /\bsrc\s*=\s*(['"]([^"']+)|([^ \t\r\n>]+))/i or next;
            my $target = $2 // $3;
            # <img src="foo.png"> results in $target="foo.png"

            # Skip anything external http: etc with a :
            # Skip anything with an & as probably literal text which the rough
            # parse has misinterpreted
            next if $target =~ /[:&]/;

            # If $target is relative then resolve against $dirname of the html.
            my $target_fullname = normalize_pkg_path($dirname, $target);

            if (! target_exists($info, $proc, $group, $target_fullname)) {
                tag 'html-missing-image-file', $file, $target;
            }
        }
    }
    return;
}

# Return true if $target_fullname exists in this package, or as a plain
# file in one of its direct dependencies within the group.
sub target_exists {
    my ($info, $proc, $group, $target_fullname) = @_;

    if ($info->index_resolved_path($target_fullname)) {
        return 1;
    }

    # Check our dependencies:
    my $ginfo = $group->info;
    my $deps = $ginfo->direct_dependencies($proc);
    foreach my $depproc (@{$deps}) {
        my $depinfo = $depproc->info;
        my $f = $depinfo->index_resolved_path($target_fullname);
        if ($f && $f->is_file) {
            return 1;
        }
    }
    return 0;
}

1;

# Local Variables:
# indent-tabs-mode: nil
# cperl-indent-level: 4
# End:
# vim: syntax=perl sw=4 sts=4 sr et

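For anyone wanting to try the rough regexp outside lintian, the extraction step of the script can be sketched standalone with core Perl only. The function name rough_media_targets and the sample HTML are made up for illustration, and the src match is done case-insensitively here, which is an assumption about the intended behaviour.

```perl
use strict;
use warnings;

# Return the local src targets of <img> and <video> tags found in $str,
# using the same rough regexp approach as the check script.
# (rough_media_targets is a hypothetical name used only in this sketch.)
sub rough_media_targets {
    my ($str) = @_;
    my @targets;
    while ($str =~ /<([^>]+)>/g) {
        my $body = $1;
        next unless $body =~ /^(?:img|video)\b/i;
        # Quoted value lands in $1, unquoted value in $2.
        next
          unless $body =~ /\bsrc\s*=\s*(?:['"]([^"']+)|([^ \t\r\n>]+))/i;
        my $target = $1 // $2;
        # Skip external URLs (containing ":") and probable misparses
        # (containing "&").
        next if $target =~ /[:&]/;
        push @targets, $target;
    }
    return @targets;
}

my $sample = <<'EOT';
<p>Intro</p>
<img src="whizzytex001.png" alt="screenshot">
<IMG SRC='../common/fig1.jpg'>
<img src=plain.gif>
<img src="http://example.org/logo.png">
EOT

print "$_\n" for rough_media_targets($sample);
```

This should list the three local targets and skip the http: one. It shares the weakness noted in the ENHANCE-ME comments: tag-like text inside javascript or comments is not distinguished from real markup.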
Check-Script: html
Type: binary
Needs-Info: unpacked, file-info
Info: This script checks HTML file content.

Tag: html-missing-image-file
Severity: normal
Certainty: possible
Info: HTML file missing an <img> file.  Generally a HTML file in a
 package should have its image files packaged too, and in the right
 place.
 .
 If an image is only some candy then missing it doesn't matter very
 much, but the aim would still be to have the packaged page look good.
 If an image is something important like a technical diagram then
 missing it might make the HTML almost useless.
 .
 If a logo or similar is not freely redistributable then it will be
 deliberately omitted.  Lintian can't distinguish that from mistaken
 omission.
 .
 If some HTML is a template then its links might not exist yet.
 Lintian can't distinguish that from links that ought to have been
 filled in by a configure etc.  The suggestion would be to ignore
 reports on templates or add lintian overrides.
 .
 Beware absolute paths like src="/foo.png".  This is common in HTML
 written for a web site but fails when copied elsewhere like a Debian
 package.  Relative links are more helpful so that a document is
 displayable from under a different mount point etc.
 .
 Images supplied by a dependent package might give false positives.
 Packages from the same source should work if checked as a group.

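The point about absolute versus relative src paths can be seen in a small standalone sketch of path resolution. This is a simplified stand-in written for illustration, not lintian's normalize_pkg_path, and resolve_src is a hypothetical name. Paths are taken to be package-relative without a leading slash, as in the package index.

```perl
use strict;
use warnings;

# Hypothetical helper: resolve $src as found in the HTML file $html_path
# (a package-relative path such as "usr/share/doc/foo/index.html").
# Returns a package-relative path, or undef (in scalar context) if the
# target climbs above the package root.
sub resolve_src {
    my ($html_path, $src) = @_;
    my @parts;
    if ($src !~ m{^/}) {
        # Relative: start from the directory holding the HTML file.
        @parts = grep { $_ ne '' } split m{/+}, $html_path;
        pop @parts;    # drop the HTML file name itself
    }
    # Absolute targets start from the (empty) root instead.
    foreach my $seg (split m{/+}, $src) {
        next if $seg eq '' || $seg eq '.';
        if ($seg eq '..') {
            return unless @parts;
            pop @parts;
        } else {
            push @parts, $seg;
        }
    }
    return join '/', @parts;
}

my $relative = resolve_src('usr/share/doc/whizzytex/whizzytex.html',
                           'whizzytex001.png');
my $parent   = resolve_src(
    'usr/share/doc/texlive-doc/texlive/texlive-fr/texlive-fr.html',
    '../texlive-common/install-lnx-main.png');
my $absolute = resolve_src('usr/share/doc/foo/index.html', '/foo.png');

print "$relative\n$parent\n$absolute\n";
```

The relative targets resolve next to the document, while src="/foo.png" resolves to foo.png at the filesystem root, where a package is unlikely to ship it.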
-- System Information:
Debian Release: 8.0
  APT prefers unstable
  APT policy: (990, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 3.16.0-4-686-pae (SMP w/1 CPU core)
Locale: LANG=en_AU.utf8, LC_CTYPE=en_AU.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages lintian depends on:
ii  binutils                       2.25-4
ii  bzip2                          1.0.6-7+b2
ii  diffstat                       1.58-1
ii  file                           1:5.22+15-1
ii  gettext                        0.19.3-2
ii  hardening-includes             2.7
ii  intltool-debian                0.35.0+20060710.1
ii  libapt-pkg-perl                0.1.29+b2
ii  libarchive-zip-perl            1.39-1
ii  libclass-accessor-perl         0.34-1
ii  libclone-perl                  0.37-1+b1
ii  libdpkg-perl                   1.17.23
ii  libemail-valid-perl            1.195-1
ii  libfile-basedir-perl           0.03-1
ii  libipc-run-perl                0.92-1
ii  liblist-moreutils-perl         0.33-2+b1
ii  libparse-debianchangelog-perl  1.2.0-1.1
ii  libtext-levenshtein-perl       0.11-1
ii  libtimedate-perl               2.3000-2
ii  liburi-perl                    1.64-1
ii  man-db                         2.7.0.2-5
ii  patchutils                     0.3.3-1
ii  perl [libdigest-sha-perl]      5.20.1-5
ii  t1utils                        1.38-3

Versions of packages lintian recommends:
ii  libperlio-gzip-perl             0.18-3+b1
ii  perl                            5.20.1-5
ii  perl-modules [libautodie-perl]  5.20.1-5

Versions of packages lintian suggests:
pn  binutils-multiarch     <none>
ii  dpkg-dev               1.17.23
ii  libhtml-parser-perl    3.71-1+b3
ii  libtext-template-perl  1.46-1
ii  libyaml-perl           1.13-1
ii  xz-utils               5.1.1alpha+20120614-2+b3

-- no debconf information

-- debsums errors found:
debsums: changed file /usr/share/lintian/profiles/debian/main.profile (from lintian package)