Package: libxml2-utils
Version: 2.7.8.dfsg-2+squeeze1
Severity: normal

Dear maintainer,

When the HTML parser of libxml2 (with the recover option) meets a tag
whose name starts with a full stop, it correctly detects that this is
invalid HTML, but nevertheless accepts the tag with that name into the
document tree. As a result, if you serialize the same document tree as
XML, the output is malformed XML.

Here's an example.

    $ xmllint --html --xmlout - <<<'<.m>r'
    -:1: HTML parser error : Tag .m invalid
    <.m>r
    ^
    <?xml version="1.0" standalone="yes"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body><.m>r
    </.m></body></html>
    $

The `<.m>' part is not well-formed XML, because XML element names
cannot start with a full stop. You can see this if you try to parse the
output with an XML parser, e.g. with xmllint.

In case you're interested, I noticed this bug when I tried to parse
some (invalid) HTML documents with the Perl module XML::LibXML (which
uses the libxml2 library as its backend) and output them as XML.

-- System Information:
Debian Release: 6.0.3
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.37 (SMP w/2 CPU cores)
Locale: LANG=C, LC_CTYPE=hu_HU (charmap=ISO-8859-2)
Shell: /bin/sh linked to /bin/bash

Versions of packages libxml2-utils depends on:
ii  libc6         2.11.2-10              Embedded GNU C Library: Shared lib
ii  libreadline6  6.1-3                  GNU readline and history libraries
ii  libxml2       2.7.8.dfsg-2+squeeze1  GNOME XML library

libxml2-utils recommends no packages.

libxml2-utils suggests no packages.

-- no debconf information
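
The claim that the serialized output is not well-formed can also be
checked without libxml2. The following is a minimal sketch, not part of
the original report, assuming Python's standard-library
xml.etree.ElementTree (backed by expat) as the independent XML parser.
The helper name is_well_formed is hypothetical, and the input string is
the element markup from the xmllint output above with the XML
declaration and DOCTYPE left out.

```python
import xml.etree.ElementTree as ET

# Element markup as serialized by xmllint in the report above
# (XML declaration and DOCTYPE omitted for brevity).
SERIALIZED = "<html><body><.m>r\n</.m></body></html>"

# Same structure, but with an element name that is legal in XML.
CONTROL = "<html><body><m>r\n</m></body></html>"


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML with the stdlib parser."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("serialized output well-formed:", is_well_formed(SERIALIZED))
    print("control document well-formed: ", is_well_formed(CONTROL))
```

The first check is expected to fail and the second to succeed, since
the only difference between the two strings is the leading full stop in
the element name.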

