HTML-Parser-3.29

Gisle Aas Thu, 14 Aug 2003 23:15:13 -0700

FYI:  I just uploaded v3.29 of HTML-Parser to CPAN.  These are the
changes since the last release:


     Setting xml_mode now implies strict_names also for end tags.

     Avoid warning from Visual C.  Patch by <[EMAIL PROTECTED]>.

     64-bit fix from Doug Larrick <[EMAIL PROTECTED]>
     http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=195500

     Try to parse similarly to Mozilla/MSIE in certain edge cases.
     All of these are outside the official definition of HTML, but
     HTML spam often tries to take advantage of them.

       - New configuration attribute 'strict_end'.  Unless it is
         enabled, end tags may contain extra words or text that
         looks like attributes before the '>'.  This means that
         tags like these:

            </foo foo="<ignored>">
            </foo ignored>
            </foo ">" ignored>

         are now all parsed as a 'foo' end tag instead of text.
         Even if the extra text looks like attributes, it will not
         be reported when requested via the 'attr' or 'tokens'
         argspecs for the 'end' handler.

       - Parse '</:comment>' and '</ comment>' as comments unless
         strict_comment is enabled.  Previous versions of the parser
         would report these as text.  If these comments contain
         quoted words prefixed by space or '=', these words can
         contain '>' without terminating the comment.
        
       - Parse '<! "<>" foo>' as comment containing ' "<>" foo'.
         Previous versions of the parser would terminate the comment
         at the first '>' and report the rest as text.

       - Legacy comment mode:  Comments are terminated by a lone
         '>' if no '-->' is found before eof.

       - Incomplete tag at eof is reported as a 'comment' instead
         of 'text' unless strict_comment is enabled.
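
As a rough illustration of the 'strict_end' item above, here is a
minimal sketch that feeds the three example end tags to the parser
with 'strict_end' off and then on, and counts the 'end' events that
come back.  It assumes HTML-Parser 3.29 or later is installed; the
helper name end_tags() is made up for this example and is not part
of the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();   # assumption: HTML-Parser 3.29 or later

# Hypothetical helper (not part of HTML::Parser): parse $html and
# return the tag names of all 'end' events that were reported.
sub end_tags {
    my ($html, $strict_end) = @_;
    my @names;
    my $p = HTML::Parser->new(
        api_version => 3,
        # The "tagname" argspec passes the bare tag name to the handler.
        end_h       => [ sub { push @names, shift }, "tagname" ],
    );
    $p->strict_end($strict_end ? 1 : 0);
    $p->parse($html);
    $p->eof;
    return @names;
}

# The three end tags used as examples in the announcement.
my $html = join "\n",
    '</foo foo="<ignored>">',
    '</foo ignored>',
    '</foo ">" ignored>';

for my $strict (0, 1) {
    my @names = end_tags($html, $strict);
    printf "strict_end=%d: %d end tag(s) reported: %s\n",
        $strict, scalar(@names), join(", ", @names);
}
```

With 'strict_end' off, each line should come back as a 'foo' end
tag; with it on, the same input should be reported as text instead,
so no 'end' events are expected.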

Enjoy!

Regards,
Gisle
