Hi all.

I have been trying to automate the retrieval of data from a web site
which happens to have a malformed comment in the HTML <head>
section. It looks like this:

<HTML>
<HEAD>
<! Created on 15/10/95 Amended by CH for Leicestershire etc
22/08/2001 ->

<TITLE>Library Catalogue</TITLE>
<META HTTP-EQUIV="REFRESH" content="2; URL=/www-bin/www_talis2">

</HEAD>

<BODY>
    :
    :

The comment declaration is correct, but the comment itself is
non-standard in that it is missing the leading pair of hyphens
and has only a single trailing hyphen. Opera, Navigator and IE
all handle this without trouble, but HeadParser is blind to the
<meta> information. It produces the following debug trace:

START[html]
TEXT[
]
START[head]
TEXT[
]
TEXT[<! Created on 15/10/95 Amended by CH for Leicestershire etc
22/08/2001 ->

]
START[title]

which looks like it's starting OK, as it has at least relegated
the comment to a category of 'text' and goes on to find the
starting tag for the title. Strangely though, it fails to find
anything after <title>, and skips the body text of the title
as well as the subsequent <meta> tag.

I am a little weak when it comes to DynaLoader and C
extensions, and I wondered if anybody had any thoughts
about this? I can achieve the desired result by setting

$ua->parse_head(0);

and then editing out the rogue comment and post-processing
explicitly with HeadParser, but it's not a solution that I like.
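
A minimal sketch of that workaround, for reference. The fetch itself
(with parse_head turned off) is left out and the sample page from
above is inlined instead. The helper name strip_rogue_declarations is
made up for illustration, and the pattern assumes the rogue comment
contains no '>' character of its own:

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Hypothetical helper: remove "<! ... >" declarations that are neither
# proper comments (<!-- ... -->) nor a DOCTYPE declaration.
sub strip_rogue_declarations {
    my ($html) = @_;
    $html =~ s/<!(?!--|DOCTYPE\b)[^>]*>//gi;
    return $html;
}

# Stand-in for the content of a response fetched with parse_head(0).
my $html = <<'EOT';
<HTML>
<HEAD>
<! Created on 15/10/95 Amended by CH for Leicestershire etc
22/08/2001 ->

<TITLE>Library Catalogue</TITLE>
<META HTTP-EQUIV="REFRESH" content="2; URL=/www-bin/www_talis2">

</HEAD>
EOT

my $clean = strip_rogue_declarations($html);

# Post-process the cleaned text explicitly with HeadParser.
my $p = HTML::HeadParser->new;
$p->parse($clean);

# <title> and <meta http-equiv> values are exposed as header fields.
for my $field ('Title', 'Refresh') {
    my $value = $p->header($field);
    print "$field: ", (defined $value ? $value : '(not found)'), "\n";
}
```

Proper comments and a DOCTYPE are left alone by the negative
lookahead, so the substitution should be safe to apply to pages that
do not have the problem.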

Thanks,

Rob


