Re: Standalone html parser

Hrvoje Niksic Fri, 29 Jun 2001 00:38:32 -0700
Anees Shaikh <[EMAIL PROTECTED]> writes:

> I'm trying to use the code in html-parse.c (v1.7) in standalone mode

Excellent!

> For some reason, <img src=... > tags are recognized but then skipped
> almost every time they are encountered.  When using the full program
> and recursive retrieve, the images are in fact retrieved so it seems
> that the parser does work correctly when not in standalone mode.
> 
> It seems that the following condition is met when parsing img
> tag attributes
> 
>       /* Establish bounds of attribute name. */
>       attr_name_begin = p;    /* <foo bar ...> */
>                               /*      ^        */
>       while (NAME_CHAR_P (*p))
>         ADVANCE (p);
>       attr_name_end = p;      /* <foo bar ...> */
>                               /*         ^     */
>       if (attr_name_begin == attr_name_end)
>         goto backout_tag;
> 
> Can someone shed some light on this?

For some reason, the parser does not advance past the attribute name.
Try going into the debugger and printing the value of P.  You should
find out why the parser refuses to advance beyond attr_name_begin.

Perhaps it thinks it has reached the end of file?  (Are you calling it
with the proper text length?)  Perhaps the text is corrupted due to
another bug in your program and the attribute name is invalid?  A
number of things could be wrong.
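
To tell the first two cases apart, something like the following might
help.  It is only a sketch, not code from html-parse.c: name_char_p,
scan_attr_name, and why_stopped are hypothetical names, and the
character test is my assumption of what NAME_CHAR_P accepts, so check
it against the real macro.  It repeats the attribute-name scan with an
explicit length and reports why the scan stopped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NAME_CHAR_P.  ASSUMPTION: the real macro in
   html-parse.c may accept a different set of characters. */
static int name_char_p(unsigned char c)
{
    return c > 32 && c < 127 && c != '=' && c != '>' && c != '/';
}

/* Why the scan at a given position stopped. */
enum scan_stop {
    STOP_END_OF_BUFFER,   /* pos is at or past the length you passed in */
    STOP_NON_NAME_BYTE,   /* the byte at pos is not a name character */
    STOP_NAME_PRESENT     /* a name does start at pos */
};

/* Scan an attribute name starting at text[pos], never reading text[len]
   or beyond.  Returns the length of the name; 0 corresponds to the
   attr_name_begin == attr_name_end case that jumps to backout_tag. */
static size_t scan_attr_name(const char *text, size_t len, size_t pos)
{
    size_t start = pos;

    assert(text != NULL);
    while (pos < len && name_char_p((unsigned char)text[pos]))
        pos++;
    return pos - start;
}

/* Classify what the scan sees at pos, to explain an empty name. */
static enum scan_stop why_stopped(const char *text, size_t len, size_t pos)
{
    assert(text != NULL);
    if (pos >= len)
        return STOP_END_OF_BUFFER;
    if (!name_char_p((unsigned char)text[pos]))
        return STOP_NON_NAME_BYTE;
    return STOP_NAME_PRESENT;
}
```

If why_stopped returns STOP_END_OF_BUFFER for a position that is clearly
in the middle of the document, the length you pass to the parser is too
short.  If it returns STOP_NON_NAME_BYTE, print that byte and see where
it came from.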

When I wrote the parser, I primarily tested it in "standalone" mode,
so it should work that way too.