lynx is fairly good at HTML parsing; however, there are a few areas where it has issues, mostly because the syntax has mutated a good bit over time.
The case that has bitten me enough times to make me want to fix it
fairly badly, and which is even more obvious with color styles, is
'<tag some_attributes />', which means essentially the same as
'<tag some_attributes></tag>'.
Some valid examples of this are '<a name="chapter1" />' and, much worse
for lynx, '<script type="text/javascript" src="dhtml.js" />'. The latter
is especially bad because lynx simply won't render anything past it.
(And one in the header is absolute death for rendering the page.)
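The equivalence can be stated as a plain string rewrite. This is only a minimal sketch, not lynx code: 'expand_self_closing' is a hypothetical helper that assumes its input is exactly one tag ending in '/>', copies attribute values verbatim without parsing them, and returns NULL for '<tag/foo/>'-style input so that form is left alone.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: rewrite one self-closing tag such as
 * "<a name=\"chapter1\" />" into "<a name=\"chapter1\"></a>".
 * Returns a malloc'd string the caller must free, or NULL if the
 * input is not a single "<name ... />" tag. Attribute values are
 * copied verbatim and never inspected. */
char *expand_self_closing(const char *tag)
{
    size_t len = strlen(tag);
    size_t name_len = 0;
    size_t end;
    char *out;

    /* Need at least "<x/>", and the last two characters must be "/>". */
    if (len < 4 || tag[0] != '<' || tag[len - 2] != '/' || tag[len - 1] != '>')
        return NULL;
    end = len - 2;                      /* index of the final '/' */

    /* The tag name runs from just after '<' to the first space or '/'. */
    while (1 + name_len < end && !isspace((unsigned char) tag[1 + name_len])
           && tag[1 + name_len] != '/')
        name_len++;
    if (name_len == 0)
        return NULL;
    /* A '/' inside the name (as in "<tag/foo/>") is not rewritten. */
    if (1 + name_len < end && tag[1 + name_len] == '/')
        return NULL;

    /* Drop the whitespace between the last attribute and the '/'. */
    while (end > 1 + name_len && isspace((unsigned char) tag[end - 1]))
        end--;

    /* "<name attrs" + ">" + "</" + name + ">" + NUL */
    out = malloc(end + name_len + 5);
    if (out == NULL)
        return NULL;
    sprintf(out, "%.*s></%.*s>", (int) end, tag, (int) name_len, tag + 1);
    return out;
}
```

Applied to the two examples above, it yields '<a name="chapter1"></a>' and '<script type="text/javascript" src="dhtml.js"></script>'; the caller must free() the result.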
So, with that in mind, the attached patch causes lynx to parse
'<tag attributes />' as '<tag attributes></tag>'. This properly renders
the above cases, and I have not found a test case that it breaks.
It still parses '<tag/foo/>' the same way as before.
Reviews, flames, and grumbles are all welcome.
Zephaniah E. Hull.
--
1024D/E65A7801 Zephaniah E. Hull <[EMAIL PROTECTED]>
92ED 94E4 B1E6 3624 226D 5727 4453 008B E65A 7801
CCs of replies from mailing lists are requested.
Ken Thompson claims that he started developing UNIX so he could play
Space War, but the end product shows he was really much more interested
in cheating at Scrabble.
-- Seen in the SDM.
diff -ur lynx-2.8.6/WWW/Library/Implementation/SGML.c mine/lynx-2.8.6/WWW/Library/Implementation/SGML.c
--- lynx-2.8.6/WWW/Library/Implementation/SGML.c 2006-01-22 20:16:14.000000000 -0500
+++ mine/lynx-2.8.6/WWW/Library/Implementation/SGML.c 2007-04-15 17:03:31.000000000 -0400
@@ -200,6 +200,7 @@
BOOL first_bracket;
BOOL second_bracket;
BOOL isHex;
+ BOOL end_slash;
HTParentAnchor *node_anchor;
LYUCcharset *inUCI; /* pointer to anchor UCInfo */
@@ -3472,12 +3473,28 @@
case S_tag_gap: /* Expecting attribute or '>' */
if (WHITE(c))
break; /* Gap between attributes */
+ if (c == '/') {
+ context->end_slash = TRUE;
+ break;
+ }
if (c == '>') { /* End of tag */
#ifdef USE_PRETTYSRC
if (!psrc_view)
#endif
- if (context->current_tag->name)
+ if (context->current_tag->name) {
start_element(context);
+ if (context->end_slash) {
+ if (context->recover == NULL) {
+ StrAllocCopy(context->recover, "</");
+ context->recover_index = 0;
+ } else {
+ StrAllocCat(context->recover, "</");
+ }
+ StrAllocCat(context->recover, context->current_tag->name);
+ StrAllocCat(context->recover, ">");
+ context->end_slash = FALSE;
+ }
+ }
#ifdef USE_PRETTYSRC
if (psrc_view) {
PSRCSTART(abracket);
@@ -3485,9 +3502,11 @@
PSRCSTOP(abracket);
}
#endif
+ context->end_slash = FALSE;
context->state = S_text;
break;
}
+ context->end_slash = FALSE;
HTChunkPutc(string, c);
context->state = S_attr; /* Get attribute */
break;
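For readers without SGML.c handy, the core of the S_tag_gap change can be replayed in a small standalone scanner. This is a hypothetical sketch, not lynx code: 'toy_expand' and its T_* states are made up, only double-quoted attribute values are understood, and it knows nothing about element types, so '<br />' would also get an end tag. As in the patch, a '/' in the gap between attributes sets end_slash, any other non-space character clears it, and a '>' with the flag set produces '</name>'; the patch appends that string to context->recover, whereas the toy writes it straight to the output.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy states, loosely modelled on the ones in SGML.c. */
enum toy_state { T_TEXT, T_TAG_NAME, T_TAG_GAP, T_ATTR, T_ATTR_DQ };

struct toy_out { char *buf; size_t cap; size_t len; };

/* Append one character, silently truncating when the buffer is full. */
static void put_c(struct toy_out *o, char c)
{
    if (o->len + 1 < o->cap) {
        o->buf[o->len++] = c;
        o->buf[o->len] = '\0';
    }
}

/* Hypothetical toy scanner: copies `in` to `out`, rewriting
 * "<tag attrs />" as "<tag attrs></tag>". Only a '/' seen in the gap
 * between attributes sets end_slash, as in the patch. */
void toy_expand(const char *in, char *out, size_t cap)
{
    struct toy_out o = { out, cap, 0 };
    enum toy_state state = T_TEXT;
    char name[32];
    size_t name_len = 0, i;
    int name_done = 0, end_slash = 0;

    if (cap > 0)
        out[0] = '\0';
    for (; *in != '\0'; in++) {
        char c = *in;
        switch (state) {
        case T_TEXT:
            if (c == '<') {             /* start of a tag: forget the old name */
                name_len = 0;
                name_done = 0;
                state = T_TAG_NAME;
            }
            put_c(&o, c);
            break;
        case T_TAG_NAME:
            if (isspace((unsigned char) c)) {
                state = T_TAG_GAP;      /* one space is re-emitted per attribute */
                break;
            }
            put_c(&o, c);
            if (c == '>')
                state = T_TEXT;
            else if (!isalnum((unsigned char) c))
                name_done = 1;          /* e.g. the first '/' in "<tag/foo/>" */
            else if (!name_done && name_len < sizeof name)
                name[name_len++] = c;
            break;
        case T_TAG_GAP:                 /* mirrors S_tag_gap in the patch */
            if (isspace((unsigned char) c))
                break;
            if (c == '/') {
                end_slash = 1;          /* note the slash, emit nothing yet */
                break;
            }
            if (c == '>') {
                put_c(&o, '>');
                if (end_slash && name_len > 0) {    /* synthesize "</name>" */
                    put_c(&o, '<');
                    put_c(&o, '/');
                    for (i = 0; i < name_len; i++)
                        put_c(&o, name[i]);
                    put_c(&o, '>');
                }
                end_slash = 0;
                state = T_TEXT;
                break;
            }
            end_slash = 0;              /* the '/' was not followed by '>' */
            put_c(&o, ' ');
            put_c(&o, c);
            state = (c == '"') ? T_ATTR_DQ : T_ATTR;
            break;
        case T_ATTR:
            if (isspace((unsigned char) c)) {
                state = T_TAG_GAP;
                break;
            }
            put_c(&o, c);
            if (c == '>')
                state = T_TEXT;
            else if (c == '"')
                state = T_ATTR_DQ;
            break;
        case T_ATTR_DQ:
            put_c(&o, c);
            if (c == '"')
                state = T_TAG_GAP;      /* closing quote ends the value */
            break;
        }
    }
}
```

For example, '<a name="chapter1" />text' comes out as '<a name="chapter1"></a>text', while in this toy '<tag/foo/>' passes through untouched because its slashes are seen while the tag name is still being read, never in the attribute gap.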
_______________________________________________ Lynx-dev mailing list [email protected] http://lists.nongnu.org/mailman/listinfo/lynx-dev
