The bug results from the behaviour of transSGML() in HTML.cc which
is not really suitable for use with URLs.
1) transSGML will not correctly translate "&amp;" into "&" (as required
by the HTML 4.0 draft standard) if the configuration directive
"translate_amp" is not set, i.e. the URL parameter "?i=1&amp;p=1"
will not be translated into "?i=1&p=1".
2) transSGML will corrupt any URL that uses the traditional URL
parameter delimiter (a bare "&"), because it attempts to translate a
non-existent entity, which yields a space character, e.g. the URL
parameter "?i=1&p=1" will be truncated to "?i=1".
Following is a quick fix for this problem. It affects the behaviour
of the following functions (and those which use them):
- SGMLEntities::translate()
Will return an ampersand instead of a space for unrecognized
entities, thus leaving single ampersand characters "as is" (which
will affect document text as well!).
- SGMLEntities::translateAndUpdate()
Will restore the text pointer to the character after the ampersand
for unrecognized entities.
- HTML::transSGML()
Will translate any "&amp;" entity regardless of the setting of
"translate_amp".
As stated above, this is only a quick fix, which might not work in
all cases (but it works for me so far). ;-)
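
To illustrate the intended result, here is a rough standalone sketch of
the behaviour described above. It is not ht://Dig code: the function
decodeEntities() and its four-entry entity table are made up for this
example, and it ignores "translate_amp" entirely. It only shows the idea
that a recognized entity is replaced, while an unrecognized one yields a
plain "&" and scanning resumes at the character after the ampersand.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the patched translate()/translateAndUpdate()
// pair; the real classes know many more entities than these four.
std::string decodeEntities(const std::string &in)
{
    static const std::map<std::string, char> known = {
        {"amp", '&'}, {"lt", '<'}, {"gt", '>'}, {"quot", '"'}
    };

    std::string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        if (in[i] != '&')
        {
            out += in[i++];
            continue;
        }

        // Collect the candidate entity name that follows the ampersand.
        std::size_t j = i + 1;
        while (j < in.size() &&
               std::isalnum(static_cast<unsigned char>(in[j])))
            ++j;

        auto hit = known.find(in.substr(i + 1, j - i - 1));
        if (hit != known.end())
        {
            out += hit->second;
            if (j < in.size() && in[j] == ';')
                ++j;            // A final ';' is used up.
            i = j;
        }
        else
        {
            out += '&';         // Unrecognized: keep the ampersand...
            ++i;                // ...and resume right after it.
        }
    }
    return out;
}
```

With this sketch, both "?i=1&amp;p=1" and "?i=1&p=1" come out as
"?i=1&p=1", and text such as "AT&T" is left alone.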
cheers,
Torsten
*** HTML.cc~ Sun Sep 26 18:05:07 1999
--- HTML.cc Sun Sep 26 18:43:53 1999
***************
*** 1113,1122 ****
convert = 0;
while (*text)
{
! if (*text == '&')
! convert << SGMLEntities::translateAndUpdate(text);
! else
! convert << *text++;
}
return convert.get();
}
--- 1113,1127 ----
convert = 0;
while (*text)
{
! if (*text == '&')
! {
! convert << SGMLEntities::translateAndUpdate(text);
! if( !strncmp(text,"amp;",4) )
! text += 4;
! }
! else
! convert << *text++;
}
return convert.get();
}
*** SGMLEntities.cc~ Sun Sep 26 18:30:53 1999
--- SGMLEntities.cc Sun Sep 26 18:39:28 1999
***************
*** 165,171 ****
}
else
{
! return ' '; // Unrecognized entity. Change it into a space...
}
}
--- 165,171 ----
}
else
{
! return '&'; // Unrecognized entity. Return just an ampersand...
}
}
***************
*** 280,284 ****
if (*entityStart == ';')
entityStart++; // A final ';' is used up.
! return translate(entity);
}
--- 280,287 ----
if (*entityStart == ';')
entityStart++; // A final ';' is used up.
! unsigned char e = translate(entity);
! if( e == '&' && !translate_amp )
! entityStart = orig + 1; // Catch unrecognized entities...
! return e;
}
--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14 Tel: +49-4101-403605
D-25474 Ellerbek Fax: +49-4101-403606
E-Mail: [EMAIL PROTECTED] Internet: http://www.inwise.de