The bug results from the behaviour of transSGML() in HTML.cc which
is not really suitable for use with URLs.

1) transSGML will not correctly translate "&amp;" into "&" (as required
   by the HTML 4.0 draft standard) if the configuration directive
   "translate_amp" is not set, i.e. the URL parameter "?i=1&amp;p=1"
   will not be translated into "?i=1&p=1".

2) transSGML will corrupt any URL that uses the traditional URL
   parameter delimiter, a bare "&", because it attempts to translate a
   non-existent entity, which results in a space character, e.g. the
   URL parameter "?i=1&p=1" will be truncated to "?i=1".
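
Point 2 can be reproduced with a small standalone model.  This is
only a sketch, not the ht://Dig code: legacyTranslate() and
legacyDecode() are hypothetical names, the entity table is reduced to
"amp", and the way the entity name is scanned (alphanumerics plus an
optional ';') is an assumption.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for the unpatched entity lookup: only "amp" is
// known in this model, and every other name is turned into a space.
static char legacyTranslate(const std::string &name)
{
    return name == "amp" ? '&' : ' ';
}

// Simplified model of the unpatched decoding loop (not the real code).
std::string legacyDecode(const std::string &in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        if (in[i] != '&')
        {
            out += in[i++];
            continue;
        }
        // Assumption: an entity name is a run of alphanumerics.
        std::size_t j = i + 1;
        while (j < in.size() && std::isalnum(static_cast<unsigned char>(in[j])))
            ++j;
        out += legacyTranslate(in.substr(i + 1, j - i - 1));
        if (j < in.size() && in[j] == ';')
            ++j;                // A final ';' is used up.
        i = j;                  // The scanned name is consumed either way.
    }
    return out;
}
```

In this model legacyDecode("?i=1&p=1") yields "?i=1 =1": the "&p" is
replaced by a space and the parameter name is lost, which is the kind
of corruption described in point 2.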

Following is a quick fix for this problem.  It affects the behaviour
of the following functions (and those which use them):

- SGMLEntities::translate()
  Will return an ampersand instead of a space for unrecognized
  entities, thus leaving single ampersand characters "as is" (which
  will affect document text as well!).

- SGMLEntities::translateAndUpdate()
  Will restore the text pointer to the character after the ampersand
  for unrecognized entities.

- HTML::transSGML()
  Will translate any "&amp;" entity regardless of the setting of
  "translate_amp".
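
The combined effect of the three changes can be sketched as a single
standalone function.  This is a simplified model under assumptions,
not the patched ht://Dig code: patchedDecode() is a hypothetical name,
the entity table holds only four entries, and entity names are assumed
to be runs of alphanumerics.

```cpp
#include <cctype>
#include <map>
#include <string>

// Simplified model of the patched behaviour (hypothetical helper).
std::string patchedDecode(const std::string &in)
{
    // Assumption: a tiny entity table standing in for the real one.
    static const std::map<std::string, char> known = {
        {"amp", '&'}, {"lt", '<'}, {"gt", '>'}, {"quot", '"'}};

    std::string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        if (in[i] != '&')
        {
            out += in[i++];
            continue;
        }
        std::size_t j = i + 1;
        while (j < in.size() && std::isalnum(static_cast<unsigned char>(in[j])))
            ++j;
        auto hit = known.find(in.substr(i + 1, j - i - 1));
        if (hit == known.end())
        {
            // Unrecognized entity: keep the ampersand and resume at the
            // character right after it, so no text is lost.
            out += '&';
            ++i;
            continue;
        }
        out += hit->second;     // "amp" is always translated in this model.
        if (j < in.size() && in[j] == ';')
            ++j;                // A final ';' is used up.
        i = j;
    }
    return out;
}
```

With this model both "?i=1&p=1" and "?i=1&amp;p=1" decode to
"?i=1&p=1", and a bare ampersand in document text such as "AT&T" is
left as is.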

As stated above, this is only a quick fix, which might not work for
all cases (but it works for me so far). ,-)


cheers,
  Torsten


*** HTML.cc~    Sun Sep 26 18:05:07 1999
--- HTML.cc     Sun Sep 26 18:43:53 1999
***************
*** 1113,1122 ****
      convert = 0;
      while (*text)
      {
!       if (*text == '&')
!           convert << SGMLEntities::translateAndUpdate(text);
!       else
!           convert << *text++;
      }
      return convert.get();
  }
--- 1113,1127 ----
      convert = 0;
      while (*text)
      {
!         if (*text == '&')
!         {
!             convert << SGMLEntities::translateAndUpdate(text);
!             if( !strncmp(text,"amp;",4) )
!                 text += 4;
!         }
!         else
!             convert << *text++;
      }
      return convert.get();
  }

*** SGMLEntities.cc~    Sun Sep 26 18:30:53 1999
--- SGMLEntities.cc     Sun Sep 26 18:39:28 1999
***************
*** 165,171 ****
      }
      else
      {
!       return ' ';     // Unrecognized entity.  Change it into a space...
      }
  }
  
--- 165,171 ----
      }
      else
      {
!       return '&';     // Unrecognized entity.  Return just an ampersand...
      }
  }
  
***************
*** 280,284 ****
      
      if (*entityStart == ';')
        entityStart++;          // A final ';' is used up.
!     return translate(entity);
  }
--- 280,287 ----
      
      if (*entityStart == ';')
        entityStart++;          // A final ';' is used up.
!     unsigned char e = translate(entity);
!     if( e == '&' && !translate_amp )
!         entityStart = orig + 1; // Catch unrecognized entities...
!     return e;
  }

-- 
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: [EMAIL PROTECTED]            Internet: http://www.inwise.de
