According to Neal Richter:
> I am seeing some HTML entities show up in search result 'blurbs'.
> 
> See below.  Basically any entity of this form &#XXX; get translated to &amp;#XXX;
> 
> ™ --> ™
> 
> This only happens for numbered entities below 160.
> 
>   -->  
> © --> ©
> ® --> ®
> 
> I'm digging for this code.. looks like
> 
> Is there a fix for this in 3.1.X??  Anyone complain about this before????

No and yes.  Though 3.1.x does SGML decoding and re-encoding a bit
differently than 3.2, there's still a fundamental problem with both
versions that leads to this problem, which has come up again and again.

The problem is that until we have full Unicode support, we can't decode
all SGML entities and numbered entities into 8-bit characters.  So,
we convert the ones we're most likely to need within words, to allow
searches for accented characters and such, but we must leave some entities
still encoded in the database.  That leads us to the problem: we don't
know whether an ampersand in the database was originally decoded from
an entity (and thus should be re-encoded), or if it was originally the
lead-in to an entity we didn't decode (and thus should not be encoded).
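
To make the ambiguity concrete, here is a minimal Python sketch.  It is
not the ht://Dig code: the function names, the small entity table, and
the rule that only numbered entities 160..255 get decoded are all
assumptions chosen to mirror the behaviour reported above.

```python
import re

# Hypothetical subset of the named entities that fit in 8 bits (ISO-8859-1).
NAMED = {"amp": "&", "copy": "\u00a9", "reg": "\u00ae", "nbsp": "\u00a0"}
ENCODE = {char: "&" + name + ";" for name, char in NAMED.items()}

ENTITY = re.compile(r"&(#\d+|[A-Za-z]+);")

def decode_for_index(text):
    """Decode only the entities we can store as 8-bit characters."""
    def repl(match):
        body = match.group(1)
        if body.startswith("#"):
            code = int(body[1:])
            # Assumption: only codes 160..255 are decoded; others stay as-is.
            return chr(code) if 160 <= code <= 255 else match.group(0)
        return NAMED.get(body, match.group(0))
    return ENTITY.sub(repl, text)

def encode_for_output(stored):
    """Naively re-encode every special character, including every '&'."""
    return "".join(ENCODE.get(ch, ch) for ch in stored)

# Two different source strings end up identical in the database...
assert decode_for_index("&#153;") == decode_for_index("&amp;#153;")
# ...so on output we cannot tell which one we started from.
print(encode_for_output(decode_for_index("&#153;")))  # prints &amp;#153;
```

The round trip is fine for "&copy;" and for "AT&amp;T", but the stored
text "&#153;" could have come from either source string, so any single
output rule gets one of the two cases wrong.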

I suppose we could work out a kludge to encode the two cases differently
in the database, so we can distinguish between them on output, but
we'd need to find something that works, even if, say, a user adds '&'
to extra_word_characters.  Note also that '&' is part of the default
valid_punctuation value, so we really need to decode &amp; to & for this
to work.
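
One possible shape for such a kludge, sketched in Python purely as an
illustration: store a sentinel byte in place of the '&' that leads an
entity we did not decode, and keep a plain '&' for a decoded &amp;.
The sentinel value, the function names, and the entity table are all
hypothetical, and the sketch simply assumes the sentinel never occurs in
documents and is never configured as a word or punctuation character,
which is exactly the part that would need care.

```python
import re

MARK = "\x01"  # hypothetical sentinel; assumed never to appear in documents

# Hypothetical subset of the named entities that fit in 8 bits.
NAMED = {"amp": "&", "copy": "\u00a9", "reg": "\u00ae"}
ENCODE = {char: "&" + name + ";" for name, char in NAMED.items()}

ENTITY = re.compile(r"&(#\d+|[A-Za-z]+);")

def store(text):
    """Decode what we can; mark the lead-in of everything we leave encoded."""
    def repl(match):
        body = match.group(1)
        if body.startswith("#"):
            code = int(body[1:])
            if 160 <= code <= 255:  # assumption: only this range is decoded
                return chr(code)
        elif body in NAMED:
            return NAMED[body]
        # Not decoded: replace the leading '&' with the sentinel.
        return MARK + body + ";"
    return ENTITY.sub(repl, text)

def emit(stored):
    """Re-encode decoded characters; restore marked entities untouched."""
    out = []
    for ch in stored:
        if ch == MARK:
            out.append("&")  # lead-in of an entity that was never decoded
        else:
            out.append(ENCODE.get(ch, ch))
    return "".join(out)

print(emit(store("&#153;")))      # prints &#153;
print(emit(store("&amp;#153;")))  # prints &amp;#153;
```

With the two cases stored differently, both source strings survive the
round trip, and a decoded '&' is still a real '&' in the database for
valid_punctuation purposes.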

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

