According to Neal Richter:
> I am seeing some HTML entities show up in search result 'blurbs'.
>
> See below. Basically any entity of this form &#XXX; get translated to &amp;#XXX;
>
> ™ --> ™
>
> This only happens for numbered entities below 160.
>
>   -->
> © --> ©
> ® --> ®
>
> I'm digging for this code.. looks like
>
> Is there a fix for this in 3.1.X?? Anyone complain about this before????
No and yes. Though 3.1.x does SGML decoding and re-encoding a bit differently than 3.2, both versions share a fundamental problem that causes this, and it has come up again and again.

The problem is that until we have full Unicode support, we can't decode all SGML entities and numbered entities into 8-bit characters. So we convert the ones we're most likely to need within words, to allow searches for accented characters and such, but we must leave some entities still encoded in the database. That leads us to the problem: we don't know whether an ampersand in the database was originally decoded from an entity (and thus should be re-encoded), or whether it was originally the lead-in to an entity we didn't decode (and thus should not be re-encoded).

I suppose we could work out a kludge to encode the two cases differently in the database, so we can distinguish between them on output, but we'd need to find something that works even if, say, a user adds '&' to extra_word_characters. Note also that '&' is part of the default valid_punctuation value, so we really need to decode &amp; to & for this to work.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
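To make the ambiguity concrete, here is a minimal Python sketch. It is not ht://Dig's code: the functions `decode_8bit` and `encode_for_output`, the decoded range (ASCII plus 160 to 255, chosen to mirror the reported "below 160" behaviour), and the `\x01` marker are all hypothetical. The first pass decodes only entities that fit in 8 bits and stores the rest unchanged; the output pass escapes every '&', which reproduces the `&amp;#153;` symptom. Passing `mark=True` shows one possible shape of the "encode the two cases differently" kludge.

```python
import re

# Hypothetical sketch of the decode/re-encode round trip; not ht://Dig's code.
ENTITY_RE = re.compile(r"&(#[0-9]+|[A-Za-z]+);")
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}
UNDECODED = "\x01"  # hypothetical marker for the '&' of an entity left encoded


def decode_8bit(text: str, mark: bool = False) -> str:
    """Decode entities that fit in 8 bits; keep the rest encoded."""
    def repl(m: re.Match) -> str:
        name = m.group(1)
        if name.startswith("#"):
            code = int(name[1:])
            # Assumption: 128..159 (C1 controls in Latin-1, e.g. &#153;)
            # and anything above 255 are left encoded.
            if code < 128 or 160 <= code <= 255:
                return chr(code)
        elif name in NAMED:
            return NAMED[name]
        # Not decoded: keep the entity text, optionally tagging its '&'.
        return (UNDECODED if mark else "&") + m.group(0)[1:]
    return ENTITY_RE.sub(repl, text)


def encode_for_output(text: str) -> str:
    """Re-encode for HTML output, treating every plain '&' as literal."""
    out = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    # A tagged '&' was the lead-in of an undecoded entity: emit it bare.
    return out.replace(UNDECODED, "&")


if __name__ == "__main__":
    page = "A&#153;B &amp; C"
    print(encode_for_output(decode_8bit(page)))             # double-encoded
    print(encode_for_output(decode_8bit(page, mark=True)))  # round-trips
```

Without the marker, the stored `&#153;` comes back as `&amp;#153;` while the decoded `&amp;` correctly comes back as `&amp;`; the output stage has no way to tell them apart. With the marker, the two cases stay distinct, but only as long as the indexer never treats the marker character as a word character or punctuation, which is the same concern raised above about extra_word_characters.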