According to Thomas Netousek:
> I am running htdig-3.2.0-0.b3.4 from the RedHat linux 7.1 distribution and
> I am indexing documents
> which have  all types of funny characters like e.g. single quotes spelled
> as &#8217;
> 
> I have seen other reports about the parser failing for &amp, so I am
> wondering if this could
> be sort of a similar problem ?
> 
> Btw, I am also running htdig-3.1.5 on another machine with translate_...
> set to true and it works
> like a charm there.

I believe 3.2.0b3 will not translate numeric entities where the number
is larger than 255.  3.1.5 does, but that's a bug: it only uses 8-bit
characters internally, so it keeps only the bottom 8 bits of the
number.  Because 3.2.0b3 doesn't convert the numeric entity, the "&"
goes into the excerpt literally and is turned into an &amp; entity on
output.  The browser then displays a literal "&", so you see the
numeric entity itself.  Given the 8-bit character set limitations in
both 3.1 and 3.2, I think 3.2's behaviour is the lesser of two evils
when it comes to handling numeric entities above 255.
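
To make the difference concrete, here is a rough Python model of the two
behaviours described above.  It is only a sketch, not ht://Dig's actual
parser code: the function names are made up for illustration, and it
handles decimal entities only.

```python
import re

# Matches decimal numeric entities such as &#233; or &#8217;
NUMERIC_ENTITY = re.compile(r"&#(\d+);")

def decode_truncating(text):
    """Hypothetical model of 3.1.5: convert every numeric entity,
    but keep only the bottom 8 bits of the number."""
    return NUMERIC_ENTITY.sub(lambda m: chr(int(m.group(1)) & 0xFF), text)

def decode_skipping_large(text):
    """Hypothetical model of 3.2.0b3: convert entities up to 255,
    leave anything larger untouched."""
    def repl(m):
        code = int(m.group(1))
        return chr(code) if code <= 255 else m.group(0)
    return NUMERIC_ENTITY.sub(repl, text)

def escape_for_output(text):
    """Escape a literal "&" when the excerpt is written out as HTML."""
    return text.replace("&", "&amp;")

sample = "it&#8217;s"

# 8217 & 0xFF == 25, so the quote becomes a control character.
truncated = decode_truncating(sample)

# The entity survives, and its "&" is escaped on output.
kept = escape_for_output(decode_skipping_large(sample))
```

In this model `truncated` is "it\x19s", i.e. the quote is silently
replaced by an unrelated control character, while `kept` is
"it&amp;#8217;s", which a browser displays as the literal text
"it&#8217;s".  Entities at or below 255 come out the same either way.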

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
