Hi Mike,

On 3/9/2009 at 2:34 PM, Michael McCandless wrote:
> See changes at http://lucene.apache.org/java/2_4_1/changes/Changes.html

Minor nit: the encoding of Christian Kohlschütter's name in the 2.4.1 section 
of CHANGES.txt appears to be Latin-1, but changes2html.pl assumes that 
CHANGES.txt is encoded as UTF-8, so the resulting Changes.html has an 
improperly encoded "ü" (lowercase "u" with an umlaut):

    14. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
    resources. 
    (Christian Kohlsch�tter via Mike McCandless)

For me, both in the web browser and in the excerpt from it that I've pasted 
above, instead of a lowercase "u" with an umlaut, I see a small white question 
mark on a black diamond background (the U+FFFD replacement character), 
indicating an invalid UTF-8 byte sequence: byte 0xFC, the Latin-1 encoding of 
"ü", looks like the lead byte of a multi-byte sequence, but it is not followed 
by any continuation bytes (bytes of the form 10xxxxxx).
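
To illustrate the mismatch, here is a small Python sketch (not part of the Lucene build; the variable names are mine, for illustration only). It encodes the name as Latin-1, then decodes those bytes as UTF-8 the way a UTF-8-assuming consumer would:

```python
# "Kohlschütter", written with an escape so the example does not depend
# on the encoding of this message itself.
name = "Kohlsch\u00fctter"

# In Latin-1, "ü" is the single byte 0xFC.
latin1_bytes = name.encode("latin-1")

# In UTF-8, "ü" is the two-byte sequence 0xC3 0xBC.
utf8_bytes = name.encode("utf-8")

# Reading the Latin-1 bytes as UTF-8: 0xFC is not valid here, so a
# lenient decoder substitutes U+FFFD (the black-diamond question mark).
misread = latin1_bytes.decode("utf-8", errors="replace")

print(latin1_bytes)
print(utf8_bytes)
print(misread)
```

A strict decoder (Python's default, errors="strict") would raise UnicodeDecodeError on the same bytes instead of substituting a replacement character.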

Anyway, I think the fix is simple: edit CHANGES.txt so that "Kohlschütter" is 
properly encoded as UTF-8, as the remainder of the file is, then regenerate 
Changes.html.
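
If editing by hand is inconvenient, something along these lines might do it. This is only a sketch under my own assumptions: the function name reencode_mixed_lines is made up, and it assumes that any line that fails to decode as UTF-8 is Latin-1, which seems to be the case here but is not guaranteed in general:

```python
def reencode_mixed_lines(raw: bytes) -> bytes:
    """Return raw re-encoded as UTF-8, line by line.

    Hypothetical helper: lines that are already valid UTF-8 pass through
    unchanged; lines that are not are ASSUMED to be Latin-1.
    """
    fixed = []
    for line in raw.splitlines(keepends=True):
        try:
            text = line.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: the only other encoding in the file is Latin-1.
            text = line.decode("latin-1")
        fixed.append(text.encode("utf-8"))
    return b"".join(fixed)


# Example: one Latin-1 line followed by one line that is already UTF-8.
sample = (
    b"(Christian Kohlsch\xfctter via Mike McCandless)\n"
    + "already UTF-8: \u00fc\n".encode("utf-8")
)
result = reencode_mixed_lines(sample)
print(result.decode("utf-8"))
```

Working per line rather than on the whole file matters because the rest of CHANGES.txt is already UTF-8; converting the entire file from Latin-1 would double-encode any non-ASCII characters that are currently correct.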

Steve
