On Fri, Jan 07, 2011 at 02:38:49PM +0100, Andreas Delmelle wrote:
> On 07 Jan 2011, at 14:17, Simon Pepping wrote:
> 
> Hi Simon,
> 
> > On Fri, Jan 07, 2011 at 07:31:07AM -0500, [email protected] wrote:
> >> So, if no one objects, I will apply the patch as proposed. FOP will no
> >> longer crash, but simply show a '#' for such unassigned codepoints in
> >> the output. Treating them as regular alphabetic characters seems to be
> >> safe enough for the time being.
> > 
> > Would it not be better to use character FFFD, 'Replacement Character',
> > �, for this?
> 
> Interesting. In the context of linebreaking, that comes down to basically the 
> same thing.
> 
> U+FFFD has linebreak class 'AI' or 'Ambiguous', which is currently also 
> converted to 'Alphabetic' as part of the initial conversions.
> 
> Are you suggesting that we substitute the codepoint in the actual text 
> content (rather than leave it there, and further rely on the default 
> treatment of 'missing glyphs')?

I had not thought that far yet. I was reflecting on the use of '#' as
the replacement character for missing glyphs. Is that not particular
to FOP, and should we not conform to Unicode and use the Unicode
replacement character in such situations?

Actually replacing the character in the text would be a drastic step.
A missing glyph usually depends on the chosen font, while the
character itself is quite valid. In this case, however, the character
itself is invalid, in the sense that the code point has not been
assigned to a character in Unicode. (The bug report calls U+1F7E a
Greek Extended character, but the Unicode chart for the Greek Extended
block, http://www.unicode.org/charts/PDF/U1F00.pdf, shows no character
assignment for this code point.) That means that it does not even have
properties, such as a linebreaking class. Using class 'Ambiguous'
seems the right solution for that problem.
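
To illustrate the distinction between a missing glyph and an unassigned
code point, here is a minimal sketch in Java. It is not part of FOP or
of the proposed patch: the class CodePointCheck and its methods are
hypothetical names chosen for this example. It relies only on
java.lang.Character.isDefined(int), whose answer depends on the Unicode
version shipped with the JDK in use.

```java
// Hypothetical helper for illustration only; not FOP code.
final class CodePointCheck {

    // U+FFFD REPLACEMENT CHARACTER
    static final int REPLACEMENT_CHARACTER = 0xFFFD;

    // True if the JDK's Unicode tables assign a character to this code point.
    static boolean isAssigned(int codePoint) {
        return Character.isDefined(codePoint);
    }

    // Code point to show in the output: the original one if it is assigned,
    // otherwise U+FFFD. The text content itself is left untouched.
    static int displayCodePoint(int codePoint) {
        return isAssigned(codePoint) ? codePoint : REPLACEMENT_CHARACTER;
    }

    public static void main(String[] args) {
        int[] samples = {0x1F7D, 0x1F7E, 0xFFFD};
        for (int cp : samples) {
            System.out.printf("U+%04X assigned=%b display=U+%04X%n",
                    cp, isAssigned(cp), displayCodePoint(cp));
        }
    }
}
```

With this sketch, U+1F7D (assigned in the Greek Extended block) is kept
as is, while U+1F7E is reported as unassigned and displayed as U+FFFD.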

Simon
