On 07 Jan 2011, at 14:17, Simon Pepping wrote:
> On Fri, Jan 07, 2011 at 07:31:07AM -0500, bugzi...@apache.org wrote:
>> So, if no one objects, I will apply the patch as proposed. FOP will no longer
>> crash, but simply show a '#' for such unassigned codepoints in the output.
>> Treating them as regular alphabetic characters seems to be safe enough for
>> the time being.
> Would it not be better to use character FFFD, 'Replacement Character',
> �, for this?
Interesting. In the context of linebreaking, that comes down to basically the
same thing: U+FFFD has linebreak class 'AI' or 'Ambiguous', which is currently
also converted to 'Alphabetic' as part of the initial conversions.
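
To make the "initial conversions" concrete, here is a minimal sketch of the
idea, loosely following rule LB1 of UAX #14 (resolve 'AI', 'SG' and 'XX' to
'AL' by default). Note that the enum, the tiny lookup table and the method
names are all made up for illustration; this is not FOP's actual code, and
real class data would come from the Unicode LineBreak.txt file.

```java
import java.util.Map;

public class LineBreakResolveSketch {
    // Hypothetical subset of the UAX #14 line-breaking classes (not FOP's types).
    enum LineBreakClass { AL, AI, SG, XX }

    // Illustrative sample table only; a real implementation would be
    // generated from LineBreak.txt.
    static final Map<Integer, LineBreakClass> RAW = Map.of(
        0x0041, LineBreakClass.AL,  // LATIN CAPITAL LETTER A
        0xFFFD, LineBreakClass.AI   // REPLACEMENT CHARACTER
    );

    // Code points missing from the sample table are treated as unknown (XX).
    static LineBreakClass rawClass(int codePoint) {
        return RAW.getOrDefault(codePoint, LineBreakClass.XX);
    }

    // Initial conversion: ambiguous, surrogate and unknown all become alphabetic.
    static LineBreakClass resolve(LineBreakClass raw) {
        switch (raw) {
            case AI:
            case SG:
            case XX:
                return LineBreakClass.AL;
            default:
                return raw;
        }
    }

    public static void main(String[] args) {
        // 0x0378 stands in for a code point absent from the sample table.
        int[] samples = {0x0041, 0xFFFD, 0x0378};
        for (int cp : samples) {
            LineBreakClass raw = rawClass(cp);
            System.out.printf("U+%04X: %s -> %s%n", cp, raw, resolve(raw));
        }
    }
}
```

Under these assumptions, both U+FFFD and an unassigned codepoint end up as
'AL', so the line breaker would see no difference between the two.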
Are you suggesting that we substitute the codepoint in the actual text content
(rather than leave it there, and further rely on the default treatment of