https://issues.apache.org/bugzilla/show_bug.cgi?id=50471

--- Comment #1 from Andreas L. Delmelle <adelme...@apache.org> 2011-01-05 13:31:26 EST ---

Thanks for reporting, and apologies for the late reply...

At first glance, this seems like a minor oversight in FOP's implementation of
Unicode linebreaking: it does not take into account the possibility that a
given codepoint has not been assigned a linebreaking 'class' at all. U+1F7E does
not appear in http://www.unicode.org/Public/UNIDATA/LineBreak.txt, the file
used as the basis for generating the arrays in LineBreakUtils.java.
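To illustrate where an unassigned codepoint slips through, here is a minimal sketch of a class lookup. It is hypothetical: a Map stands in for the generated arrays, and the class codes are not FOP's real constants. It assumes the generated data answers 0 for codepoints missing from LineBreak.txt. A guarded lookup can map that 0 to XX (Unknown), the default that LineBreak.txt documents for most unlisted code points, so later stages only ever see real classes.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a table generated from LineBreak.txt.
 * Code points absent from the data come back as 0, which is not a
 * valid line-breaking class.
 */
public class LineBreakLookupSketch {
    // Hypothetical class codes; 0 means "not listed in the data file".
    static final byte NOT_LISTED = 0;
    static final byte AL = 1;
    static final byte SP = 2;
    static final byte XX = 3;

    private static final Map<Integer, Byte> TABLE = new HashMap<>();
    static {
        // A few sample entries; real generated data covers the whole file.
        TABLE.put(0x0041, AL); // LATIN CAPITAL LETTER A
        TABLE.put(0x0020, SP); // SPACE
        TABLE.put(0x1F7D, AL); // GREEK SMALL LETTER OMEGA WITH OXIA
    }

    /** Raw lookup: returns NOT_LISTED for code points missing from the table. */
    static byte rawClass(int codePoint) {
        return TABLE.getOrDefault(codePoint, NOT_LISTED);
    }

    /** Guarded lookup: maps unlisted code points to XX (Unknown). */
    static byte lineBreakClass(int codePoint) {
        byte cls = rawClass(codePoint);
        return cls == NOT_LISTED ? XX : cls;
    }

    public static void main(String[] args) {
        System.out.println(rawClass(0x1F7E));       // 0: no class at all
        System.out.println(lineBreakClass(0x1F7E)); // 3: XX
    }
}
```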

On the other hand, one could obviously ask why you need an unassigned
codepoint in your output at all. Are you absolutely sure you need it? If so,
can you explain the exact reason, i.e. what this unassigned codepoint is used
for?

The most straightforward 'fix' seems to be roughly as follows:

Index: src/java/org/apache/fop/text/linebreak/LineBreakStatus.java
===================================================================
--- src/java/org/apache/fop/text/linebreak/LineBreakStatus.java    (revision 1054383)
+++ src/java/org/apache/fop/text/linebreak/LineBreakStatus.java    (working copy)
@@ -87,6 +87,7 @@

         /* Initial conversions */
         switch (currentClass) {
+            case 0: // Unassigned codepoint: consider as AL?
             case LineBreakUtils.LINE_BREAK_PROPERTY_AI:
             case LineBreakUtils.LINE_BREAK_PROPERTY_SG:
             case LineBreakUtils.LINE_BREAK_PROPERTY_XX:
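The patch lets the value 0 fall through to the existing AI/SG/XX case. Below is a self-contained sketch of that step. It uses hypothetical class codes rather than the real LineBreakUtils constants, and it assumes the shared case resolves to AL, which is the default that UAX #14 rule LB1 gives for AI, SG and XX.

```java
/**
 * Sketch of the "initial conversions" step with the proposed extra case.
 * Class codes are hypothetical; FOP's real constants live in LineBreakUtils.
 */
public class InitialConversionSketch {
    static final byte UNASSIGNED = 0;
    static final byte AL = 1;
    static final byte AI = 2;
    static final byte SG = 3;
    static final byte XX = 4;
    static final byte BA = 5;

    static byte resolve(byte currentClass) {
        switch (currentClass) {
            case UNASSIGNED: // no class in LineBreak.txt: handle like XX
            case AI:
            case SG:
            case XX:
                return AL;   // UAX #14 LB1 default: AI, SG, XX resolve to AL
            default:
                return currentClass; // every other class is left unchanged
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(UNASSIGNED)); // 1 (AL)
        System.out.println(resolve(BA));         // 5 (BA, unchanged)
    }
}
```

Falling through to the existing case keeps the change to one line, at the cost of hard-wiring AL as the answer for every unassigned codepoint.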

The patch assigns the class 'AL' (Alphabetic) to any codepoint that Unicode has
not assigned a class, so such a codepoint will be treated as a regular letter.

The reason I ask whether you are sure you know what you're doing is that this
may turn out to be undesirable: perhaps the character in question should be
treated as a space rather than a letter. Unicode defines U+1F7E only as a
reserved character, so it makes sense that Unicode cannot say what should
happen with it in the context of linebreaking...
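A rough sketch of why the choice of class matters. It models only three UAX #14 rules (LB7: no break before spaces, LB18: break after spaces, LB28: no break between alphabetics) with a hypothetical two-value enum, and compares giving U+1F7E the class AL versus SP in a sequence like "a", U+1F7E, "b". This is not FOP code and ignores all other rules.

```java
/**
 * Hypothetical sketch: the class chosen for an unassigned code point decides
 * whether a line may break next to it. Only LB7, LB18 and LB28 are modelled.
 */
public class FallbackChoiceSketch {
    enum LbClass { AL, SP }

    /** True if a break is allowed between two characters of the given classes. */
    static boolean breakAllowed(LbClass before, LbClass after) {
        if (after == LbClass.SP) {
            return false; // LB7: do not break before spaces
        }
        if (before == LbClass.SP) {
            return true;  // LB18: break after spaces
        }
        return false;     // LB28: do not break between alphabetics (AL x AL)
    }

    /**
     * Counts break opportunities in the class sequence AL, fallback, AL,
     * i.e. "a", U+1F7E, "b" when U+1F7E is given the fallback class.
     */
    static int breaksAroundUnassigned(LbClass fallback) {
        LbClass[] classes = { LbClass.AL, fallback, LbClass.AL };
        int breaks = 0;
        for (int i = 0; i + 1 < classes.length; i++) {
            if (breakAllowed(classes[i], classes[i + 1])) {
                breaks++;
            }
        }
        return breaks;
    }

    public static void main(String[] args) {
        System.out.println(breaksAroundUnassigned(LbClass.AL)); // 0
        System.out.println(breaksAroundUnassigned(LbClass.SP)); // 1
    }
}
```

Treated as a letter, the codepoint glues its neighbours together; treated as a space, it opens a break opportunity after itself. This is why the intended use of the character matters.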

That said, it is also wrong of FOP to crash in this case, so the bug is
definitely genuine.
