Manuel Mall wrote:

Here are some of the combinations I have identified:

1. Non-breaking / non-elastic space => probably just a normal character, i.e. part of a word.

2. Non-breaking / elastic space - e.g. U+00A0 Non-breaking space
        => Must prevent break
        => Must handle text-align

3. Break / non-elastic - e.g. U+200B ZWSP, or any other break between two characters that does not involve adding or removing space/characters
        => Must handle border/padding
        => Must handle text-align

4. Break / non-elastic / remove if not break - e.g. U+00AD Soft hyphen
        => Must remove if not at break
        => Must handle border/padding
        => Must handle text-align

5. Break / non-elastic / add character if break - e.g. hyphenation
        => Must add space for hyphen if at break
        => Must handle border/padding
        => Must handle text-align

6. Breaking / elastic / non-removable - e.g. U+3000 Ideographic space
        => Must handle border/padding
        => Must handle text-align
Question: XSL-FO does not define U+3000 as removable white space, but under common CJK typesetting conventions, would it be removed at a line break?

7. Breaking / elastic / removable - e.g. U+0020 Space
        => Can occur in runs which must be wholly removed
        => Must handle border/padding
        => Must handle text-align

Are there any combinations I have missed, e.g. is there a "break / non-elastic / remove at break" case?

Maybe the fixed width spaces?
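
To make the combinations above easier to compare, here is a rough sketch of the classification as a lookup table. It is only an illustration in Python, not FOP code: the names `SpaceClass`, `Removal`, `SPACE_CLASSES` and `classify` are hypothetical, and the table covers only the code points explicitly named in the list (the hyphenation case 5 is not a character, so it is left out). Anything not in the table falls back to case 1, i.e. it is treated as a normal character.

```python
from dataclasses import dataclass
from enum import Enum


class Removal(Enum):
    NEVER = "never"                # always kept
    IF_NOT_BREAK = "if not break"  # dropped unless a line ends here (case 4)
    REMOVABLE = "removable"        # may be discarded, possibly as a whole run (case 7)


@dataclass(frozen=True)
class SpaceClass:
    breaking: bool   # may a line break occur here?
    elastic: bool    # may the width be adjusted, e.g. for text-align="justify"?
    removal: Removal


# Case 1: non-breaking, non-elastic, i.e. just part of a word.
ORDINARY = SpaceClass(breaking=False, elastic=False, removal=Removal.NEVER)

# Hypothetical table; only the characters named in the list are included.
SPACE_CLASSES = {
    "\u00A0": SpaceClass(False, True, Removal.NEVER),         # case 2: NBSP
    "\u200B": SpaceClass(True, False, Removal.NEVER),         # case 3: ZWSP
    "\u00AD": SpaceClass(True, False, Removal.IF_NOT_BREAK),  # case 4: soft hyphen
    "\u3000": SpaceClass(True, True, Removal.NEVER),          # case 6: ideographic space
    "\u0020": SpaceClass(True, True, Removal.REMOVABLE),      # case 7: space
}


def classify(ch: str) -> SpaceClass:
    """Return the class of a single character, defaulting to case 1."""
    return SPACE_CLASSES.get(ch, ORDINARY)
```

A missing combination such as "break / non-elastic / remove at break" would simply be one more row in such a table, which is one way to check whether any real character needs it.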

Anyway, this seems to be an exhaustive analysis of the problem!

Just a few comments / thoughts:

- non-breaking, non-elastic: the simple solution would be to handle these characters as normal "letters", so the text "before_after" (where _ is ZWNBSP) would create a single AreaInfo object in the TextLM; but this would create problems during hyphenation, as at the moment a non-letter character in the middle of a word prevents hyphenation

- soft hyphen: at the moment it is not properly handled, but it won't be difficult to fix the implementation; it could create the same elements used for a hyphenation point, but the penalty could have a negative value (as users would probably use it to "suggest" a desired line break); note that a word with a soft hyphen in its middle would not be hyphenated, unless we ignore this character when collecting word fragments
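
The soft hyphen idea could be sketched roughly as follows. This is a small Python illustration, not the TextLM code: `Box`, `Penalty`, `soft_hyphen_elements` and `strip_soft_hyphens` are hypothetical names, and the hyphen width of 1 and the penalty value of -50 are arbitrary placeholders. Each soft hyphen becomes a flagged penalty whose width is that of the hyphen (added only if the break is taken) and whose negative value encourages the break; the second helper shows the "ignore this character when collecting word fragments" option.

```python
from dataclasses import dataclass

SOFT_HYPHEN = "\u00AD"


@dataclass(frozen=True)
class Box:
    text: str      # unbreakable piece of the word


@dataclass(frozen=True)
class Penalty:
    width: int     # width added only if the line breaks here (the hyphen)
    value: int     # cost of breaking here; a negative value encourages it
    flagged: bool  # True for hyphen-like breaks


def soft_hyphen_elements(word, hyphen_width=1, penalty_value=-50):
    """Split a word at soft hyphens into boxes separated by penalties.

    hyphen_width and penalty_value are illustrative assumptions only.
    """
    elements = []
    fragments = word.split(SOFT_HYPHEN)
    for i, fragment in enumerate(fragments):
        if fragment:
            elements.append(Box(fragment))
        if i < len(fragments) - 1:
            # One possible break per soft hyphen; nothing is drawn
            # unless the break is actually chosen.
            elements.append(Penalty(hyphen_width, penalty_value, True))
    return elements


def strip_soft_hyphens(word):
    """Word as the hyphenator would see it if soft hyphens were ignored."""
    return word.replace(SOFT_HYPHEN, "")
```

For example, `soft_hyphen_elements("hy\u00ADphen")` gives a box for "hy", one negative flagged penalty, and a box for "phen", while `strip_soft_hyphens` returns the plain word "hyphen" so that automatic hyphenation could still be applied to it.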

Regards
    Luca
