Manuel Mall wrote:
> I have a question on this. You break in TextArea the text into words based on CharUtilities.isAnySpace. Is this guaranteed to be consistent with the breaking and adjustment calculations in TextLayoutManager? I am concerned we may be using different rules for word breaking in different places.
As far as consistency is concerned, I agree with you: the handling of the different kinds of spaces (breaking, non-breaking, fixed width, ...) is still quite incomplete and "dispersed" over different classes. Just to add another example, the CharacterLM implicitly "expects" its character to be a non-space character and has its own lines of code concerning the creation of the elements, while it could share the methods already called by the TextLM.
Having a single, centralized class taking care of the breaking (be it a Java utility class or a FOP one) and a single, shared method implementing the creation of the elements would surely increase consistency and clarity.
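To illustrate what such a centralized class could look like, here is a minimal sketch. SpaceClassifier, CharClass, classify() and isSpace() are hypothetical names, not existing FOP classes or methods, and the mapping from code points to classes is a deliberately simplified assumption rather than the rules FOP actually applies.

```java
/** Hypothetical single point of truth for space classification; not part of FOP. */
final class SpaceClassifier {

    /** The kinds of characters mentioned above (breaking, non-breaking, fixed width, ...). */
    enum CharClass { BREAKING_SPACE, NON_BREAKING_SPACE, FIXED_WIDTH_SPACE, NON_SPACE }

    private SpaceClassifier() { }

    /** Classifies one character. The ranges below are an illustrative assumption. */
    static CharClass classify(char c) {
        if (c == '\u0020') {
            return CharClass.BREAKING_SPACE;        // ordinary space
        }
        if (c == '\u00A0' || c == '\u202F') {
            return CharClass.NON_BREAKING_SPACE;    // no-break space, narrow no-break space
        }
        if (c >= '\u2000' && c <= '\u200A') {
            return CharClass.FIXED_WIDTH_SPACE;     // U+2000..U+200A, treated alike for simplicity
        }
        return CharClass.NON_SPACE;
    }

    /** Convenience test that every caller (layout managers, areas) could share. */
    static boolean isSpace(char c) {
        return classify(c) != CharClass.NON_SPACE;
    }
}
```

If the layout managers and TextArea both went through one method like classify(), they could not disagree about what counts as a space.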
> Somehow it doesn't feel right to me that TextLayoutManager does all the breaking and calculations and then we give the whole chunk to TextArea and it breaks it again using a possibly different algorithm but still using the adjustment value calculated by TextLayoutManager.
When I was trying to fix bug 36238 I initially started modifying TextLM#createTextArea(), using the AreaInfo objects to create WordAreas and SpaceAreas, but I then decided to move the "string splitting" inside TextArea because:
1) if WordAreas and SpaceAreas are not directly created by the LMs, there is no need to change a single line of code inside the classes creating TextAreas; this is not a real "reason" supporting the choice, just a handy consequence of it;
2) if TextArea still provides a getText() method, the renderers are not forced to render the text word by word and space by space if their word spacing treatment is not affected by multi-byte characters; but once again, this is not a real reason as we could provide this method anyway;
3) although both SpaceArea and WordArea have an "offset" attribute, it is not used at the moment, so these areas do not carry any formatting information; their only purpose is to "highlight" spaces, thus allowing specific renderers to handle them correctly regardless of their encoding; in other words, we are not losing the breaking and calculations, we simply do not need them anymore, as we already know exactly which text will be placed in each line and how wide it will be once it is correctly adjusted;
4) the text that will be placed in a line cannot be directly taken from "textArray" (in the TextLM), and the string "str" should be used instead anyway, as it may be different from the concatenation of the single pieces of text; at the moment the only difference concerns the hyphenation character "-" added at the end of the line, but I suspect that in different languages there could be other differences; so, we cannot simply create a WordArea for each AreaInfo object.
So, if you find it strange to break the text, put it together and split it again, so do I! :-) But this initial feeling disappeared when I realized that the final splitting does not involve "breaking" in its proper sense, but just "classification" of characters.
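To make the "classification, not breaking" point concrete, here is a rough sketch of the kind of splitting meant above. It is not the actual TextArea code: LineChunker, split() and isSpace() are hypothetical names, the space test is a stand-in for something like CharUtilities.isAnySpace, and making every space character its own chunk is an assumption. The input is the final line string (including a trailing hyphenation "-", if any), and no break decision or width calculation happens here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits an already laid-out line into word and space chunks. */
final class LineChunker {

    private LineChunker() { }

    /** Stand-in for the real space test; only two characters are handled here. */
    static boolean isSpace(char c) {
        return c == ' ' || c == '\u00A0';
    }

    /**
     * Returns the chunks of the line in order: each run of non-space characters
     * is one "word", each space character is a chunk of its own.
     * Concatenating the chunks gives back the input unchanged.
     */
    static List<String> split(String line) {
        List<String> chunks = new ArrayList<>();
        int wordStart = -1;                       // -1 means "not inside a word"
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (isSpace(c)) {
                if (wordStart >= 0) {             // close the word that just ended
                    chunks.add(line.substring(wordStart, i));
                    wordStart = -1;
                }
                chunks.add(String.valueOf(c));    // the space itself
            } else if (wordStart < 0) {
                wordStart = i;                    // a new word begins
            }
        }
        if (wordStart >= 0) {                     // trailing word, e.g. ending in "-"
            chunks.add(line.substring(wordStart));
        }
        return chunks;
    }
}
```

For example, split("to be hyphen-") yields the chunks "to", " ", "be", " ", "hyphen-": the hyphen added at the end of the line stays inside the last word, and the line content itself is not altered.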
This is why I did what I did; if I did not manage to convince you ... you can try and convince me! :-)