[ 
https://issues.apache.org/jira/browse/FOP-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas L. Delmelle updated FOP-2466:
-------------------------------------
    Description: 
When processing a FO file that contains pre-hyphenated text, using 
soft-hyphens, FOP's hyphenation does not yield usable results.

From the corresponding thread on fop-users@:

---
The accumulated sequence of characters since the previous break opportunity is 
taken to be a 'word', which may or may not end in a hyphen. If it does, a 
specific sequence of elements is glued to the word-box, to prevent a break 
before the SHY and to make sure that it is rendered properly, i.e. that it only 
counts if the break occurs right after it.

As hyphenation by FOP itself is applied at a higher level, when all layout 
elements for a whole paragraph have been collected, that SHY sequence is seen 
as a word boundary. That is, that part of the algorithm just accumulates the 
text for 'uninterrupted' sequences of word-boxes, and feeds those pieces to the 
hyphenator. The real intention is to apply hyphenation across any nested 
fo:inlines. 'Uninterrupted' means that auxiliary elements, generated for border 
or padding, are explicitly *not* considered as word boundaries. The sequence 
generated for SHY contains two non-auxiliary elements, as if it were a space, 
perhaps just to ensure that that position in the layout always leads to a 
character that is visibly rendered.

In case of pre-hyphenated text, this has the unintended effect of restricting 
the input for the hyphenator to parts of words, which is basically meaningless 
(and wasteful).
---
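
To illustrate the effect described in the quoted thread, here is a minimal, 
standalone Java sketch. It does not use any FOP classes; the class and method 
names are hypothetical, and the real accumulation in FOP works on layout 
elements rather than on plain strings. It only contrasts the fragments the 
hyphenator receives when SHY acts as a word boundary with the whole word it 
would receive if SHY were skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only, not FOP code.
final class ShyWordSplitSketch {
    static final char SHY = '\u00AD';

    // Models the behaviour described above: every SHY ends the current 'word',
    // so the hyphenator only ever sees fragments.
    static List<String> fragmentsSplitAtShy(String text) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == SHY) {
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    // Models the intended input: the whole word, with SHY not treated as a
    // boundary.
    static String wordIgnoringShy(String text) {
        return text.replace(String.valueOf(SHY), "");
    }
}
```

For the input "hy", SHY, "phen", SHY, "ation", the first method yields three 
fragments, while the second yields the single word "hyphenation".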

Among other things, this leads to the "hyphenation-ladder-count" property 
seemingly having no effect.

Note - At this point, I believe the behaviour is not necessarily incorrect. I 
am also thinking that it would be correct to ignore hyphenation-ladder-count in 
case hyphenate="false".

Initial idea for a fix: 
Make sure that the SHY sequence is not treated as a word boundary in LineLM 
when accumulating text for boxes generated by the TextLMs. Once done, we should 
then be able to check for each hyphenation point that FOP itself calculates, 
whether there is already an explicit SHY present at that same point. In that 
case, we can just do nothing (= leave the SHY in place).
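
The check in the second step could look roughly like the following Java 
sketch. It is only an assumption about the shape of the logic, not a patch 
against LineLM or TextLM: the class and method names are hypothetical, text is 
modelled as a plain string, and hyphenation points are modelled as character 
offsets into the word with all SHY characters removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only, not FOP code.
final class ShyMergeSketch {
    static final char SHY = '\u00AD';

    // Offsets, counted in the SHY-free word, at which the source text already
    // contains an explicit SHY.
    static Set<Integer> explicitShyOffsets(String text) {
        Set<Integer> offsets = new TreeSet<>();
        int strippedIndex = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == SHY) {
                offsets.add(strippedIndex);
            } else {
                strippedIndex++;
            }
        }
        return offsets;
    }

    // Returns the calculated hyphenation points that still need handling.
    // Points that coincide with an explicit SHY are skipped, i.e. the SHY is
    // left in place.
    static List<Integer> pointsNeedingInsertion(String text, int[] calculated) {
        Set<Integer> explicit = explicitShyOffsets(text);
        List<Integer> remaining = new ArrayList<>();
        for (int point : calculated) {
            if (!explicit.contains(point)) {
                remaining.add(point);
            }
        }
        return remaining;
    }
}
```

For example, if the text is "hy", SHY, "phenation" and the hyphenator reports 
points at offsets 2 and 6, only offset 6 remains to be handled, because offset 
2 already carries an explicit SHY.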

  was:
When processing a FO file that contains pre-hyphenated text, using 
soft-hyphens, FOP's hyphenation does not yield usable results.

From the corresponding thread on fop-users@:

... internally for FOP, [t]he accumulated sequence of characters since the 
previous break opportunity is taken to be a 'word', which may or may not end in 
a hyphen. If it does, a specific sequence of elements is glued to the word-box, 
to prevent a break before the SHY and to make sure that it is rendered 
properly, i.e. that it only counts if the break occurs right after it.

As hyphenation by FOP itself is applied at a higher level, when all layout 
elements for a whole paragraph have been collected, that SHY sequence is seen 
as a word boundary. That is, that part of the algorithm just accumulates the 
text for 'uninterrupted' sequences of word-boxes, and feeds those pieces to the 
hyphenator. The real intention is to apply hyphenation across any nested 
fo:inlines. 'Uninterrupted' means that auxiliary elements, generated for border 
or padding, are explicitly *not* considered as word boundaries. The sequence 
generated for SHY contains two non-auxiliary elements, as if it were a space, 
perhaps just to ensure that that position in the layout always leads to a 
character that is visibly rendered.

In case of pre-hyphenated text, this has the unintended effect of restricting 
the input for the hyphenator to parts of words, which is basically meaningless 
(and wasteful).

Among other things, this leads to the "hyphenation-ladder-count" property 
seemingly having no effect.

Note - At this point, I believe the behaviour is not necessarily incorrect. I 
am also thinking that it would be correct to ignore hyphenation-ladder-count in 
case hyphenate="false".

Initial idea for a fix: 
Make sure that the SHY sequence is not treated as a word boundary in LineLM 
when accumulating text for boxes generated by the TextLMs. Once done, we should 
then be able to check for each hyphenation point that FOP itself calculates, 
whether there is already an explicit SHY present at that same point. In that 
case, we can just do nothing (= leave the SHY in place).


> Improve output for pre-hyphenated text with SHY combined with hyphenation 
> properties
> ------------------------------------------------------------------------------------
>
>                 Key: FOP-2466
>                 URL: https://issues.apache.org/jira/browse/FOP-2466
>             Project: Fop
>          Issue Type: Improvement
>          Components: layout/line
>    Affects Versions: 1.1
>            Reporter: Andreas L. Delmelle
>            Priority: Minor
>              Labels: hyphenation, soft-hyphen
>
> When processing a FO file that contains pre-hyphenated text, using 
> soft-hyphens, FOP's hyphenation does not yield usable results.
> From the corresponding thread on fop-users@:
> ---
> The accumulated sequence of characters since the previous break opportunity 
> is taken to be a 'word', which may or may not end in a hyphen. If it does, a 
> specific sequence of elements is glued to the word-box, to prevent a break 
> before the SHY and to make sure that it is rendered properly, i.e. that it 
> only counts if the break occurs right after it.
> As hyphenation by FOP itself is applied at a higher level, when all layout 
> elements for a whole paragraph have been collected, that SHY sequence is seen 
> as a word boundary. That is, that part of the algorithm just accumulates the 
> text for 'uninterrupted' sequences of word-boxes, and feeds those pieces to 
> the hyphenator. The real intention is to apply hyphenation across any nested 
> fo:inlines. 'Uninterrupted' means that auxiliary elements, generated for 
> border or padding, are explicitly *not* considered as word boundaries. The 
> sequence generated for SHY contains two non-auxiliary elements, as if it were 
> a space, perhaps just to ensure that that position in the layout always 
> leads to a character that is visibly rendered.
> In case of pre-hyphenated text, this has the unintended effect of restricting 
> the input for the hyphenator to parts of words, which is basically 
> meaningless (and wasteful).
> ---
> Among other things, this leads to the "hyphenation-ladder-count" property 
> seemingly having no effect.
> Note - At this point, I believe the behaviour is not necessarily incorrect. I 
> am also thinking that it would be correct to ignore hyphenation-ladder-count 
> in case hyphenate="false".
> Initial idea for a fix: 
> Make sure that the SHY sequence is not treated as a word boundary in LineLM 
> when accumulating text for boxes generated by the TextLMs. Once done, we 
> should then be able to check for each hyphenation point that FOP itself 
> calculates, whether there is already an explicit SHY present at that same 
> point. In that case, we can just do nothing (= leave the SHY in place).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
