Wow, I have to digest this first. I have a busy month behind me with not much of my brain allocated to FOP. But thanks so far for the feedback. What I can deduce from this is that my suspicion is probably correct: implementing hyphenation-keep will be quite tricky with the current code. I assume we have to make a few changes so that page- and line-breaking interact more closely (for "changing available IPD" etc.).
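
To make the backtracking idea from the quoted mail below a bit more concrete, here is a minimal Java sketch. It is a toy under stated assumptions, not FOP code: the class HyphenationKeepSketch, its Line type, and the methods breakLines() and layout() are all hypothetical names. The line breaker is a simple greedy first-fit one (not the Knuth-based algorithm), a '-' inside a word marks its single permitted hyphenation point, and every page is assumed to hold a fixed number of lines. The only thing it is meant to show is the feedback loop: page breaking finds a hyphenated line at the end of a page, disables that hyphenation point, and line breaking is redone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class HyphenationKeepSketch {

    /** One laid-out line; hyphenWord is the index of the word split at its end, or -1. */
    static final class Line {
        final String text;
        final int hyphenWord;
        Line(String text, int hyphenWord) {
            this.text = text;
            this.hyphenWord = hyphenWord;
        }
    }

    /**
     * Toy first-fit line breaker (assumption: stands in for the real one).
     * Word indices contained in 'disabled' must not be hyphenated.
     */
    static List<Line> breakLines(List<String> words, int width, Set<Integer> disabled) {
        List<Line> lines = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            String word = words.get(i);
            String whole = word.replace("-", "");
            int gap = cur.length() > 0 ? 1 : 0;       // space before the word, if any
            int room = width - cur.length() - gap;
            if (whole.length() <= room) {             // the whole word fits
                if (gap == 1) cur.append(' ');
                cur.append(whole);
                continue;
            }
            int hp = word.indexOf('-');
            if (hp > 0 && !disabled.contains(i) && hp + 1 <= room) {
                // Head plus hyphen ends this line, the tail starts the next one.
                if (gap == 1) cur.append(' ');
                cur.append(word, 0, hp).append('-');
                lines.add(new Line(cur.toString(), i));
                cur = new StringBuilder(word.substring(hp + 1));
            } else {
                // No usable hyphenation point: move the whole word to a new line.
                if (cur.length() > 0) lines.add(new Line(cur.toString(), -1));
                cur = new StringBuilder(whole);
            }
        }
        if (cur.length() > 0) lines.add(new Line(cur.toString(), -1));
        return lines;
    }

    /** Redo line breaking until no page except the last ends in a hyphenated line. */
    static List<Line> layout(List<String> words, int width, int linesPerPage) {
        Set<Integer> disabled = new HashSet<>();
        while (true) {
            List<Line> lines = breakLines(words, width, disabled);
            int offender = -1;
            // Toy page breaker: every page holds exactly linesPerPage lines.
            for (int last = linesPerPage - 1; last < lines.size() - 1; last += linesPerPage) {
                if (lines.get(last).hyphenWord >= 0) {
                    offender = lines.get(last).hyphenWord;
                    break;
                }
            }
            if (offender < 0) return lines;
            // Feedback from page breaking into line breaking. Each pass disables
            // one more word, so the loop ends after at most words.size() passes.
            disabled.add(offender);
        }
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("page", "breaks", "can", "inter-act", "badly");
        for (Line line : layout(words, 11, 2)) {
            System.out.println(line.text);
        }
    }
}
```

With a width of 11 and two lines per page, the first pass ends page 1 with "can inter-", so that hyphenation point is disabled and the second pass yields "page breaks", "can", "interact", "badly". Even in this toy, one disabled hyphenation point reshuffles every following line, which is the reason the real change is not a small one.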

On 31.08.2006 17:57:14 Andreas L Delmelle wrote:
> On Aug 31, 2006, at 17:04, Jeremias Maerki wrote:
>
> Hi Jeremias,
>
> > I'm investigating what would be necessary to implement
> > hyphenation-keep. After some thought, I think this is one of those
> > very mean properties that fire back from page-breaking back into
> > line-breaking. IOW, when you detect a page/column break at a line
> > which is hyphenated you'll basically have to track back and redo the
> > line breaking, disabling that particular hyphenation possibility.
> > You then have to redo the page breaking possibly having to backtrack
> > again if another hyphenated line is again at the end of a
> > column/page. Doesn't sound like a small change.
>
> As it happens, I've been looking in the same direction, although not
> particularly the hyphenation-keep property.
>
> > The cheap way, of course, is to add penalty values to discourage
> > page breaks between hyphenated lines (when hyphenation-keep is
> > activated) but that could lead to ugly layout. It's certainly better
> > to disable certain hyphenation points based on feedback from page
> > breaking but it obviously means starting to backtrack into line
> > breaking. Maybe the "changing available IPD" problem also plays into
> > this. As we've seen, it may be necessary to redo certain line breaks
> > based on events in page breaking.
>
> I've been doing some more browsing in the code and re-read your Wiki
> page, and I'm getting more convinced that line-breaking should not be
> made literally 'restartable' to deal with varying ipd between pages.
> This does NOT mean that we don't need restartable line-breaking at
> all, only that I think it's not the solution to that particular
> problem.
>
> In fact, in some cases --if the ipd-change occurs early in the
> page-sequence-- restarting would be suboptimal, given that
> line-breaking happens completely independent of page-breaking.
> Line-breaks for the entire page-sequence, apart from the first few
> pages, will be invalidated and have to be recreated... and possibly
> again, upon the next page-break :/
>
> The problem is that, if I interpret correctly, trimmed down to the
> essence, the main loop now looks like this:
>
> generate first page
> create list of line-breaks for the whole page-sequence
> while (more line-breaks)
>   compute best page-break
>   if (more line-breaks)
>     generate next page
>
> Strictly speaking, this is total-fit line-breaking only for
> page-sequences consisting of one page. As to the rest, it only offers
> guarantees in as much as the page-width remains constant (the first
> page's ipd).
>
> > Does anyone see a relatively simple way I have not yet seen? Or am I
> > more or less on track?
>
> Depends on how we define simple, but it does address other areas as
> well.
>
> What I had in mind as a first step, was to detach page-generation
> from the page-breaking algorithm, such that the PageSequenceLM can
> set both available bpd and ipd of a LayoutContext before passing it
> to the FlowLM.
> Another way to look at it: page-breaking would actually become the
> outer loop, driving the line-breaking to take place in pieces, but as
> a first step, no more than that.
>
> The PageProvider already caches the pages, so the BreakingAlgorithm
> would later have to iterate over them (whereas currently, they are
> created on demand of the PageBreakingAlgorithm, so ipd changes aren't
> even accessible when computing the line-breaks? Unless by having the
> LineBreakingAlgorithm ask for the ipd a given page?)
>
> The most straightforward option would be to signal bp-overflow
> through a flag in the context. Once the line-breaks for a paragraph
> have been computed, the BlockLM updates the context: indicate
> bp-overflow at node X (no detailed idea yet on how this is supposed to
> look, but looking at the related code it doesn't seem too hard)
>
> After getNextKnuthElements() for each BlockLevelLM has been called,
> the FlowLM can then check for the overflow flag, and if necessary,
> hand the element-list up to that point over to the PageSequenceLM. If
> I get the design correctly, it would then be up to the
> PageBreakingAlgorithm to decide whether the list will be consumed
> immediately --first-fit-- or whether following lists will be appended
> before computing any effective page-breaks --total-fit. (This could
> be made to depend on an extension property of the page-sequence?)
>
> Roughly the loop would come to look like:
>
> while (!flowLM.isFinished())
>   generate next page
>   update context dimensions
>   while (no bp-overflow
>          && no forced page-break)
>     create next list of line-breaks
>   if (first-fit)
>     compute best page-break
>     add areas
>   else
>     append to global list
>
> For total-fit, the page-break computations can still be deferred and
> performed after all the best line-breaks in the page-sequence are
> known. The only difference being that the global list of line-breaks
> will already be optimized to take into account ipd changes due to
> varying page-masters.
>
> The thing I'm still struggling with is the necessary change for this
> in the LayoutContext:
> It seems that, to the line-breaking at least, this should either
> a) actually contain a collection of contexts (?) or
> b) be made aware of the bp-shifts implied by the line-breaks, so that
>    getRefIPD() would always return the 'current IPD' [= at the implied
>    bp-coordinate for a given node]
>
> > Another topic that we may have to address at some point is the
> > distinction of keeps on column level and keeps on page level. So
> > far, we can only map the keeps on column level. I wonder how we
> > would go about an implementation here. It seems to me that the page
> > breaker would have to start being more clever.
> >
> > Anyway, the important thing for me right now is to have an idea how
> > hyphenation-keep would have to be implemented so I can take an
> > estimate and determine dependencies of tasks.
>
> Well, I already saw possible advantages in what I was investigating
> for dealing with side- and end-floats. It would be possible, at the
> time of computing the line-breaks for a float, to determine whether
> it would by itself already cause an unavoidable bp-overflow (idem
> dito for before-floats and footnotes: maybe a possible solution to
> the open issue regarding footnotes and multi-column layout?)
>
> Maybe it could help here too, since info about the 'current'
> region-body would be accessible to the LineBreakingAlgorithm?
>
> Anyway, I'm guessing that, the programming will become (a little)
> more complex to follow, but if page-breaking and line-breaking can be
> made to provide hints to each other, this would solve a lot of open
> issues.
>
> Hope this gives you some clues.
> I haven't made any changes myself yet, only did some information
> gathering in the source code.
>
> Cheers,
>
> Andreas


Jeremias Maerki