Re: Knuth linebreaking questions

Simon Pepping Wed, 01 Dec 2004 11:21:48 -0800

On Tue, Nov 30, 2004 at 07:27:29PM +0100, Luca Furini wrote:
> Finn Bock wrote:
> 
> > 3) What is the reasoning for doing hyphenation only after threshold=1
> > fails? Naive common sense tells me that if the user specifies hyphenation
> > we should do hyphenation before finding line breaks.
> 
> Finding hyphenation points is time-expensive (all words must be
> hyphenated, not only the ones "near a line's end"), the sequence of
> elements becomes longer, there are more feasible breaking points, and a
> line ending with a "-" is less beautiful; so I thought it would be
> better if a set of breaking points could be found without hyphenation.
> 
> I just took the "hyphenate" property as a suggestion instead of an order! :-)


This is the practice in TeX too. It may be considered a satisfactory
implementation of hyphenate="true": take hyphenation into account
when the line layout algorithm considers hyphenating the lines a
better solution. The algorithm does not find it necessary to try
hyphenation when there is a non-hyphenated solution with an amount of
demerits below a certain threshold.

Note that in TeX such thresholds are user-adjustable parameters. I
think they should eventually be so in FOP too, for those of us who
have the most exquisite taste in line layout.
 
> Note that the same algorithm with the same threshold could find a
> different set of breaking points with and without hyphenation, because the
> elements are different. Without hyphenation, spaces could need a little
> higher adjustment, for example.
> 
> > 4) I've compared your code to tex_wrap
> >     http://oedipus.sourceforge.net/texlib/
> > and the main difference is in the way new KnuthNodes are added to the
> > active list. Is the BestRecords part of Knuth or is it your own
> > invention? Why are only the fitness classes in BestRecords that are
> > higher than minDemerits + incompatibleFitnessDemerit added to
> > activeList? Why not all fitness classes in BestRecords?
> 
> At the moment I don't have the book at hand, but I am quite sure it's
> *not* an invention of mine! :-)
> 
> As far as I can remember, the Knuth book uses 4 different variables, named
> C1, ... C4 :-( (or maybe D or A, anyway not a very self-documenting name!)
> and I just created this structure to store them.
 
The algorithm distinguishes four fitness classes of lines: tight,
normal, loose and very loose. When two consecutive lines are neither
of the same class nor of two adjacent classes, it adds a penalty of
incompatibleFitnessDemerit. For each breakpoint b there is one best
line of each class leading to b. If the best line of class i has an
amount of demerits best.getDemerits(i) which is not less than
best.getMinDemerits(), the minimum over all four classes, plus
incompatibleFitnessDemerit, it can never lead to a better solution:
continuing from the line with the minimum demerits costs at most
incompatibleFitnessDemerit extra. The optimization omits it from the
list of best breakpoints. Knuth mentions that this saves him 25% of
the executions of his loop in his computational experiments.
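
The pruning test can be sketched as follows. This is an illustration
under assumptions, not the FOP code: FitnessPruning and classesToKeep
are hypothetical names, the array stands in for the four values of
best.getDemerits(i), and a class with no candidate line is assumed to
hold positive infinity. A tie with the limit is kept here, which is a
choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class FitnessPruning {
    /**
     * Returns the indices of the fitness classes whose best line into
     * the current breakpoint is worth adding to the active list.
     */
    public static List<Integer> classesToKeep(
            double[] bestDemerits, double incompatibleFitnessDemerit) {
        // Minimum over all classes, like best.getMinDemerits().
        double min = Double.POSITIVE_INFINITY;
        for (double d : bestDemerits) {
            min = Math.min(min, d);
        }
        List<Integer> keep = new ArrayList<>();
        if (Double.isInfinite(min)) {
            return keep; // no feasible line ends at this breakpoint
        }
        // A class above this limit always loses to the minimum class,
        // even after that class pays the incompatibility penalty.
        double limit = min + incompatibleFitnessDemerit;
        for (int i = 0; i < bestDemerits.length; i++) {
            if (bestDemerits[i] <= limit) {
                keep.add(i);
            }
        }
        return keep;
    }
}
```

With demerits {100, 5000, infinity, 20000} and a penalty of 10000 the
limit is 10100, so only classes 0 and 1 are kept.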

Regards, Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl
