[Xmlgraphics-fop Wiki] Update of "KnuthsModel" by LucaFurini

Apache Wiki Fri, 30 Sep 2005 02:11:45 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Xmlgraphics-fop Wiki" 
for change notification.


The following page has been changed by LucaFurini:
http://wiki.apache.org/xmlgraphics-fop/KnuthsModel

The comment on the change is:
Initial description of Knuth's model

New page:
FOP's line breaking and page breaking algorithms both implement Knuth's 
breaking algorithm.

The detailed description of the model and the algorithm can be found on the 
paper "Breaking Paragraphs into Lines" by Donald E. Knuth, published in the 
book "Digital Typography" (Stanford, California: Center for the Study of 
Language and Information, 1999), (CSLI Lecture Notes, no. 78.), ISBN 
1-57586-010-4 (which is surely a book worth reading).

This is just a short summary meant to give a quick overview of the content 
representation that is the input of the breaking algorithm, such allowing 
interested people to understand FOP's behaviour even without the need to know 
how exactly the breaks are actually computed.

The best way to know how the algorithm examines this representation and find 
the breaks is reading Knuth's paper cited above and look at its implementation 
inside FOP, which follows quite closely the description.

It is important to note that this representation is not bound to a particular 
algorithm: it's very easy to use it as a input for a completely different 
breaking algorithm, such as a first-fit or best-fit one.

Knuth's original algorithm solves the line breaking problem, so this page will 
speak in terms of "paragraphs", "lines" and "words", but most of the concepts 
apply to the page breaking problem too: it's enough to substitute "paragraphs" 
with "page sequences", "lines" with "pages", and "words" with "lines". Just as 
the line breaking algorithm breaks the content of paragraphs into lines 
(containing words), the page breaking algorithm breaks the content of 
page-sequences into pages (containing lines).

= Formal representation of the problem =

A paragraph can be modelled as a sequence of elements

x,,1,, x,,2,, x,,3,, ... x,,n,,

where each x,,i,, can be either a '''box''' element, a '''glue''' element or a 
'''penalty''' element.

All elements have a '''width''' w; if x,,i,, is the i-th element in the 
sequence, its width will be w,,i,,.

== Box elements ==

A box element represents an unbreakable piece of content with fixed width: for 
example a character, a syllable (but only if letter spacing is constant), an 
inline image, ...

In the context of page breaking, a box element can represent a line.

== Glue elements ==

A glue element represents some kind of space between boxes: for example, a 
space character between two words.

A glue element x,,i,,, besides its optimal width w,,i,, has a 
'''stretchability''' y,,i,, and a '''shrinkability''' z,,i,,; this means that 
the breaking algorithm can set the element's actual width to any value in the 
range [w,,i,, - z,,i,,, w,,i,, + y,,i,,], although it will try to choose a 
value as close as possible to w,,i,,.

In the context of page breaking, a glue element can represent the space between 
two paragraphs.

== Penalty elements ==

A penalty element represents information about a breaking point; it does not 
represent any piece of content.

The width w,,i,, of a penalty element x,,i,, is considered only if the breaking 
algorithm chooses that penalty element as a break. For example, in the context 
of line breaking the width of the hyphen character "-" must be considered only 
if a word is actually hyphenated.

A penalty element x,,i,, has a '''penalty value''' p,,i,, representing the 
"aesthetic cost" of the break: a positive cost suggests the breaking algorithm 
not to use that break point, while a negative cost signals an appealing 
position for a break. In particular, if p,,i,, is +infinity it forbids a break, 
while if it is -infinity it forces a break (in other words, there cannot be a 
better place for a break).

Moreover, a penalty element has a '''flagged''' boolean value: this value marks 
penalties that should not be chosen to end consecutive lines. For example, in 
the context of line breaking the hyphenation points inside a word are 
represented using flagged penalties, so the algorithm tries to avoid the 
creation of consecutive lines ending with a hyphen.

== Starting and ending elements ==

... coming soon ....

== Feasible breaks ==

The goal of the breaking algorithm is to find an ordered set of indices

b,,0,, < b,,1,, < b,,2,, < ... < b,,k,,

such that
 * the first index, b,,0,,, is conventionally 0 (note that the first element in 
the sequence has index = 1)
 * the other indices point to an element that is a 'feasible break'

There are two simple rules stating whether an element is a feasible break or 
not: x,,i,, is a feasible break if either:

 1. x,,i,, is a glue element, and x,,i-1,, is a box; or
 1. x,,i,, is a penalty element and p,,i,, is not +infinity

The algorithm must respect the forced breaks: this means that this set must 
contain the indices of all the penalties whose value is -infinity; in 
particular b,,k,, = n.

== Element suppression ==

Having decided where the lines end, it's time to decide where they begin.

... coming soon ...

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[Xmlgraphics-fop Wiki] Update of "KnuthsModel" by LucaFurini

Reply via email to