Author: vhennebert
Date: Wed Oct 22 04:49:30 2008
New Revision: 707044
URL: http://svn.apache.org/viewvc?rev=707044&view=rev
Log:
Completed section about breaking tables over pages
Modified:
xmlgraphics/fop/branches/Temp_Interleaved_Page_Line_Breaking/prototype/doc/src/docbook/prototype.xml
Modified:
xmlgraphics/fop/branches/Temp_Interleaved_Page_Line_Breaking/prototype/doc/src/docbook/prototype.xml
URL:
http://svn.apache.org/viewvc/xmlgraphics/fop/branches/Temp_Interleaved_Page_Line_Breaking/prototype/doc/src/docbook/prototype.xml?rev=707044&r1=707043&r2=707044&view=diff
==============================================================================
---
xmlgraphics/fop/branches/Temp_Interleaved_Page_Line_Breaking/prototype/doc/src/docbook/prototype.xml
(original)
+++
xmlgraphics/fop/branches/Temp_Interleaved_Page_Line_Breaking/prototype/doc/src/docbook/prototype.xml
Wed Oct 22 04:49:30 2008
@@ -71,7 +71,7 @@
<titleabbrev>Same paragraph on slightly wider page.</titleabbrev>
<literallayout class="monospaced">In olden times when wishing still
helped one, there lived a king
-whose daughters were all beautiful,
+whose daughters were all beautiful,
but the youngest was so beautiful
that the sun itself, which has seen
so much, was astonished whenever
@@ -106,7 +106,7 @@
<itemizedlist>
<listitem>
<para>how to deal with paragraphs? We need to know the width(s) of
the page(s) they will
- by typesetted on before breaking them. That means that the list of
boxes and penalties
+ be typeset on before breaking them. That means that the list of
boxes and penalties
representing the lines like we saw above cannot be determined
before we start page
breaking. But if itâs not available, how to determine if the
bottom of a page will be
reached while we are inside a paragraph, and if we must consider
to continue it on the
@@ -180,16 +180,10 @@
<section>
<title>Beyond the Box/Glue/Penalty Model?</title>
<para>As said above the box/glue/penalty model shows its limits when it
comes to implementing
- tables. Another representation may be necessary to deal with the
certain degree of
- desynchronization that can happen between columns. Even if we consider
that the lists of
- elements for the columnsâ contents are available, there is no
accurate way to merge them
- into one list: either the resulting elements will fail to represent
all the flexibility
- offered by the column layouts; or the demerits wonât match the sum
of the individual column
- demerits. Moreover, itâs not possible to represent the column
desynchronization by using
- only one list of merged elements (see the <ulink
-
url="http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnownProblems/">wiki
page</ulink>
- about known issues in the current approach). Several lists must be
computed depending on the
- breaks chosen for the preceding pages, which is cumbersome.</para>
+ tables. Still, this model has proved to be quite flexible and
powerful. Letâs have a look at
+ how it behaves in tables.</para>
+ <section>
+ <title>The Spring Paradigm</title>
<para>Glues can be seen as springs that can be both compressed and
stretched (in the general
case). Only, instead of having a <ulink
url="http://en.wikipedia.org/wiki/Spring_(device)">spring
constant</ulink>
@@ -200,7 +194,7 @@
was designed in a way that small displacements would be considered as
almost equivalent to
the natural length, while big displacements would quickly become
unacceptable; <xref
linkend="fig_demerits"/> shows an example where the natural length
is 10, and the content
- can be shrinked down to 8 and stretched up to 14.</para>
+ can be shrunk down to 8 and stretched up to 14.</para>
<figure id="fig_demerits">
<title>Demerits, functions of the size of the part (line, page,
etc.)</title>
<titleabbrev>Demerits function</titleabbrev>
@@ -210,13 +204,13 @@
</imageobject>
</mediaobject>
</figure>
- <para>However, the spring image still applies; we just have to imagine
that it will be even
- more difficult than usual to compress/extend the spring to its
maximum. Then we can see
- pages or tables as frames to which we attach springs representing the
body or the columns.
- In the case of a table we can imagine to start with the first column,
and successively
- attach a spring for every column. Depending on what the column
contains we will have to
- compress or extend the spring, which will have an effect on the other
columns as well as the
- possible text preceding the table on the page (see <xref
linkend="fig_springs"/>).</para>
+ <para>The spring image still applies though; we just have to imagine
that it will be even more
+ difficult than usual to compress/extend the spring to its maximum.
Then we can see pages or
+ tables as frames to which we attach springs representing the region
body or the columns. In
+ the case of a table we can imagine to start with the first column, and
successively attach a
+ spring for every column. Depending on what the column contains we will
have to compress or
+ extend the spring, which will have an effect on the other columns as
well as the possible
+ text preceding the table on the page (see <xref
linkend="fig_springs"/>).</para>
<figure id="fig_springs">
<title>Seeing content as springs</title>
<mediaobject>
@@ -227,17 +221,20 @@
</figure>
<para>By using this representation it is relatively easy to âfeelâ
what happens: the springs
will adjust such that the total force applied on them and the frame
will be minimal. From an
- analytical point of vue the adjustment ratio for every spring will be
computed such that the
- sum of the demerits will be minimal. This is not easy with the
demerits function as it is
- used currently, but it is possible to choose another function that
will behave similarly to
- Knuthâs original one, yet will simplify the computation of the
minimum.</para>
+ analytical point of view the adjustment ratio for every spring will be
computed such that
+ the sum of the demerits will be minimal. This is not easy with the
demerits function as it
+ is used currently, but it is possible to choose another function that
will behave similarly
+ to Knuthâs original one, yet will simplify the computation of the
minimum.</para>
+ </section>
+ <section>
+ <title>Challenges</title>
<para>The problem is that there may be many ways to break every column,
corresponding to more
or less content on the first page. In other words, there may be many
different springs to
play with for each column, and trying every combination of them
obviously leads to
- combinatorial explosion. Some combinations are obviously non-sensical
and should not even be
+ combinatorial explosion. Some combinations are obviously nonsensical
and should not even be
tested; for example, when there is only one line of content in the
first column, three lines
in the second and itâs obvious that the first column may also be
filled with three lines
- (see <xref linkend="fig_table_silly-combination"/>)</para>
+ (see <xref linkend="fig_table_silly-combination"/>).</para>
<figure id="fig_table_silly-combination">
<title>An obviously wrong combination of table columns</title>
<mediaobject>
@@ -270,17 +267,28 @@
</imageobject>
</mediaobject>
</figure>
- <para>All that said, and back to the title of this section, maybe that
the box/glue/penalty
- model simply needs to be dropped, and that another model can be found
that will be able to
- represent tables in an elegant way and allow to solve the problem
easily. Needless to say
- that such a model has yet to be discoveredâ¦</para>
+ <para>Another problem is to create a list of elements that will
represent all the columns. In
+ other words, replace a set of springs with a single one that will
behave the same way.
+ Thatâs probably not possible: either the resulting spring will fail
to represent all the
+ flexibility offered by the individual springs; or the demerits wonât
match the sum of the
+ individual column demerits. On top of that there is the
desynchronization that can happen
+ inside a row (see the <ulink
+
url="http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnownProblems/">wiki
page</ulink>
+ about known issues in the current approach). Several lists will
probably have to be computed
+ depending on the breaks chosen for the preceding pages.</para>
+ <para>So maybe that the box/glue/penalty model simply needs to be
dropped, and that another
+ model can be found that will be able to represent tables in an elegant
way and allow to
+ solve the problem easily. Needless to say that such a model has yet to
be discoveredâ¦</para>
+ </section>
</section>
<section>
- <title>Breaking Paragraphs Before Merging Columns</title>
+ <title>Breaking Paragraphs and Merging Columns</title>
+ <section>
+ <title>Multiple Layouts for a Paragraph</title>
<para>With the new approach it is possible to take into account several
ways of typesetting a
paragraph, differing by the number of lines that it occupies. For
example, the optimal way
will be five lines, but there are also a four-line possibility and a
six-line possibility.
- In the four-line version spaces are likely to be shrinked a lot, while
on the six-line they
+ In the four-line version spaces are likely to be shrunk a lot, while
on the six-line they
will be much stretched. While those possibilities look less good than
the five-line version,
they may be preferred in the final layout because they would allow to
avoid a widow or
orphan line somewhere else, leading to less global demerits; or even
lead to a feasible
@@ -301,7 +309,7 @@
an unfinished page (there is not enough elasticity to stretch the
content up to the end of
the page); maybe the six-line version makes the content perfectly fit
in the page (no
shrinking or stretching). Then it would be chosen over the five-line
version, because the
- absence of stretching at the page level would compensate for the
sligthly sub-optimal layout
+ absence of stretching at the page level would compensate for the
slightly sub-optimal layout
of that paragraph.</para>
<figure id="fig_paragraph_different-numbers-of-lines">
<title>Three different paragraph layouts to consider</title>
@@ -323,12 +331,152 @@
5 4
1.5+0â1
1
- ââââââââ ââââââââââ
+ ââââââââ ââââââââââ
Total: 11+1â1 11.5+1â2</literallayout>
<para>The two possibilities are overlapping, and would both fit in a
page with a height of 12;
how to order them? By the total optimum value? By the minimum value?
This may be a problem
for the merging algorithm.</para>
- <para><emphasis>To be continuedâ¦</emphasis></para>
+ <para>Maybe thereâs a possibility to find a merging process for which
this wouldnât be a
+ problem, but a simpler way to go is to simply disable this feature
when we are inside
+ tables, and always work with the optimal paragraph layout. Along with
restriction 1 of
+ Knuthâs algorithm (the value <inlineequation><mathphrase>total
length â total
+ shrink</mathphrase></inlineequation> increases when iterating over
the elements) this
+ gives a total ordering of the layouts. Then itâs probably possible
to re-use the merging
+ method currently used in FOP.</para>
+ </section>
+ <section>
+ <title>Adapting the Current Merging Algorithm</title>
+ <para>As a short and simplified reminder, the merging algorithm works
the following way:
+ <orderedlist>
+ <listitem>
+ <para>Compute the smallest step for every column (the length of
the content before the
+ first legal break).</para>
+ </listitem>
+ <listitem>
+ <para>Select the maximum of those steps (it has been agreed that
every column should
+ contain some content before a break).</para>
+ </listitem>
+ <listitem>
+ <para>In the other columns, progress along the legal breaks to get
as near as this
+ maximum step without overtaking it. That gives the first legal
break in the
+ row.</para>
+ </listitem>
+ <listitem>
+ <para>From there on, go to the smallest next legal break.</para>
+ </listitem>
+ <listitem>
+ <para>Select every column that has such a break; they make part of
the next legal break
+ inside the table.</para>
+ </listitem>
+ <listitem>
+ <para>Go on like this, until the end of the row is
reachedâ¦</para>
+ </listitem>
+ </orderedlist>
+ </para>
+ <para>Again, this approach works because the elements for the paragraphs
are known in advance.
+ Plus it doesnât take elastic spaces into account; while this might
look like a reasonable
+ trade-off for tables, the same algorithm is also applied to lists
where it becomes an
+ annoying limitation.</para>
+ <para>When introducing elastic spaces, the legal breaks are no longer at
a given length but
+ may cover an interval. Then a layout from one column may be compatible
with several layouts
+ from another column. The difference will be in the amount of content
after the break, which
+ means that the row may end at different distances from the top of the
following page (<xref
+ linkend="fig_table_distances-from-top"/>). To respect the dynamic
programming approach,
+ all of the layouts must be kept, because the layout that will be part
of the optimal
+ solution may not be the optimal one at the table level (the optimal
layout, in the
+ <emphasis>table context</emphasis>, may not be part of the optimal
solution, in the
+ <emphasis>whole document context</emphasis>).</para>
+ <figure id="fig_table_distances-from-top">
+ <title>Combining one layout on the first column with two different
layouts on the second
+ one, making the table end at different positions</title>
+ <titleabbrev>Multiple column combinations leading to multiple table
layouts</titleabbrev>
+ <mediaobject>
+ <imageobject>
+ <imagedata fileref="table_distances-from-top"/>
+ </imageobject>
+ </mediaobject>
+ </figure>
+ <para>So two different sets will be available: layouts that will be said
+ <emphasis>compatible</emphasis>; that is, itâs possible to play with
their elastic spaces so
+ that all of the columns reach the bottom of the page before the break;
and the other ones,
+ for which at least one column is <quote>underfull</quote>: either it
doesnât provide
+ stretchability and its height is inferior to the height of the row, or
it doesnât have
+ enough of stretchability to reach the bottom of the page. Compatible
layouts are perfect;
+ incompatible ones are ok, provided that this is not a situation like
in <xref
+ linkend="fig_table_silly-combination"/>, where a better alternative
is available. Letâs
+ call those latter <emphasis>unacceptable</emphasis>. The whole
challenge is to make the
+ difference between layouts that are incompatible but acceptable, and
those that are both
+ incompatible <emphasis>and</emphasis> unacceptable. Open question: if
compatible layouts are
+ available, should incompatible ones even be considered?</para>
+ <para>Since we want to keep only the best layout for a given paragraph,
the column combination
+ process canât be started before the ends of the column contents are
reached. Indeed, when we
+ are in the middle of the paragraph, some of the currently active
layouts may be specific to
+ a non-optimal version of the paragraph. If they are selected as parts
of a certain column
+ combination, and then removed because they arenât part of the
optimal paragraph layout, that
+ will invalidate the corresponding column combination.</para>
+ <para>So we would wait that the paragraphs are typeset before merging
the columns. In fact,
+ the column contents would be typeset separately up to the end, and
then the merging process
+ would be started. The process would be similar to the usual
page-breaking mechanism: every
+ column would be considered as a page from the point of view of the
column content. Only, in
+ the general case the first <quote>page</quote> would not have a set
height, because of the
+ content preceding the table that may have elastic spaces (see the
springs from <xref
+ linkend="fig_springs"/>). This is not really a problem, though,
simply a generalization of
+ the test: check that two intervals intersect instead of that a point
belongs to an interval
+ (actually a point may be seen as a singleton interval). Note that the
point of view taken
+ here is the opposite of the usual one: instead of testing if a layout
fits in a page, i.e.,
+ that it can be shrunk or stretched enough so that its actual dimension
matches the page
+ height, we check that the page height belongs to the interval [min,
max] formed by the
+ layout. In the end, this is the same test. In the generalized case the
<quote>page</quote>
+ height would be an interval instead of a single point.</para>
+ <para>Another modification is that underfull layouts must be considered:
indeed, a column is
+ allowed to be underfull if thereâs another one that fills the page.
The only thing to pay
+ attention to is to not build a column combination out of underfull
layouts only: at least
+ one of them must be complete. Underfull layouts will therefore have to
be properly
+ flagged.</para>
+ <para>A potential issue with that approach is that too much work may be
performed. In general
+ several layouts will be available for every column, and itâs
possible that some layouts will
+ not be parts of any final combination. For example, if paragraphs in
the first column have
+ an orphan setting of 1, and 2 in all other columns. See <xref
+ linkend="fig_table-unused-column-layout"/>: using the first possible
layout for column 1
+ will lead to an unacceptable combination.</para>
+ <figure id="fig_table-unused-column-layout">
+ <title>One of the layouts in the first column will never be used in
any final
+ combination</title>
+ <titleabbrev>Unused column layouts</titleabbrev>
+ <mediaobject>
+ <imageobject>
+ <imagedata fileref="table_unused-column-layout"/>
+ </imageobject>
+ </mediaobject>
+ </figure>
+ <para>So it may be better to start the combining process as soon as
possible, but as seen
+ above paragraphs must be typeset first. Or maybe not? The problem was
to define an ordering
+ on the layouts. When a page break can occur in the middle of a
paragraph, is it really a
+ problem if that paragraph is not finished yet? As long as the previous
paragraphs have their
+ optimal layouts, the total order should be available. All that can
happen inside the
+ paragraph is that there may be a few lines more or less on the page;
that means a few more
+ boxes added to the set of layouts thatâs available at the beginning
of the paragraph. If
+ that set was totally ordered, it should remain so when adding the
boxes corresponding to the
+ lines of the current paragraph. That said, itâs not totally clear
what happens when not all
+ the lines have the same height. Letâs imagine that a paragraph
contains two inline images
+ whose heights are 1.5Â times the normal line height. Letâs assume
thereâs one layout in which
+ both images are on the same line; and another one where spaces are
slightly more shrunk so
+ that the first image can be placed on the preceding line (<xref
+ linkend="fig_paragraph-with-images"/>). The first layout will be
4.5Â units high while the
+ second one will be 5Â units high. Maybe that wonât be a problem,
butâ¦</para>
+ <figure id="fig_paragraph-with-images">
+ <title>Layouts of a paragraph containing two inline images</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata fileref="paragraph-with-images"/>
+ </imageobject>
+ </mediaobject>
+ </figure>
+ <para>Another thing to keep in mind that may complicate things is that
not all columns may end
+ on the same page; some with fewer content may end on earlier pages.
They must be considered
+ in the combination, but they wonât participate in the computation of
the resulting
+ min-opt-max.</para>
+ </section>
</section>
</section>
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]