Author: vhennebert
Date: Wed Oct 22 04:49:30 2008
New Revision: 707044

URL: http://svn.apache.org/viewvc?rev=707044&view=rev
Log:
Completed section about breaking tables over pages

Modified:
    
xmlgraphics/fop/branches/Temp_Interleaved_Page_Line_Breaking/prototype/doc/src/docbook/prototype.xml

Modified: 
xmlgraphics/fop/branches/Temp_Interleaved_Page_Line_Breaking/prototype/doc/src/docbook/prototype.xml
URL: 
http://svn.apache.org/viewvc/xmlgraphics/fop/branches/Temp_Interleaved_Page_Line_Breaking/prototype/doc/src/docbook/prototype.xml?rev=707044&r1=707043&r2=707044&view=diff
==============================================================================
--- 
xmlgraphics/fop/branches/Temp_Interleaved_Page_Line_Breaking/prototype/doc/src/docbook/prototype.xml
 (original)
+++ 
xmlgraphics/fop/branches/Temp_Interleaved_Page_Line_Breaking/prototype/doc/src/docbook/prototype.xml
 Wed Oct 22 04:49:30 2008
@@ -71,7 +71,7 @@
           <titleabbrev>Same paragraph on slightly wider page.</titleabbrev>
 <literallayout class="monospaced">In olden times when  wishing  still
 helped  one,  there  lived  a  king
-whose daughters were all  beautiful,
+whose daughters were all beautiful,
 but the youngest was  so  beautiful
 that the sun itself, which has seen
 so much,  was  astonished  whenever
@@ -106,7 +106,7 @@
       <itemizedlist>
         <listitem>
           <para>how to deal with paragraphs? We need to know the width(s) of 
the page(s) they will 
-            by typesetted on before breaking them. That means that the list of 
boxes and penalties 
+            be typeset on before breaking them. That means that the list of 
boxes and penalties 
             representing the lines like we saw above cannot be determined 
before we start page 
             breaking. But if it’s not available, how to determine if the 
bottom of a page will be 
             reached while we are inside a paragraph, and if we must consider 
to continue it on the 
@@ -180,16 +180,10 @@
     <section>
       <title>Beyond the Box/Glue/Penalty Model?</title>
       <para>As said above the box/glue/penalty model shows its limits when it 
comes to implementing 
-        tables. Another representation may be necessary to deal with the 
certain degree of 
-        desynchronization that can happen between columns. Even if we consider 
that the lists of 
-        elements for the columns’ contents are available, there is no 
accurate way to merge them 
-        into one list: either the resulting elements will fail to represent 
all the flexibility 
-        offered by the column layouts; or the demerits won’t match the sum 
of the individual column 
-        demerits. Moreover, it’s not possible to represent the column 
desynchronization by using 
-        only one list of merged elements (see the <ulink 
-          
url="http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnownProblems/";>wiki 
page</ulink> 
-        about known issues in the current approach). Several lists must be 
computed depending on the 
-        breaks chosen for the preceding pages, which is cumbersome.</para>
+        tables. Still, this model has proved to be quite flexible and 
powerful. Let’s have a look at 
+        how it behaves in tables.</para>
+      <section>
+        <title>The Spring Paradigm</title>
       <para>Glues can be seen as springs that can be both compressed and 
stretched (in the general 
         case). Only, instead of having a <ulink 
           url="http://en.wikipedia.org/wiki/Spring_(device)">spring 
constant</ulink> 
@@ -200,7 +194,7 @@
         was designed in a way that small displacements would be considered as 
almost equivalent to 
         the natural length, while big displacements would quickly become 
unacceptable; <xref 
           linkend="fig_demerits"/> shows an example where the natural length 
is 10, and the content 
-        can be shrinked down to 8 and stretched up to 14.</para>
+        can be shrunk down to 8 and stretched up to 14.</para>
       <figure id="fig_demerits">
         <title>Demerits, functions of the size of the part (line, page, 
etc.)</title>
         <titleabbrev>Demerits function</titleabbrev>
@@ -210,13 +204,13 @@
           </imageobject>
         </mediaobject>
       </figure>
-      <para>However, the spring image still applies; we just have to imagine 
that it will be even 
-        more difficult than usual to compress/extend the spring to its 
maximum. Then we can see 
-        pages or tables as frames to which we attach springs representing the 
body or the columns. 
-        In the case of a table we can imagine to start with the first column, 
and successively 
-        attach a spring for every column. Depending on what the column 
contains we will have to 
-        compress or extend the spring, which will have an effect on the other 
columns as well as the 
-        possible text preceding the table on the page (see <xref 
linkend="fig_springs"/>).</para>
+      <para>The spring image still applies though; we just have to imagine 
that it will be even more 
+        difficult than usual to compress/extend the spring to its maximum. 
Then we can see pages or 
+        tables as frames to which we attach springs representing the region 
body or the columns. In 
+        the case of a table we can imagine to start with the first column, and 
successively attach a 
+        spring for every column. Depending on what the column contains we will 
have to compress or 
+        extend the spring, which will have an effect on the other columns as 
well as the possible 
+        text preceding the table on the page (see <xref 
linkend="fig_springs"/>).</para>
       <figure id="fig_springs">
         <title>Seeing content as springs</title>
         <mediaobject>
@@ -227,17 +221,20 @@
       </figure>
       <para>By using this representation it is relatively easy to ‘feel’ 
what happens: the springs 
         will adjust such that the total force applied on them and the frame 
will be minimal. From an 
-        analytical point of vue the adjustment ratio for every spring will be 
computed such that the 
-        sum of the demerits will be minimal. This is not easy with the 
demerits function as it is 
-        used currently, but it is possible to choose another function that 
will behave similarly to 
-        Knuth’s original one, yet will simplify the computation of the 
minimum.</para>
+        analytical point of view the adjustment ratio for every spring will be 
computed such that 
+        the sum of the demerits will be minimal. This is not easy with the 
demerits function as it 
+        is used currently, but it is possible to choose another function that 
will behave similarly 
+        to Knuth’s original one, yet will simplify the computation of the 
minimum.</para>
+      </section>
+      <section>
+        <title>Challenges</title>
       <para>The problem is that there may be many ways to break every column, 
corresponding to more 
         or less content on the first page. In other words, there may be many 
different springs to 
         play with for each column, and trying every combination of them 
obviously leads to 
-        combinatorial explosion. Some combinations are obviously non-sensical 
and should not even be 
+        combinatorial explosion. Some combinations are obviously nonsensical 
and should not even be 
         tested; for example, when there is only one line of content in the 
first column, three lines 
         in the second and it’s obvious that the first column may also be 
filled with three lines 
-        (see <xref linkend="fig_table_silly-combination"/>)</para>
+        (see <xref linkend="fig_table_silly-combination"/>).</para>
       <figure id="fig_table_silly-combination">
         <title>An obviously wrong combination of table columns</title>
         <mediaobject>
@@ -270,17 +267,28 @@
           </imageobject>
         </mediaobject>
       </figure>
-      <para>All that said, and back to the title of this section, maybe that 
the box/glue/penalty 
-        model simply needs to be dropped, and that another model can be found 
that will be able to 
-        represent tables in an elegant way and allow to solve the problem 
easily. Needless to say 
-        that such a model has yet to be discovered…</para>
+      <para>Another problem is to create a list of elements that will 
represent all the columns. In 
+        other words, replace a set of springs with a single one that will 
behave the same way. 
+        That’s probably not possible: either the resulting spring will fail 
to represent all the 
+        flexibility offered by the individual springs; or the demerits won’t 
match the sum of the 
+        individual column demerits. On top of that there is the 
desynchronization that can happen 
+        inside a row (see the <ulink 
+          
url="http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnownProblems/";>wiki 
page</ulink> 
+        about known issues in the current approach). Several lists will 
probably have to be computed 
+        depending on the breaks chosen for the preceding pages.</para>
+      <para>So maybe that the box/glue/penalty model simply needs to be 
dropped, and that another 
+        model can be found that will be able to represent tables in an elegant 
way and allow to 
+        solve the problem easily. Needless to say that such a model has yet to 
be discovered…</para>
+      </section>
     </section>
     <section>
-      <title>Breaking Paragraphs Before Merging Columns</title>
+      <title>Breaking Paragraphs and Merging Columns</title>
+      <section>
+        <title>Multiple Layouts for a Paragraph</title>
       <para>With the new approach it is possible to take into account several 
ways of typesetting a 
         paragraph, differing by the number of lines that it occupies. For 
example, the optimal way 
         will be five lines, but there are also a four-line possibility and a 
six-line possibility.  
-        In the four-line version spaces are likely to be shrinked a lot, while 
on the six-line they 
+        In the four-line version spaces are likely to be shrunk a lot, while 
on the six-line they 
         will be much stretched. While those possibilities look less good than 
the five-line version, 
         they may be preferred in the final layout because they would allow to 
avoid a widow or 
         orphan line somewhere else, leading to less global demerits; or even 
lead to a feasible 
@@ -301,7 +309,7 @@
         an unfinished page (there is not enough elasticity to stretch the 
content up to the end of 
         the page); maybe the six-line version makes the content perfectly fit 
in the page (no 
         shrinking or stretching). Then it would be chosen over the five-line 
version, because the 
-        absence of stretching at the page level would compensate for the 
sligthly sub-optimal layout 
+        absence of stretching at the page level would compensate for the 
slightly sub-optimal layout 
         of that paragraph.</para>
       <figure id="fig_paragraph_different-numbers-of-lines">
         <title>Three different paragraph layouts to consider</title>
@@ -323,12 +331,152 @@
         5           4
         1.5+0−1
         1
-        ────────    ──────────
+       ────────    ──────────
 Total:  11+1−1      11.5+1−2</literallayout>
       <para>The two possibilities are overlapping, and would both fit in a 
page with a height of 12; 
         how to order them? By the total optimum value? By the minimum value? 
This may be a problem 
         for the merging algorithm.</para>
-      <para><emphasis>To be continued…</emphasis></para>
+      <para>Maybe there’s a possibility to find a merging process for which 
this wouldn’t be a 
+        problem, but a simpler way to go is to simply disable this feature 
when we are inside 
+        tables, and always work with the optimal paragraph layout. Along with 
restriction 1 of 
+        Knuth’s algorithm (the value <inlineequation><mathphrase>total 
length − total 
+            shrink</mathphrase></inlineequation> increases when iterating over 
the elements) this 
+        gives a total ordering of the layouts. Then it’s probably possible 
to re-use the merging 
+        method currently used in FOP.</para>
+      </section>
+      <section>
+        <title>Adapting the Current Merging Algorithm</title>
+      <para>As a short and simplified reminder, the merging algorithm works 
the following way:
+        <orderedlist>
+          <listitem>
+            <para>Compute the smallest step for every column (the length of 
the content before the 
+              first legal break).</para>
+          </listitem>
+          <listitem>
+            <para>Select the maximum of those steps (it has been agreed that 
every column should 
+              contain some content before a break).</para>
+          </listitem>
+          <listitem>
+            <para>In the other columns, progress along the legal breaks to get 
as near as this 
+              maximum step without overtaking it. That gives the first legal 
break in the 
+              row.</para>
+          </listitem>
+          <listitem>
+            <para>From there on, go to the smallest next legal break.</para>
+          </listitem>
+          <listitem>
+            <para>Select every column that has such a break; they make part of 
the next legal break 
+              inside the table.</para>
+            </listitem>
+          <listitem>
+            <para>Go on like this, until the end of the row is 
reached…</para>
+          </listitem>
+        </orderedlist>
+      </para>
+      <para>Again, this approach works because the elements for the paragraphs 
are known in advance. 
+        Plus it doesn’t take elastic spaces into account; while this might 
look like a reasonable 
+        trade-off for tables, the same algorithm is also applied to lists 
where it becomes an 
+        annoying limitation.</para>
+      <para>When introducing elastic spaces, the legal breaks are no longer at 
a given length but 
+        may cover an interval. Then a layout from one column may be compatible 
with several layouts 
+        from another column. The difference will be in the amount of content 
after the break, which 
+        means that the row may end at different distances from the top of the 
following page (<xref 
+          linkend="fig_table_distances-from-top"/>). To respect the dynamic 
programming approach, 
+        all of the layouts must be kept, because the layout that will be part 
of the optimal 
+        solution may not be the optimal one at the table level (the optimal 
layout, in the 
+        <emphasis>table context</emphasis>, may not be part of the optimal 
solution, in the 
+        <emphasis>whole document context</emphasis>).</para>
+      <figure id="fig_table_distances-from-top">
+        <title>Combining one layout on the first column with two different 
layouts on the second 
+          one, making the table end at different positions</title>
+        <titleabbrev>Multiple column combinations leading to multiple table 
layouts</titleabbrev>
+        <mediaobject>
+          <imageobject>
+            <imagedata fileref="table_distances-from-top"/>
+          </imageobject>
+        </mediaobject>
+      </figure>
+      <para>So two different sets will be available: layouts that will be said 
+        <emphasis>compatible</emphasis>; that is, it’s possible to play with 
their elastic spaces so 
+        that all of the columns reach the bottom of the page before the break; 
and the other ones, 
+        for which at least one column is <quote>underfull</quote>: either it 
doesn’t provide 
+        stretchability and its height is inferior to the height of the row, or 
it doesn’t have 
+        enough of stretchability to reach the bottom of the page. Compatible 
layouts are perfect; 
+        incompatible ones are ok, provided that this is not a situation like 
in <xref 
+          linkend="fig_table_silly-combination"/>, where a better alternative 
is available. Let’s 
+        call those latter <emphasis>unacceptable</emphasis>. The whole 
challenge is to make the 
+        difference between layouts that are incompatible but acceptable, and 
those that are both 
+        incompatible <emphasis>and</emphasis> unacceptable. Open question: if 
compatible layouts are 
+        available, should incompatible ones even be considered?</para>
+      <para>Since we want to keep only the best layout for a given paragraph, 
the column combination 
+        process can’t be started before the ends of the column contents are 
reached. Indeed, when we 
+        are in the middle of the paragraph, some of the currently active 
layouts may be specific to 
+        a non-optimal version of the paragraph. If they are selected as parts 
of a certain column 
+        combination, and then removed because they aren’t part of the 
optimal paragraph layout, that 
+        will invalidate the corresponding column combination.</para>
+      <para>So we would wait that the paragraphs are typeset before merging 
the columns. In fact, 
+        the column contents would be typeset separately up to the end, and 
then the merging process 
+        would be started. The process would be similar to the usual 
page-breaking mechanism: every 
+        column would be considered as a page from the point of view of the 
column content. Only, in 
+        the general case the first <quote>page</quote> would not have a set 
height, because of the 
+        content preceding the table that may have elastic spaces (see the 
springs from <xref 
+          linkend="fig_springs"/>). This is not really a problem, though, 
simply a generalization of  
+        the test: check that two intervals intersect instead of that a point 
belongs to an interval 
+        (actually a point may be seen as a singleton interval). Note that the 
point of view taken 
+        here is the opposite of the usual one: instead of testing if a layout 
fits in a page, i.e.,  
+        that it can be shrunk or stretched enough so that its actual dimension 
matches the page 
+        height, we check that the page height belongs to the interval [min, 
max] formed by the 
+        layout. In the end, this is the same test. In the generalized case the 
<quote>page</quote> 
+        height would be an interval instead of a single point.</para>
+      <para>Another modification is that underfull layouts must be considered: 
indeed, a column is 
+        allowed to be underfull if there’s another one that fills the page. 
The only thing to pay 
+        attention to is to not build a column combination out of underfull 
layouts only: at least 
+        one of them must be complete. Underfull layouts will therefore have to 
be properly 
+        flagged.</para>
+      <para>A potential issue with that approach is that too much work may be 
performed. In general 
+        several layouts will be available for every column, and it’s 
possible that some layouts will 
+        not be parts of any final combination. For example, if paragraphs in 
the first column have 
+        an orphan setting of 1, and 2 in all other columns. See <xref 
+          linkend="fig_table-unused-column-layout"/>: using the first possible 
layout for column 1 
+        will lead to an unacceptable combination.</para>
+      <figure id="fig_table-unused-column-layout">
+        <title>One of the layouts in the first column will never be used in 
any final 
+          combination</title>
+        <titleabbrev>Unused column layouts</titleabbrev>
+        <mediaobject>
+          <imageobject>
+            <imagedata fileref="table_unused-column-layout"/>
+          </imageobject>
+        </mediaobject>
+      </figure>
+      <para>So it may be better to start the combining process as soon as 
possible, but as seen 
+        above paragraphs must be typeset first. Or maybe not? The problem was 
to define an ordering 
+        on the layouts. When a page break can occur in the middle of a 
paragraph, is it really a 
+        problem if that paragraph is not finished yet? As long as the previous 
paragraphs have their 
+        optimal layouts, the total order should be available. All that can 
happen inside the 
+        paragraph is that there may be a few lines more or less on the page; 
that means a few more 
+        boxes added to the set of layouts that’s available at the beginning 
of the paragraph. If 
+        that set was totally ordered, it should remain so when adding the 
boxes corresponding to the 
+        lines of the current paragraph. That said, it’s not totally clear 
what happens when not all 
+        the lines have the same height. Let’s imagine that a paragraph 
contains two inline images 
+        whose heights are 1.5 times the normal line height. Let’s assume 
there’s one layout in which 
+        both images are on the same line; and another one where spaces are 
slightly more shrunk so 
+        that the first image can be placed on the preceding line (<xref 
+          linkend="fig_paragraph-with-images"/>). The first layout will be 
4.5 units high while the 
+        second one will be 5 units high. Maybe that won’t be a problem, 
but…</para>
+      <figure id="fig_paragraph-with-images">
+        <title>Layouts of a paragraph containing two inline images</title>
+        <mediaobject>
+          <imageobject>
+            <imagedata fileref="paragraph-with-images"/>
+          </imageobject>
+        </mediaobject>
+      </figure>
+      <para>Another thing to keep in mind that may complicate things is that 
not all columns may end 
+        on the same page; some with fewer content may end on earlier pages. 
They must be considered 
+        in the combination, but they won’t participate in the computation of 
the resulting 
+        min-opt-max.</para>
+      </section>
     </section>
   </section>
 



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to