FOPs,
I am putting some xml notes together on design aspects of area tree
construction. Attached is some xml with the notes so far. Comments
solicted.
Peter
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- $Id: xml-parsing.xml,v 1.6 2002-02-15 11:25:24+10 pbw Exp pbw $ -->
<!--
<!DOCTYPE document SYSTEM "../../xml-docs/dtd/document-v10.dtd">
-->
<document>
<header>
<title>Implementing co-routines</title>
<authors>
<person name="Peter B. West" email="[EMAIL PROTECTED]"/>
</authors>
</header>
<body>
<!-- one of (anchor s1) -->
<s1 title="Implementing Co-routines in FOP">
<p>
All general page layout systems have to solve the same
fundamental problem: expressing a flow of text with its own
natural structure as a series of pages corresponding to the
physical and logical structure of the output medium. This
simple description disguises many complexities. Version 1.0
of the Recommendation, in Section 3, <em>Introduction to
Formatting </em>, includes the following comments.
</p>
<note>
[Formatting] comprises several steps, some of which depend on
others in a non-sequential way.<br/> ...and...<br/>
[R]efinement is not necessarily a straightforward, sequential
procedure, but may involve look-ahead, back-tracking, or
control-splicing with other processes in the formatter.
</note>
<p>Section 3.1, <em>Conceptual Procedure</em>, includes:</p>
<note>
The procedure works by processing formatting objects. Each
object, while being processed, may initiate processing in
other objects. While the objects are hierarchically
structured, the processing is not; processing of a given
object is rather like a co-routine which may pass control to
other processes, but pick up again later where it left off.
</note>
<p>
If one looks only at the flow side of the equation, it's
difficult to see what the problem might be. The ordering of
the elements of the flow is preserved in the area tree, and
where elements are in an hierarchical relationship in the
flow, they will generally be in an hierarchical relationship
in the area tree. In such circumstances, the recursive
processing of the flow seems quite natural.
</p>
<p>
The problem becomes more obvious when one thinks about the
imposition of an unrelated page structure over the
hierarchical structure of the document content. Take, e.g.,
the processing of a nested flow structure which, at a certain
point, is scanning text and generating line-areas, nested
within other block areas and possibly other line areas. The
page fills in the middle of this process. Processing at the
lowest level in the tree must now suspend, immediately
following the production of the line-area which filled the
page. This same event, however, must also trigger the closing
and flushing to the area tree of every open area of which the last
line-area was a descendent.
</p>
<p>
Once all of these areas have been closed, some dormant process
or processes must wake up, flush the area sub-tree
representing the page, and open a new page sub-tree in the
area tree. Then the whole nested structure of flow objects
and area production must be re-activated, at the point in
processing at which the areas of the previous page were
finalised, but with the new page environment. The most
natural way of expressing the temporal relationship of these
processes is by means of co-routines.
</p>
<p>
Normal sub-routines (methods) display a hierarchical
relationship where process A suspends on invoking process B,
which on termination returns control to A which resumes from
the point of suspension. Co-routines instead have a parallel
relationship. Process A suspends on invoking process B, but
process B also suspends on returning control to process A. To
process B, this return of control appears to be an invocation
of process A. When process A subsequently invokes B and
suspends, B behaves as though its previous invocation of A has
returned, and it resumes from the point of that invocation.
So control bounces between the two, each one resuming where it
left off.
</p>
<p>
For example, think of a page-production method working on a
complex page-sequence-master.
</p>
<source>
void makePages(...) {
...
while (pageSequence.hasNext()) {
...
page = generateNextPage(...);
boolean over = flow.fillPage(page);
if (over) return;
}
}
</source>
<p>
The <code>fillPage()</code> method, when it fills a page, will
have unfinished business with the flow, which it will want to
resume at the next call; hence co-routines. One way to
implement them in Java is by threads synchronised on some
common argument-passing object.
</p>
<p>
Jeffrey H. Kingston, in <em>The Design and Implementation of
the Lout Document Formatting Language</em> describes the
<em>galley</em> abstraction which he implemented in
<em>Lout</em>. A document to be formatted is a stream of text
and symbols, some of which are <strong>receptive
symbols</strong>. The output file is the first receptive
symbol; the formatting document is the first galley. The
archetypical example of a receptive symbol is
<strong>@FootPlace</strong> and its corresponding galley
definition, <strong>@FootNote</strong>.
</p>
<p>
Each galley should be thought of as a concurrent process, and
each is associated with a semaphore (or synchronisation
object.) Galleys are free to "promote" components into
receptive targets as long as a) an appropriate target has been
encountered in the file, b) the component being promoted
contains no unresolved galley targets itself, and c) there is
sufficient room for the galley component at the target. If
these conditions are not met, the galley blocks on its
semaphore. When conditions change so that further progress
may be possible, the semaphore is signalled. Note that the
galleys are a hierarchy, and that the processing and promotion
of galley contents happens <em>bottom-up</em>.
</p>
<p>
This structure can be transposed into XSLFO terms by
substituting <em>areas</em> for <em>targets</em> and certain
<em>flow objects</em> from fo:flows and fo:static-contents in
place of <em>galleys</em>. The picture is more complex in
XSLFO because of the nature of the area tree. Flows
cannot be as easily restarted after page completions because
the whole of the nested area tree heirarchy has to be
rebuilt. Compensating for this is the relative simplicity of
regenerating many of the "nominal" area tree containers like
viewports, regions and the pure containers. Traits will, for
the most part, remain unchanged in the re-generation.
However, because there can be no guarantee that general page
layout will be unchanged, various sizing, positioning and
possibly other layout-related traits will change. Whether any
changes would be forced in the properties on the fo tree
remains to be seen. In any case, the necessary adjustments
could be made after the restart from the synchronisation
<string>wait</strong>.
</p>
<p>
In general, the galley processes will inherit, and themselves
propagate, available space information. However, in many
circumstances, the galleys must perform as much processing as
possible with limited area size information. Such
circumstances include "automatic" table layout and layout with
side floats. Line-areas have a block progression dimension
which is determined by their contents. To achieve full
generality in such layouts, the contents of line-areas should
be laid out as though their inline progression dimension was
limited only by their content. In the process, all possible
break-points can be determined. Where a line-area contains
mixed fonts or embedded images, the bpd of the individual
line-areas which are eventually stacked will, in general,
depend on the line break points, but the advantage of this
approach is that such actual selections can be backed out and
new break points selected with a minimum of recalculation.
This can potentially occur whenever a first attempt at page
layout is backed out.
</p>
</s1>
</body>
</document>
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]