I am putting some xml notes together on design aspects of area tree 
construction.  Attached is some xml with the notes so far.  Comments 

<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- $Id: xml-parsing.xml,v 1.6 2002-02-15 11:25:24+10 pbw Exp pbw $ -->
<!DOCTYPE document SYSTEM "../../xml-docs/dtd/document-v10.dtd">

    <title>Implementing co-routines</title>
      <person name="Peter B. West" email="[EMAIL PROTECTED]"/>
    <!-- one of (anchor s1) -->
    <s1 title="Implementing Co-routines in FOP">
	All general page layout systems have to solve the same
	fundamental problem: expressing a flow of text with its own
	natural structure as a series of pages corresponding to the
	physical and logical structure of the output medium.  This
	simple description disguises many complexities.  Version 1.0
	of the XSL Recommendation, in Section 3, <em>Introduction to
	Formatting</em>, includes the following comments.
	[Formatting] comprises several steps, some of which depend on
	others in a non-sequential way.<br/> ...and...<br/>
	[R]efinement is not necessarily a straightforward, sequential
	procedure, but may involve look-ahead, back-tracking, or
	control-splicing with other processes in the formatter.
      <p>Section 3.1, <em>Conceptual Procedure</em>, includes:</p>
	The procedure works by processing formatting objects. Each
	object, while being processed, may initiate processing in
	other objects. While the objects are hierarchically
	structured, the processing is not; processing of a given
	object is rather like a co-routine which may pass control to
	other processes, but pick up again later where it left off.
	If one looks only at the flow side of the equation, it's
	difficult to see what the problem might be.  The ordering of
	the elements of the flow is preserved in the area tree, and
	where elements are in a hierarchical relationship in the
	flow, they will generally be in a hierarchical relationship
	in the area tree.  In such circumstances, the recursive
	processing of the flow seems quite natural.
	The problem becomes more obvious when one thinks about the
	imposition of an unrelated page structure over the
	hierarchical structure of the document content.  Take, for example,
	the processing of a nested flow structure which, at a certain
	point, is scanning text and generating line-areas, nested
	within other block areas and possibly other line areas.  The
	page fills in the middle of this process.  Processing at the
	lowest level in the tree must now suspend, immediately
	following the production of the line-area which filled the
	page.  This same event, however, must also trigger the closing
	and flushing to the area tree of every open area of which the last
	line-area was a descendant.
	Once all of these areas have been closed, some dormant process
	or processes must wake up, flush the area sub-tree
	representing the page, and open a new page sub-tree in the
	area tree.  Then the whole nested structure of flow objects
	and area production must be re-activated, at the point in
	processing at which the areas of the previous page were
	finalised, but with the new page environment.  The most
	natural way of expressing the temporal relationship of these
	processes is by means of co-routines.
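	The bookkeeping involved in such a page break can be pictured
	as a stack of open areas.  What follows is only a minimal
	sketch: the class <code>OpenAreaStack</code> and its methods
	are hypothetical names invented for illustration, not part of
	FOP, and areas are reduced to plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: tracks the chain of areas open on the current page. */
class OpenAreaStack {
    private final Deque<String> openAreas = new ArrayDeque<>(); // innermost area on top
    final List<String> log = new ArrayList<>();                 // event record, for illustration

    /** Opens a new area nested inside the current innermost area. */
    void open(String name) {
        openAreas.push(name);
        log.add("open " + name);
    }

    /**
     * Called when a line-area fills the page: closes every open ancestor,
     * innermost first, flushes the page, then re-opens the same chain,
     * outermost first, in the new page environment.
     */
    void pageBreak() {
        List<String> chain = new ArrayList<>();
        while (!openAreas.isEmpty()) {
            String name = openAreas.pop();
            log.add("close " + name);
            chain.add(name);
        }
        log.add("flush page");
        for (int i = chain.size() - 1; i >= 0; i--) {
            open(chain.get(i));
        }
    }

    public static void main(String[] args) {
        OpenAreaStack areas = new OpenAreaStack();
        areas.open("block");
        areas.open("nested-block");
        areas.pageBreak();
        areas.log.forEach(System.out::println);
    }
}
```

	In a recursive implementation the equivalent of this stack
	lives on the Java call stack, which is what makes suspending
	and resuming the flow awkward without co-routines.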
	Normal sub-routines (methods) display a hierarchical
	relationship where process A suspends on invoking process B,
	which on termination returns control to A which resumes from
	the point of suspension. Co-routines instead have a parallel
	relationship.  Process A suspends on invoking process B, but
	process B also suspends on returning control to process A.  To
	process B, this return of control appears to be an invocation
	of process A.  When process A subsequently invokes B and
	suspends, B behaves as though its previous invocation of A has
	returned, and it resumes from the point of that invocation.
	So control bounces between the two, each one resuming where it
	left off.
	For example, think of a page-production method working on a
	complex page-sequence-master.
	void makePages(...) {
	  while (pageSequence.hasNext()) {
	    page = generateNextPage(...);
	    // fillPage() reports whether the flow has been used up
	    boolean over = flow.fillPage(page);
	    if (over) return;
	  }
	}
	The <code>fillPage()</code> method, when it fills a page, will
	have unfinished business with the flow, which it will want to
	resume at the next call; hence co-routines.  One way to
	implement them in Java is by threads synchronised on some
	common argument-passing object.
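	One possible shape for such a thread-based co-routine is
	sketched here.  It is an illustration under stated
	assumptions, not FOP code: the class
	<code>FlowCoroutine</code> is hypothetical, a page is reduced
	to a list of lines, a page holds a fixed, positive number of
	lines, and two <code>SynchronousQueue</code> objects stand in
	for the common argument-passing object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.SynchronousQueue;

/** Hypothetical sketch of fillPage() as a co-routine backed by a thread. */
class FlowCoroutine {
    private final SynchronousQueue<List<String>> pageIn = new SynchronousQueue<>();
    private final SynchronousQueue<Boolean> overOut = new SynchronousQueue<>();

    FlowCoroutine(List<String> lines, int linesPerPage) {
        Thread worker = new Thread(() -> {
            try {
                List<String> page = pageIn.take();   // wait for the first page
                int used = 0;
                for (String line : lines) {
                    if (used == linesPerPage) {
                        overOut.put(false);          // page full: suspend here...
                        page = pageIn.take();        // ...and resume here on the next call
                        used = 0;
                    }
                    page.add(line);
                    used++;
                }
                overOut.put(true);                   // flow exhausted
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    /** Passes control to the flow; returns true once the flow is finished. */
    boolean fillPage(List<String> page) {
        try {
            pageIn.put(page);
            return overOut.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the flow", e);
        }
    }

    public static void main(String[] args) {
        FlowCoroutine flow = new FlowCoroutine(Arrays.asList("a", "b", "c", "d", "e"), 2);
        boolean over = false;
        while (!over) {
            List<String> page = new ArrayList<>();
            over = flow.fillPage(page);
            System.out.println(page);
        }
    }
}
```

	The loop in the worker thread never returns between pages; it
	simply blocks, so all of its local state survives until the
	next call to <code>fillPage()</code>.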
	Jeffrey H. Kingston, in <em>The Design and Implementation of
	the Lout Document Formatting Language</em>, describes the
	<em>galley</em> abstraction which he implemented in
	<em>Lout</em>.  A document to be formatted is a stream of text
	and symbols, some of which are <strong>receptive
	symbols</strong>.  The output file is the first receptive
	symbol; the formatting document is the first galley.  The
	archetypical example of a receptive symbol is
	<strong>@FootPlace</strong> and its corresponding galley
	definition, <strong>@FootNote</strong>.
	Each galley should be thought of as a concurrent process, and
	each is associated with a semaphore (or synchronisation
	object).  Galleys are free to "promote" components into
	receptive targets as long as a) an appropriate target has been
	encountered in the file, b) the component being promoted
	contains no unresolved galley targets itself, and c) there is
	sufficient room for the galley component at the target.  If
	these conditions are not met, the galley blocks on its
	semaphore.  When conditions change so that further progress
	may be possible, the semaphore is signalled.  Note that the
	galleys are a hierarchy, and that the processing and promotion
	of galley contents happens <em>bottom-up</em>.
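	The blocking behaviour can be sketched with a
	<code>java.util.concurrent.Semaphore</code>.  This is a
	deliberately reduced illustration, not Lout's or FOP's
	implementation: the class <code>GalleySketch</code> and its
	methods are hypothetical, components are represented only by
	their heights, and only condition c) (sufficient room at the
	target) is modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: a galley promoting components as room becomes available. */
class GalleySketch {
    private final Semaphore progress = new Semaphore(0);      // signalled when conditions change
    private final List<Integer> promoted = new ArrayList<>(); // heights promoted so far
    private int room = 0;                                     // space left at the target
    private Thread worker;

    /** Starts the galley as a concurrent process over the given component heights. */
    void start(List<Integer> heights) {
        worker = new Thread(() -> {
            try {
                for (int height : heights) {
                    while (!tryPromote(height)) {
                        progress.acquire();   // conditions not met: block on the semaphore
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    /** Promotes one component if the target has room for it. */
    private synchronized boolean tryPromote(int height) {
        if (height > room) return false;
        room -= height;
        promoted.add(height);
        return true;
    }

    /** Called when more space opens up at the target. */
    void makeRoom(int space) {
        synchronized (this) { room += space; }
        progress.release();                   // further progress may now be possible
    }

    /** Waits up to millis for the galley to finish; returns what has been promoted. */
    List<Integer> awaitPromoted(long millis) {
        try {
            worker.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (this) { return new ArrayList<>(promoted); }
    }

    public static void main(String[] args) {
        GalleySketch galley = new GalleySketch();
        galley.start(Arrays.asList(3, 3, 3));
        galley.makeRoom(6);   // enough for the first two components
        galley.makeRoom(6);   // enough for the third
        System.out.println(galley.awaitPromoted(2000));
    }
}
```

	A signal only means that progress <em>may</em> be possible, so
	the galley re-tests its conditions after every wake-up rather
	than assuming that they now hold.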
	This structure can be transposed into XSLFO terms by
	substituting <em>areas</em> for <em>targets</em> and certain
	<em>flow objects</em> from fo:flows and fo:static-contents in
	place of <em>galleys</em>.  The picture is more complex in
	XSLFO because of the nature of the area tree.  Flows
	cannot be as easily restarted after page completions because
	the whole of the nested area tree hierarchy has to be
	rebuilt.  Compensating for this is the relative simplicity of
	regenerating many of the "nominal" area tree containers like
	viewports, regions and the pure containers.  Traits will, for
	the most part, remain unchanged in the re-generation.
	However, because there can be no guarantee that general page
	layout will be unchanged, various sizing, positioning and
	possibly other layout-related traits will change.  Whether any
	changes would be forced in the properties on the fo tree
	remains to be seen.  In any case, the necessary adjustments
	could be made after the restart from the synchronisation
	In general, the galley processes will inherit, and themselves
	propagate, available space information.  However, in many
	circumstances, the galleys must perform as much processing as
	possible with limited area size information.  Such
	circumstances include "automatic" table layout and layout with
	side floats.  Line-areas have a block progression dimension
	which is determined by their contents. To achieve full
	generality in such layouts, the contents of line-areas should
	be laid out as though their inline progression dimension were
	limited only by their content.  In the process, all possible
	break-points can be determined.  Where a line-area contains
	mixed fonts or embedded images, the block progression dimension (bpd) of the individual
	line-areas which are eventually stacked will, in general,
	depend on the line break points, but the advantage of this
	approach is that such actual selections can be backed out and
	new break points selected with a minimum of recalculation.
	This can potentially occur whenever a first attempt at page
	layout is backed out.
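	The idea of measuring once and selecting break points many
	times can be sketched as follows.  Everything here is an
	assumption made for illustration: the class
	<code>BreakPointSketch</code> is hypothetical, inline content
	is reduced to words with integer widths and heights, every
	inter-word space is a break opportunity, and the greedy
	line-filling rule is just one simple way to select breaks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: measure inline content once, select breaks many times. */
class BreakPointSketch {
    private final int[] start;   // start[i]: offset of word i on the unconstrained line
    private final int[] end;     // end[i]: offset of the end of word i
    private final int[] height;  // height[i]: height of word i (font size or image)

    BreakPointSketch(int[] widths, int[] heights, int spaceWidth) {
        int n = widths.length;
        start = new int[n];
        end = new int[n];
        height = heights.clone();
        int x = 0;
        for (int i = 0; i < n; i++) {   // lay out as one line of unlimited ipd
            start[i] = x;
            end[i] = x + widths[i];
            x = end[i] + spaceWidth;
        }
    }

    /** Greedy selection: returns the index of the last word on each line. */
    List<Integer> selectBreaks(int ipd) {
        List<Integer> breaks = new ArrayList<>();
        int first = 0;                           // first word of the current line
        for (int i = 1; i < end.length; i++) {
            if (end[i] - start[first] > ipd) {   // word i does not fit on this line
                breaks.add(i - 1);
                first = i;
            }
        }
        if (end.length > 0) breaks.add(end.length - 1);
        return breaks;
    }

    /** The bpd of each line is set by its tallest word, so it depends on the breaks. */
    List<Integer> lineBpds(List<Integer> breaks) {
        List<Integer> bpds = new ArrayList<>();
        int first = 0;
        for (int last : breaks) {
            int max = 0;
            for (int i = first; i <= last; i++) max = Math.max(max, height[i]);
            bpds.add(max);
            first = last + 1;
        }
        return bpds;
    }

    public static void main(String[] args) {
        BreakPointSketch text = new BreakPointSketch(
            new int[] {30, 20, 40, 10, 25}, new int[] {12, 12, 18, 12, 12}, 5);
        List<Integer> narrow = text.selectBreaks(60);   // first attempt at layout
        System.out.println(narrow + " " + text.lineBpds(narrow));
        List<Integer> wide = text.selectBreaks(100);    // backed out and re-selected
        System.out.println(wide + " " + text.lineBpds(wide));
    }
}
```

	Re-selecting breaks for a different inline progression
	dimension reuses the measured offsets; only the selection
	loop and the per-line bpd calculation run again.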
