FOPs, I am putting together some XML notes on design aspects of area tree construction. Attached is some XML with the notes so far. Comments solicited.
Peter
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- $Id: xml-parsing.xml,v 1.6 2002-02-15 11:25:24+10 pbw Exp pbw $ -->
<!--
<!DOCTYPE document SYSTEM "../../xml-docs/dtd/document-v10.dtd">
-->
<document>
<header>
<title>Implementing co-routines</title>
<authors>
<person name="Peter B. West" email="[EMAIL PROTECTED]"/>
</authors>
</header>
<body>
<!-- one of (anchor s1) -->
<s1 title="Implementing Co-routines in FOP">
<p>
All general page layout systems have to solve the same fundamental problem: expressing a flow of text with its own natural structure as a series of pages corresponding to the physical and logical structure of the output medium. This simple description disguises many complexities. Version 1.0 of the Recommendation, in Section 3, <em>Introduction to Formatting</em>, includes the following comments.
</p>
<note>
[Formatting] comprises several steps, some of which depend on others in a non-sequential way.<br/>
...and...<br/>
[R]efinement is not necessarily a straightforward, sequential procedure, but may involve look-ahead, back-tracking, or control-splicing with other processes in the formatter.
</note>
<p>Section 3.1, <em>Conceptual Procedure</em>, includes:</p>
<note>
The procedure works by processing formatting objects. Each object, while being processed, may initiate processing in other objects. While the objects are hierarchically structured, the processing is not; processing of a given object is rather like a co-routine which may pass control to other processes, but pick up again later where it left off.
</note>
<p>
If one looks only at the flow side of the equation, it's difficult to see what the problem might be. The ordering of the elements of the flow is preserved in the area tree, and where elements are in an hierarchical relationship in the flow, they will generally be in an hierarchical relationship in the area tree. In such circumstances, the recursive processing of the flow seems quite natural.
</p>
<p>
The problem becomes more obvious when one thinks about the imposition of an unrelated page structure over the hierarchical structure of the document content. Take, for example, the processing of a nested flow structure which, at a certain point, is scanning text and generating line-areas, nested within other block areas and possibly other line areas. The page fills in the middle of this process. Processing at the lowest level in the tree must now suspend, immediately following the production of the line-area which filled the page. This same event, however, must also trigger the closing and flushing to the area tree of every open area of which the last line-area was a descendant.
</p>
<p>
Once all of these areas have been closed, some dormant process or processes must wake up, flush the area sub-tree representing the page, and open a new page sub-tree in the area tree. Then the whole nested structure of flow objects and area production must be re-activated, at the point in processing at which the areas of the previous page were finalised, but with the new page environment. The most natural way of expressing the temporal relationship of these processes is by means of co-routines.
</p>
<p>
Normal sub-routines (methods) display a hierarchical relationship where process A suspends on invoking process B, which on termination returns control to A which resumes from the point of suspension. Co-routines instead have a parallel relationship. Process A suspends on invoking process B, but process B also suspends on returning control to process A. To process B, this return of control appears to be an invocation of process A. When process A subsequently invokes B and suspends, B behaves as though its previous invocation of A has returned, and it resumes from the point of that invocation. So control bounces between the two, each one resuming where it left off.
</p>
<p>
For example, think of a page-production method working on a complex page-sequence-master.
</p>
<source>
void makePages(...) {
    ...
    while (pageSequence.hasNext()) {
        ...
        page = generateNextPage(...);
        // fillPage() suspends the flow when the page fills
        boolean over = flow.fillPage(page);
        if (over) return;
    }
}
</source>
<p>
The <code>fillPage()</code> method, when it fills a page, will have unfinished business with the flow, which it will want to resume at the next call; hence co-routines. One way to implement them in Java is by threads synchronised on some common argument-passing object.
</p>
<p>
Jeffrey H. Kingston, in <em>The Design and Implementation of the Lout Document Formatting Language</em>, describes the <em>galley</em> abstraction which he implemented in <em>Lout</em>. A document to be formatted is a stream of text and symbols, some of which are <strong>receptive symbols</strong>. The output file is the first receptive symbol; the formatting document is the first galley. The archetypical example of a receptive symbol is <strong>@FootPlace</strong> and its corresponding galley definition, <strong>@FootNote</strong>.
</p>
<p>
Each galley should be thought of as a concurrent process, and each is associated with a semaphore (or synchronisation object). Galleys are free to "promote" components into receptive targets as long as a) an appropriate target has been encountered in the file, b) the component being promoted contains no unresolved galley targets itself, and c) there is sufficient room for the galley component at the target. If these conditions are not met, the galley blocks on its semaphore. When conditions change so that further progress may be possible, the semaphore is signalled. Note that the galleys are a hierarchy, and that the processing and promotion of galley contents happens <em>bottom-up</em>.
</p>
<p>
This structure can be transposed into XSLFO terms by substituting <em>areas</em> for <em>targets</em> and certain <em>flow objects</em> from fo:flows and fo:static-contents in place of <em>galleys</em>. The picture is more complex in XSLFO because of the nature of the area tree.
Flows cannot be as easily restarted after page completions because the whole of the nested area tree hierarchy has to be rebuilt. Compensating for this is the relative simplicity of regenerating many of the "nominal" area tree containers like viewports, regions and the pure containers. Traits will, for the most part, remain unchanged in the re-generation. However, because there can be no guarantee that general page layout will be unchanged, various sizing, positioning and possibly other layout-related traits will change. Whether any changes would be forced in the properties on the fo tree remains to be seen. In any case, the necessary adjustments could be made after the restart from the synchronisation <strong>wait</strong>.
</p>
<p>
In general, the galley processes will inherit, and themselves propagate, available space information. However, in many circumstances, the galleys must perform as much processing as possible with limited area size information. Such circumstances include "automatic" table layout and layout with side floats. Line-areas have a block progression dimension which is determined by their contents. To achieve full generality in such layouts, the contents of line-areas should be laid out as though their inline progression dimension were limited only by their content. In the process, all possible break-points can be determined. Where a line-area contains mixed fonts or embedded images, the block progression dimension of the individual line-areas which are eventually stacked will, in general, depend on the line break points, but the advantage of this approach is that such actual selections can be backed out and new break points selected with a minimum of recalculation. This can potentially occur whenever a first attempt at page layout is backed out.
</p>
</s1>
</body>
</document>
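
A minimal sketch of the thread-based co-routine approach mentioned in the notes, offered as an illustration only and not as FOP code. The flow side runs in its own thread and hands control back and forth with the page side through a pair of java.util.concurrent.SynchronousQueue objects, which stand in for the common argument-passing object. The class names Page, Flow and CoroutineSketch, the fixed lines-per-page capacity, and the use of plain strings in place of line-areas are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.SynchronousQueue;

/** Hypothetical stand-in for a page: holds at most `capacity` line-areas. */
class Page {
    final int capacity;                      // assumed to be at least 1
    final List<String> lines = new ArrayList<>();
    Page(int capacity) { this.capacity = capacity; }
    boolean isFull() { return lines.size() >= capacity; }
}

/** Flow side of the co-routine pair; its position in the text lives on its own stack. */
class Flow implements Runnable {
    private final List<String> text;
    // The two hand-off points shared by the page side and the flow side.
    private final SynchronousQueue<Page> pageIn = new SynchronousQueue<>();
    private final SynchronousQueue<Boolean> overOut = new SynchronousQueue<>();

    Flow(List<String> text) { this.text = text; }

    /** Called by the page side: pass a page across, then suspend until the flow hands back. */
    boolean fillPage(Page page) {
        try {
            pageIn.put(page);
            return overOut.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while filling page", e);
        }
    }

    @Override
    public void run() {
        try {
            Page page = pageIn.take();       // wait for the first fillPage() call
            for (String line : text) {
                if (page.isFull()) {
                    overOut.put(false);      // "return" to the page side: page full, flow not over
                    page = pageIn.take();    // resume exactly here on the next fillPage() call
                }
                page.lines.add(line);
            }
            overOut.put(true);               // the flow is exhausted
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class CoroutineSketch {
    /** Page side: the same loop shape as makePages() in the notes. */
    static List<Page> makePages(List<String> text, int linesPerPage) {
        Flow flow = new Flow(text);
        Thread flowThread = new Thread(flow, "flow");
        flowThread.setDaemon(true);
        flowThread.start();
        List<Page> pages = new ArrayList<>();
        boolean over = false;
        while (!over) {
            Page page = new Page(linesPerPage);
            pages.add(page);
            over = flow.fillPage(page);
        }
        return pages;
    }

    public static void main(String[] args) {
        for (Page p : makePages(Arrays.asList("a", "b", "c", "d", "e"), 2)) {
            System.out.println(p.lines);
        }
    }
}
```

Running main prints [a, b], [c, d] and [e]. Between calls to fillPage() the flow thread stays blocked in the middle of its loop, so it picks up where it left off without any explicit saving of state. The sketch does not attempt the harder part described in the notes, namely closing and rebuilding the nested area hierarchy at each page boundary.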