*Thanks* Stefano. Coming soon to a How-To near you... Diana
On Thursday, June 6, 2002, at 08:51 AM, Stefano Mazzocchi wrote:

> Konstantin Piroumian wrote:
>
>> Hm... Does anybody have an idea on how to paginate the content?
>
> Ok, damn it, I don't have time to mark this up, but since it's the
> content that is useful, here's a small tutorial for the Paginator.
>
>                                - o -
>
> Paginator Transformer
> =====================
>
> classname: org.apache.cocoon.transformation.pagination.Paginator
> location: scratchpad (available in both cocoon 2.1-dev and 2.0.3-dev)
>
> Design idea
> -----------
>
> The paginator is a 'FilterTransformer' on pagination steroids. It works
> by filtering SAX events out and counting pages.
>
> The design isn't very efficient since it has to process the entire file
> to extract a single page. It works nicely with a few tens of pages, but
> I would seriously suggest *against* using it for books or very big
> documents.
>
> The good news is that it's cacheable, so if the document doesn't change
> and the same page is requested, there is no need to reprocess the
> document.
>
> Anyway, for static generation, all this doesn't really matter.
>
> A simple example of use
> -----------------------
>
> Suppose you have an XML file like this
>
>  <a>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>  </a>
>
> and you want to paginate this having 3 <b> elements per page.
> In order to achieve this, you write a simple "pagesheet" (which
> contains the instructions for the filter, much like a stylesheet gives
> instructions to an XSLT processor) like this:
>
>  <?xml version="1.0"?>
>  <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>   <rules>
>    <count type="element" name="b" num="3"/>
>   </rules>
>  </pagesheet>
>
> then you connect the two with a sitemap snippet like this:
>
>  <map:match pattern="page(*)">
>   <map:generate src="document.xml"/>
>   <map:transform type="paginate" src="pagesheets/images.xml">
>    <map:parameter name="page" value="{1}"/>
>   </map:transform>
>   <map:serialize type="xml"/>
>  </map:match>
>
> and accessing the URI page(1) yields
>
>  <a>
>   <b/>
>   <b/>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="1"
>      total="3"
>      current-uri="page(1)"
>      clean-uri="page"
>   />
>  </a>
>
> which can be easily transformed into something more meaningful.
>
> Note that the transformer processes all the pages to obtain the 'total'.
> There is no way around this.
>
> Adding navigation
> -----------------
>
> The problem with XSLT-based pagination is that the logic is very complex
> to define in XSLT and is rarely reusable across different pagination
> needs. This was the main reason for the creation of a custom component
> for this.
>
> But since we have a full-blown pagesheet language, there are a few other
> things that we can make the Paginator do, most importantly, navigation.
>
> For example, this other pagesheet
>
>  <?xml version="1.0"?>
>  <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>   <rules>
>    <count type="element" name="b" num="3"/>
>    <link type="unit" num="1"/>
>   </rules>
>  </pagesheet>
>
> indicates that the transformer must understand how the page was encoded
> in the given URI and provide a link to the pages +/- 1 position, if they
> are available.
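
For anyone who wants to see the +/- link rule spelled out, here is a
minimal sketch in Python. It is not the transformer's Java code: the
function names page_from_uri and unit_links are hypothetical, and it
assumes the page number is the digits inside the trailing round brackets
of the URI, as in the tutorial's examples.

```python
import re


def page_from_uri(uri):
    """Split a URI such as 'page(2)' into its clean part and page number."""
    match = re.fullmatch(r"(.*)\((\d+)\)", uri)
    if match is None:
        raise ValueError(f"no page number in round brackets: {uri!r}")
    return match.group(1), int(match.group(2))


def unit_links(uri, total, num):
    """List (page, type, uri) for existing pages within +/- num of the current one."""
    clean, current = page_from_uri(uri)
    links = []
    for page in range(current - num, current + num + 1):
        if page == current or page < 1 or page > total:
            continue  # skip the current page and pages that do not exist
        kind = "prev" if page < current else "next"
        # Re-encode the target page with the same bracket pattern.
        links.append((page, kind, f"{clean}({page})"))
    return links


print(unit_links("page(1)", 3, 1))
print(unit_links("page(2)", 3, 1))
```

With 3 pages in total, the first call yields only a 'next' link because
there is no page 0, and the second yields one 'prev' and one 'next' link.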
>
> So, using the same environment as before we get
>
>  <a>
>   <b/>
>   <b/>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="1"
>      total="3"
>      current-uri="page(1)"
>      clean-uri="page">
>    <page:link page="2" type="next" uri="page(2)"/>
>   </page:page>
>  </a>
>
> which indicates
>
>  1) there is no page 0, so no link is created.
>  2) the link goes to page 2, the type is 'next' (useful for
>     visualization) and the URI is page(2) (useful for linking without
>     XSLT-specific logic).
>
> NOTE: the URI is re-encoded using the same pattern; this paginator
> assumes that the 'round brackets' are used to identify page numbering.
>
> Now, without changing anything, requesting page(2) would yield
>
>  <a>
>   <b/>
>   <b/>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="2"
>      total="3"
>      current-uri="page(2)"
>      clean-uri="page">
>    <page:link page="1" type="prev" uri="page(1)"/>
>    <page:link page="3" type="next" uri="page(3)"/>
>   </page:page>
>  </a>
>
> while page(3) would yield:
>
>  <a>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="3"
>      total="3"
>      current-uri="page(3)"
>      clean-uri="page">
>    <page:link page="2" type="prev" uri="page(2)"/>
>   </page:page>
>  </a>
>
> NOTE: here there is only one <b> because the original document doesn't
> contain enough elements to fill the page entirely. It's the remainder
> of the division (7 elements at 3 per page leaves 1).
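
The page arithmetic behind 'total' and the short last page can be
sketched like this. Again this is only an illustration in Python with
hypothetical function names, assuming a single count rule that applies
to every page.

```python
def page_count(items, per_page):
    """Number of pages needed; the last page may be only partly filled."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    return -(-items // per_page)  # ceiling division on integers


def items_on_page(page, items, per_page):
    """How many counted elements land on the given 1-based page."""
    total = page_count(items, per_page)
    if not 1 <= page <= total:
        raise ValueError(f"page {page} is outside 1..{total}")
    # Full pages hold per_page elements; the last one holds what is left.
    return min(per_page, items - (page - 1) * per_page)


print(page_count(7, 3), items_on_page(3, 7, 3))  # prints: 3 1
```

For the 7-element document at 3 per page this gives 3 pages, with a
single <b> left over for page 3.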
>
> A real-life example
> -------------------
>
> Here are a few pagesheets which are a little more complex:
>
> Paginating the results from DirectoryGenerator:
>
>  <?xml version="1.0"?>
>  <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>   <rules>
>    <count type="element" name="file"
>       namespace="http://apache.org/cocoon/directory/2.0" num="16"/>
>    <link type="unit" num="2"/>
>    <link type="range" value="5"/>
>   </rules>
>  </pagesheet>
>
> This says:
>
>  1) paginate 16 files per page
>  2) provide me with links to +/- 1 and +/- 2 pages (when available)
>  3) provide me with links to +/- 5 pages (when available)
>
> So, suppose we have a directory with 300 files and we request page 10,
> the generated page will be
>
>  <dir:directory>
>   <dir:file ...>
>
>   [other 15 dir:file]
>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="10"
>      total="19"
>      current-uri="dir(10)"
>      clean-uri="dir">
>    <page:range-link page="5" type="prev" uri="dir(5)"/>
>    <page:link page="8" type="prev" uri="dir(8)"/>
>    <page:link page="9" type="prev" uri="dir(9)"/>
>    <page:link page="11" type="next" uri="dir(11)"/>
>    <page:link page="12" type="next" uri="dir(12)"/>
>    <page:range-link page="15" type="next" uri="dir(15)"/>
>   </page:page>
>  </dir:directory>
>
> Asymmetric pagination
> ---------------------
>
> We also have the ability to indicate different rules for each page, so:
>
>  <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>   <rules page="1">
>    <count type="element" name="b" num="5"/>
>    <link type="unit" num="1"/>
>   </rules>
>   <rules>
>    <count type="element" name="b" num="10"/>
>    <link type="unit" num="2"/>
>   </rules>
>  </pagesheet>
>
> That is, page 1 gets 5 <b> elements and +/- 1 links, while every other
> page gets 10 <b> elements and +/- 2 links.
>
> Count types
> -----------
>
> The paginator works by counting stuff. It's up to you to define what you
> want to use for counting and you do so with the attributes of the
> <count> element in the pagesheet.
>
> This element supports 2 required attributes:
>
>  num="" -> a number indicating how many times the thing to count must be
>  present in this page.
>
>  type="" -> the type of counting that the paginator must perform. Two
>  types are defined, but only one is currently implemented.
>
>    type="element" -> makes the paginator count the startElement() SAX
>    events
>
>    type="chars" -> (not currently implemented!) makes the paginator
>    count the chars included in the page.
>
> In case type="element" is used, two other attributes become useful:
>
>  name="" -> the name of the element (without namespace prefix!)
>
>  namespace="" -> the URI of the namespace (if not specified, the default
>  NS is used)
>
>                                - o -
>
> Ok, from now on some RT on the future of this transformer:
>
> Using the paginator for docs
> ----------------------------
>
> I originally wrote the paginator to paginate a directory listing and it
> works great for paginating by counting elements. For docs, it could be
> possible to paginate by counting sections or subsections, but this
> doesn't necessarily yield visually balanced pages (which is the reason
> for web pagination).
>
> This is why I planned for a way to count by chars, even if I didn't go
> as far as implementing it, because while paginating by counting elements
> is ok (sounds trivial, but it's not! think of nesting!), paginating by
> counting chars is a real pain, due to the algorithms that must perform
> 'chunking'.
>
> I mean, assume you have a document like this:
>
>  <p>this is some <strong>text</strong> that happens
>  to be <em>chunked</em></p>
>               ^
>               |
>
> and suppose that counting the chars leads you to the chunking point
> indicated by the arrow above. Cutting the page there results in XML
> which is not well-formed. Simply 're-well-forming' the XML at that
> point truncates words. So, we must provide a way to 're-well-form' the
> XML until the first 'block-level' element is encountered (p in this
> case).
> But this means that the pagesheet must contain at least the list of
> 'block-delimiting' elements (and the current pagesheet parser and
> object model don't support this notion).
>
> Result: pagination at the char level is not trivial and requires a
> little bit of work on the transformer.
>
> Nesting behavior
> ----------------
>
> If counting by chars is a pain, even counting elements is not easy.
> Assume you have this:
>
>  <a>
>   <b>
>    <a>
>     <b>
>      <a>
>       <b/>
>      </a>
>     </b>
>    </a>
>   </b>
>  </a>
>
> and you want to paginate using one <b> per page, what do the pages look
> like? Ok, I'll give you some space to think about it.
>
>
>
>
>
>
>
>
>
>
>
> Ok, here is my solution (but I'm not sure it's the best):
>
> page 1:
>
>  <a>
>   <b>
>    <a>
>     <a/>
>    </a>
>   </b>
>  </a>
>
> page 2:
>
>  <a>
>   <a>
>    <b>
>     <a/>
>    </b>
>   </a>
>  </a>
>
> page 3:
>
>  <a>
>   <a>
>    <a>
>     <b/>
>    </a>
>   </a>
>  </a>
>
> I'm pretty sure the current code is buggy someplace because for deep
> nesting like this one, it loses some SAX events and ends up making the
> SAX stream non-well-formed and choking the subsequent transformers
> which are sensitive to well-formedness (such as XSLT).
>
> Note: the above might look like a mental exercise to many, but if you
> think about our Document DTD 1.1, you'll find nested <section> elements,
> and paginating those results in very similar problems. But I'm not sure
> if the solution adopted above is meaningful for a real-case pagination.
> I'm open to suggestions on this.
>
> Improving the concept
> ---------------------
>
> One possible way to improve the concept is to count by XPath results,
> that is, you might want to count by 'sections included in sections'.
>
> Also, another way to improve the system is providing booleans: you might
> want to count 'sections AND chapters' (probably, XPath helps here as
> well).
>
> Ok, anyway, hope this helps and sorry for taking so long to write this.
>
> --
> Stefano Mazzocchi      One must still have chaos in oneself to be
>                           able to give birth to a dancing star.
> <[EMAIL PROTECTED]>                             Friedrich Nietzsche
> --------------------------------------------------------------------
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, email: [EMAIL PROTECTED]