Re: Paginating Content

Diana Shannon Thu, 06 Jun 2002 08:36:11 -0700

*Thanks* Stefano. Coming soon to a How-To near you...

Diana


On Thursday, June 6, 2002, at 08:51  AM, Stefano Mazzocchi wrote:

> Konstantin Piroumian wrote:
>
>> Hm... Does anybody have an idea on how to paginate the content?
>
> Ok, damn it, I don't have time to mark this up, but since it's the
> content that is useful, here's a small tutorial for the Paginator.
>
>                                    - o -
>
> Paginator Transformer
> =====================
>
> classname: org.apache.cocoon.transformation.pagination.Paginator
> location: scratchpad (available in both cocoon 2.1-dev and 2.0.3-dev)
>
> Design idea
> -----------
>
> The paginator is a 'FilterTransformer' on pagination steroids. It works
> by filtering out SAX events and counting pages.
>
> The design isn't very efficient since it has to process the entire file
> to extract a single page. It works nicely with a few tens of pages, but
> I would seriously advise *against* using it for books or very big
> documents.
>
> The good news is that it's cacheable, so if the document doesn't change
> and the same page is requested, there is no need to reprocess the
> document.
>
> Anyway, for static generation, all this doesn't really matter.
>
> A simple example of use
> -----------------------
>
> Suppose you have an XML file like this
>
>  <a>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>  </a>
>
> and you want to paginate it with 3 <b> elements per page. To achieve
> this, you write a simple "pagesheet" (which contains the instructions
> for the filter, much like a stylesheet gives instructions to an XSLT
> processor) like this:
>
> <?xml version="1.0"?>
> <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>  <rules>
>   <count type="element" name="b" num="3"/>
>  </rules>
> </pagesheet>
>
> then you connect the two with a sitemap snippet like this:
>
>    <map:match pattern="page(*)">
>     <map:generate src="document.xml"/>
>     <map:transform type="paginate" src="pagesheets/images.xml">
>       <map:parameter name="page" value="{1}"/>
>     </map:transform>
>     <map:serialize type="xml"/>
>    </map:match>
>
> and accessing the URI page(1) yields
>
>  <a>
>   <b/>
>   <b/>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="1"
>      total="3"
>      current-uri="page(1)"
>      clean-uri="page"
>   />
>  </a>
>
> which can be easily transformed into something more meaningful.
>
> Note that the transformer processes all the pages to obtain the 'total'.
> There is no way around this.
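
Why the whole document has to be scanned can be seen in a small sketch. This is not the Paginator's code, only a minimal Python illustration built on the standard xml.sax module; the names PageCounter and total_pages are made up for the example.

```python
import math
import xml.sax


class PageCounter(xml.sax.ContentHandler):
    """Counts startElement() events for one element name (hypothetical helper)."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.count = 0

    def startElement(self, tag, attrs):
        # Every matching start tag counts, wherever it sits in the stream.
        if tag == self.name:
            self.count += 1


def total_pages(xml_text, name, num):
    """Parse the whole document and return how many pages of `num` elements it holds."""
    handler = PageCounter(name)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return math.ceil(handler.count / num)


document = "<a>" + "<b/>" * 7 + "</a>"
print(total_pages(document, "b", 3))  # 7 elements at 3 per page -> prints 3
```

The total is only known once the last event has been seen, which is why a streaming filter cannot stop early at the end of the requested page.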
>
> Adding navigation
> -----------------
>
> The problem with XSLT-based pagination is that the logic is very complex
> to define in XSLT and is rarely reusable across different pagination
> needs. This was the main reason for creating a custom component for
> this.
>
> But since we have a full-blown pagesheet language, there are a few other
> things that we can make the Paginator do, most importantly, navigation.
>
> For example, this other pagesheet
>
> <?xml version="1.0"?>
> <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>  <rules>
>   <count type="element" name="b" num="3"/>
>   <link type="unit" num="1"/>
>  </rules>
> </pagesheet>
>
> indicates that the transformer must understand how the page was encoded
> in the given URI and provide a link to the pages +/- 1 position, if they
> are available.
>
> So, using the same environment as before we get
>
>  <a>
>   <b/>
>   <b/>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="1"
>      total="3"
>      current-uri="page(1)"
>      clean-uri="page">
>    <page:link page="2" type="next" uri="page(2)"/>
>   </page:page>
>  </a>
>
> which indicates
>
>  1) there is no page 0, so no link is created.
>  2) the link goes to page 2, the type is 'next' (useful for
> visualization) and the URI is page(2) (useful for linking without
> XSLT-specific logic).
>
> NOTE: the URI is re-encoded using the same pattern. The paginator
> assumes that 'round brackets' are used to identify page numbering.
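
The round-bracket convention can be sketched in a few lines of Python. The helper names decode_page and encode_page are hypothetical, and falling back to page 1 when no bracketed number is present is an assumption of this sketch, not documented Paginator behavior.

```python
import re

# A page number is whatever digits sit between round brackets in the URI.
PAGE_PATTERN = re.compile(r"\((\d+)\)")


def decode_page(uri):
    """Return (page number, clean URI) for a URI such as 'page(2)'."""
    match = PAGE_PATTERN.search(uri)
    if match is None:
        return 1, uri  # assumption: no bracketed number means page 1
    clean = uri[:match.start()] + uri[match.end():]
    return int(match.group(1)), clean


def encode_page(uri, page):
    """Rewrite the first bracketed number in `uri` so that it points at `page`."""
    return PAGE_PATTERN.sub("(%d)" % page, uri, count=1)


print(decode_page("page(1)"))     # (1, 'page')
print(encode_page("page(1)", 2))  # page(2)
```

Because the link URI is derived from the request URI itself, the stylesheet that renders the links needs no knowledge of the sitemap pattern.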
>
> Now, without changing anything, requesting page(2) would yield
>
>  <a>
>   <b/>
>   <b/>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="2"
>      total="3"
>      current-uri="page(2)"
>      clean-uri="page">
>    <page:link page="1" type="prev" uri="page(1)"/>
>    <page:link page="3" type="next" uri="page(3)"/>
>   </page:page>
>  </a>
>
> while page(3) would yield:
>
>  <a>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="3"
>      total="3"
>      current-uri="page(3)"
>      clean-uri="page">
>    <page:link page="2" type="prev" uri="page(2)"/>
>   </page:page>
>  </a>
>
> NOTE: here there is only one <b> because the original document doesn't
> contain enough elements to fill the page entirely. It's the remainder
> of the division (7 elements at 3 per page leaves 1).
>
> A real-life example
> -------------------
>
> Here are a few pagesheets which are a little more complex:
>
> Paginating the results from DirectoryGenerator:
>
> <?xml version="1.0"?>
> <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>  <rules>
>   <count type="element" name="file"
> namespace="http://apache.org/cocoon/directory/2.0" num="16"/>
>   <link type="unit" num="2"/>
>   <link type="range" value="5"/>
>  </rules>
> </pagesheet>
>
> This says:
>
>  1) paginate 16 files per page
>  2) provide me with links to +/- 1 and +/- 2 pages (when available)
>  3) provide me with links to +/- 5 (when available)
>
> So, suppose we have a directory with 300 files and we request page 10,
> the generated page will be
>
>  <dir:directory>
>   <dir:file ...>
>
>   [other 15 dir:file]
>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>      current="10"
>      total="19"
>      current-uri="dir(10)"
>      clean-uri="dir">
>    <page:range-link page="5" type="prev" uri="dir(5)"/>
>    <page:link page="8" type="prev" uri="dir(8)"/>
>    <page:link page="9" type="prev" uri="dir(9)"/>
>    <page:link page="11" type="next" uri="dir(11)"/>
>    <page:link page="12" type="next" uri="dir(12)"/>
>    <page:range-link page="15" type="next" uri="dir(15)"/>
>   </page:page>
>  </dir:directory>
>   </page:page>
>  </dir:directory>
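
The link list in this example follows from simple arithmetic on the current page and the total. The sketch below reproduces it in Python; the function name page_links and its tuple format are invented for illustration and say nothing about how the transformer is implemented.

```python
import math


def page_links(current, total, unit, range_step=None):
    """Return (element, type, page) tuples in ascending page order.

    `unit` mirrors <link type="unit" num="..."/> and `range_step` mirrors
    <link type="range" value="..."/>; links outside 1..total are skipped.
    """
    result = []
    if range_step and current - range_step >= 1:
        result.append(("range-link", "prev", current - range_step))
    for offset in range(unit, 0, -1):
        if current - offset >= 1:
            result.append(("link", "prev", current - offset))
    for offset in range(1, unit + 1):
        if current + offset <= total:
            result.append(("link", "next", current + offset))
    if range_step and current + range_step <= total:
        result.append(("range-link", "next", current + range_step))
    return result


total = math.ceil(300 / 16)  # 300 files at 16 per page -> 19 pages
for element, kind, page in page_links(10, total, unit=2, range_step=5):
    print(element, kind, page)
```

Calling page_links(1, 3, unit=1) gives a single 'next' link to page 2, which matches the first navigation example: there is no page 0, so no 'prev' link is produced.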
>
> Asymmetric pagination
> ---------------------
>
> We also have the ability to indicate different rules for each page, so:
>
> <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>  <rules page="1">
>   <count type="element" name="b" num="5"/>
>   <link type="unit" num="1"/>
>  </rules>
>  <rules>
>   <count type="element" name="b" num="10"/>
>   <link type="unit" num="2"/>
>  </rules>
> </pagesheet>
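
To see what such a pagesheet means for page boundaries, here is a minimal Python sketch. The function page_sizes and its arguments are hypothetical; it only models the 'page 1 gets its own rule, every other page gets the default' reading of the pagesheet above.

```python
def page_sizes(item_count, default_num, overrides=None):
    """Return the number of counted elements on each page.

    `overrides` maps a page number to its own `num`, like <rules page="1">;
    pages without an entry use `default_num`, like the plain <rules> block.
    """
    overrides = overrides or {}
    sizes = []
    remaining = item_count
    page = 1
    while remaining > 0:
        num = overrides.get(page, default_num)
        if num <= 0:
            raise ValueError("num must be positive")
        sizes.append(min(num, remaining))
        remaining -= sizes[-1]
        page += 1
    return sizes


# 27 <b> elements: 5 on page 1, then 10 per page, remainder on the last one.
print(page_sizes(27, default_num=10, overrides={1: 5}))  # [5, 10, 10, 2]
```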
>
> Count types
> -----------
>
> The paginator works by counting stuff. It's up to you to define what you
> want to use for counting and you do so with the attributes of the
> <count> element in the pagesheet.
>
> This element supports 2 required attributes:
>
>  num="" -> a number indicating how many times the thing to count must be
> present in this page.
>
>  type="" -> the type of counting that the paginator must perform. Two
> types are defined, but only one is currently implemented.
>
>     type="element" -> makes the paginator count the startElement() SAX
> events
>
>     type="chars" -> (not currently implemented!) makes the paginator
> count the chars included in the page.
>
> In case type="element" is used, two other attributes become useful:
>
>  name="" -> the name of the element (without namespace prefix!)
>
>  namespace="" -> the URI of the namespace (if not specified, the default
> NS is used)
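
As a rough illustration of these attributes, the following Python sketch reads the <count> elements out of a pagesheet with the standard ElementTree module. The function read_count_rules and the dictionary layout are assumptions made for this example; the real Pagesheet parser is Java code inside the transformer and may behave differently.

```python
import xml.etree.ElementTree as ET

PAGINATE_NS = "http://apache.org/cocoon/paginate/1.0"


def read_count_rules(pagesheet_text):
    """Return one dict per <count> element found in the pagesheet."""
    root = ET.fromstring(pagesheet_text)
    rules = []
    for count in root.iter("{%s}count" % PAGINATE_NS):
        kind = count.attrib["type"]  # required attribute
        if kind not in ("element", "chars"):
            raise ValueError("unknown count type: %r" % kind)
        rules.append({
            "type": kind,
            "num": int(count.attrib["num"]),       # required attribute
            "name": count.get("name"),             # only used by type="element"
            "namespace": count.get("namespace"),   # None means "not specified"
        })
    return rules


PAGESHEET = """
<pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
 <rules>
  <count type="element" name="b" num="3"/>
 </rules>
</pagesheet>
"""
print(read_count_rules(PAGESHEET))
```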
>
>                                       - o -
>
> Ok, from now on some RT on the future of this transformer:
>
> Using the paginator for docs
> ----------------------------
>
> I originally wrote the paginator to paginate a directory listing and it
> works great for paginating by counting elements. For docs, it could be
> possible to paginate by counting sections or subsections, but this
> doesn't necessarily yield visually balanced pages (which is the reason
> for web pagination).
>
> This is why I planned for a way to count by chars, even if I didn't go
> as far as implementing it. Paginating by counting elements is ok
> (sounds trivial, but it's not! think of nesting!), but paginating by
> counting chars is a real pain, due to the algorithms that must perform
> 'chunking'.
>
> I mean, assume you have a document like this:
>
>  <p>this is some <strong>text</strong> that happens
>  to be <em>chunked</em></p>
>              ^
>              |
>
> and suppose that counting the chars leads you to the chunking point
> indicated by the arrow above. Cutting the page there results in XML
> which is not well-formed. Merely 're-well-forming' the XML at that
> point truncates words. So, we must provide a way to 're-well-form' the
> XML until the first 'block-level' element is encountered (p in this
> case). But this means that the pagesheet must contain at least the list
> of 'block-delimiting' elements (and the current Pagesheet parser and
> object model don't support this notion).
>
> Result: pagination at the char-level is not trivial and requires a
> little bit of work on the transformer.
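
One possible reading of the block-level idea is to count chars but only end a page at the end of a block element, so neither words nor inline markup are ever split. The Python sketch below follows that reading; it is an assumption about how type="chars" could work, not a description of existing code, and the name paginate_by_chars is invented. It also assumes block elements are not nested inside each other.

```python
import xml.etree.ElementTree as ET


def paginate_by_chars(root, block_names, max_chars):
    """Group block-level elements into pages of roughly `max_chars` chars.

    A page is closed only after a whole block has been added, so a page can
    overshoot `max_chars` but always stays well-formed.
    """
    pages, current, used = [], [], 0
    for elem in root.iter():
        if elem.tag not in block_names:
            continue
        current.append(elem)
        used += len("".join(elem.itertext()))  # chars in the block, markup excluded
        if used >= max_chars:
            pages.append(current)
            current, used = [], 0
    if current:
        pages.append(current)
    return pages


doc = ET.fromstring(
    "<doc><p>this is some <strong>text</strong> that happens "
    "to be <em>chunked</em></p><p>short</p></doc>"
)
pages = paginate_by_chars(doc, {"p"}, 20)
print([len(page) for page in pages])  # [1, 1]
```

The set of block names plays the role of the 'block-delimiting' element list that the pagesheet would have to carry.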
>
> Nesting behavior
> ----------------
>
> If counting by chars is a pain, even counting elements is not easy.
> Assume you have this:
>
>  <a>
>   <b>
>    <a>
>     <b>
>      <a>
>       <b/>
>      </a>
>     </b>
>    </a>
>   </b>
>  </a>
>
> and you want to paginate using one <b> per page. What do the pages look
> like? Ok, I'll give you some space to think about it.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Ok, here is my solution (but I'm not sure it's the best):
>
> page 1:
>
>  <a>
>   <b>
>    <a>
>     <a/>
>    </a>
>   </b>
>  </a>
>
> page 2:
>
>  <a>
>   <a>
>    <b>
>     <a/>
>    </b>
>   </a>
>  </a>
>
> page 3:
>
>  <a>
>   <a>
>    <a>
>     <b/>
>    </a>
>   </a>
>  </a>
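
The three pages above follow one rule: keep every element that is not being counted, keep a counted element only if it belongs to the requested page, and promote the children of a dropped element to its parent. The Python sketch below implements that rule on an in-memory tree. It is only an illustration of the proposed output (the real transformer works on SAX events), the name paginate_nested is made up, and text content is ignored for brevity.

```python
import xml.etree.ElementTree as ET


def paginate_nested(root, name, per_page, page):
    """Return a filtered copy of `root` holding only the `name` elements of `page`.

    Assumes `root` itself is not one of the counted elements.
    """
    first = (page - 1) * per_page + 1
    last = page * per_page
    counter = 0

    def filtered(elem):
        nonlocal counter
        keep = True
        if elem.tag == name:
            counter += 1  # counted in document order, like startElement()
            keep = first <= counter <= last
        children = []
        for child in elem:
            children.extend(filtered(child))
        if not keep:
            return children  # drop the element, promote its children
        copy = ET.Element(elem.tag, elem.attrib)
        copy.extend(children)
        return [copy]

    return filtered(root)[0]


doc = ET.fromstring("<a><b><a><b><a><b/></a></b></a></b></a>")
for number in (1, 2, 3):
    print(ET.tostring(paginate_nested(doc, "b", 1, number), encoding="unicode"))
```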
>
> I'm pretty sure the current code is buggy someplace because for deep
> nesting like this one, it loses some SAX events someplace and ends up
> making the SAX stream non-well-formed and choking the subsequent
> transformers which are sensitive to well-formedness (such as XSLT).
>
> Note: the above might look like a mental exercise to many, but if you
> think about our Document DTD 1.1, you'll find nested <section> elements,
> and paginating those results in very similar problems. But I'm not sure
> if the solution adopted above is meaningful for real-world pagination.
> I'm open to suggestions on this.
>
> Improving the concept
> ---------------------
>
> One possible way to improve the concept is to count by XPath results,
> that is, you might want to count by 'sections included in sections'.
>
> Also, another way to improve the system is providing booleans: you might
> want to count 'sections AND chapters' (probably, XPath helps here as
> well).
>
> Ok, anyway, hope this helps and sorry for taking so long to write this.
>
> --
> Stefano Mazzocchi      One must still have chaos in oneself to be
>                           able to give birth to a dancing star.
> <[EMAIL PROTECTED]>                             Friedrich Nietzsche
> --------------------------------------------------------------------
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, email: [EMAIL PROTECTED]
>

