Thanks for a very good explanation and the RT. You've voiced some of my
doubts and added some more to think about.
To summarize the RT: the paginator is rather flexible, but it's not very
well suited for documentation pagination. We have a few good options for
documentation pagination and they are not yet implemented. Right?
The options are:
- use only top-level sections for pagination - already implemented,
but not very useful for docs
- count sections/paragraphs - a slightly more advanced version of the
first one
- count chars in some intelligent way - the best way, but not
implemented; it has some algorithmic issues and may require additional
analysis at the word/sentence level, e.g. not breaking a word at a page
boundary, etc.
The last two options also require 're-well-forming' the resulting XML, which
can also be non-trivial.
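
To make the third option a bit more concrete, here is a minimal sketch of
counting chars without breaking a word. It is only an illustration: the
names paginate_words and max_chars are made up, it works on plain text
rather than on SAX events, and Python is used just for brevity (the
Paginator itself is Java).

```python
def paginate_words(text, max_chars):
    """Greedily pack whole words into pages of at most max_chars characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    pages, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars or not current:
            # The word fits, or it starts a page and is longer than the
            # budget: keep it whole rather than break it.
            current = candidate
        else:
            # The word would overflow the page: close the page and start
            # a new one with this word.
            pages.append(current)
            current = word
    if current:
        pages.append(current)
    return pages


for number, page in enumerate(
        paginate_words("the quick brown fox jumps over the lazy dog", 15), 1):
    print(number, repr(page))
```

A real implementation would also have to carry the markup along, which is
where the 're-well-form' problem comes in.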
Another thing that seems a little limiting to me (or maybe I didn't read
carefully enough?) is that the pagination rules are static. I can imagine a
situation where we need to set some pagination params dynamically, e.g. the
item count. Did I miss it, or is it not there?
Are there any solutions or ideas? We could use a Serializer - it's the only
component that can output non-well-formed XML - but in that case we would
end up with non-well-formed HTML.
So, my opinion on pagination is this: we need to count chars/words, break
the page and re-well-form the result. Maybe something like a reverse
Recorder (it can record SAX events and then fire their end events when
called) could be used to implement the 're-well-form' feature?
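
A rough sketch of what such a 'reverse Recorder' could do, assuming the SAX
stream is modelled as a plain list of (kind, value) tuples. This is not the
Cocoon API: split_well_formed and the tuple layout are hypothetical, and
attributes and namespaces are ignored.

```python
def split_well_formed(events, cut):
    """Split a SAX-like event list at index `cut` into two well-formed halves."""
    open_elements = []  # elements started but not yet ended before the cut
    for kind, value in events[:cut]:
        if kind == "start":
            open_elements.append(value)
        elif kind == "end":
            open_elements.pop()
    # Fire the missing end events, innermost first, to finish the first page.
    closing = [("end", name) for name in reversed(open_elements)]
    # Replay the start events, outermost first, to begin the second page.
    reopening = [("start", name) for name in open_elements]
    return events[:cut] + closing, reopening + events[cut:]


# <p>some <strong>text</strong> here</p>, cut right after "text"
events = [
    ("start", "p"), ("chars", "some "),
    ("start", "strong"), ("chars", "text"), ("end", "strong"),
    ("chars", " here"), ("end", "p"),
]
first_page, second_page = split_well_formed(events, 4)
print(first_page)
print(second_page)
```

Both halves are balanced, but note that the second page starts with an
empty <strong/>, so some cleanup (or a smarter choice of the cut point)
would still be needed.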
--
Konstantin Piroumian
[EMAIL PROTECTED]
> -----Original Message-----
> From: Stefano Mazzocchi [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, June 06, 2002 4:52 PM
> To: [EMAIL PROTECTED]; Apache Cocoon
> Subject: Paginating Content
>
>
> Konstantin Piroumian wrote:
>
> > Hm... Does anybody have an idea on how to paginate the content?
>
> Ok, damn it, I don't have time to mark this up, but since it's the
> content that is useful, here's a small tutorial for the Paginator.
>
> - o -
>
> Paginator Transformer
> =====================
>
> classname: org.apache.cocoon.transformation.pagination.Paginator
> location: scratchpad (available in both cocoon 2.1-dev and 2.0.3-dev)
>
> Design idea
> -----------
>
> The paginator is a 'FilterTransformer' on pagination steroids. It works
> by filtering SAX events out and counting pages.
>
> The design isn't very efficient, since it has to process the entire file
> to extract a single page. It works nicely with a few tens of pages, but I
> would seriously suggest *against* using it for books or very big
> documents.
>
> The good news is that it's cacheable, so if the document doesn't change
> and the same page is requested, there is no need to reprocess the
> document.
>
> Anyway, for static generation, all this doesn't really matter.
>
> A simple example of use
> -----------------------
>
> Suppose you have an XML file like this
>
> <a>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
>   <b/>
> </a>
>
> and you want to paginate this having 3 <b> elements per page. In order
> to achieve this, you write a simple "pagesheet" (which contains the
> instructions for the filter, much like a stylesheet gives instructions
> to an xslt processor) like this:
>
> <?xml version="1.0"?>
> <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>   <rules>
>     <count type="element" name="b" num="3"/>
>   </rules>
> </pagesheet>
>
> then you connect the two with a sitemap snippet like this:
>
> <map:match pattern="page(*)">
>   <map:generate src="document.xml"/>
>   <map:transform type="paginate" src="pagesheets/images.xml">
>     <map:parameter name="page" value="{1}"/>
>   </map:transform>
>   <map:serialize type="xml"/>
> </map:match>
>
> and accessing the URI page(1) yields
>
> <a>
>   <b/>
>   <b/>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>              current="1"
>              total="3"
>              current-uri="page(1)"
>              clean-uri="page"
>   />
> </a>
>
> which can be easily transformed into something more meaningful.
>
> Note that the transformer processes all the pages to obtain the 'total'.
> There is no way around this.
>
> Adding navigation
> -----------------
>
> The problem with XSLT-based pagination is that the logic is very complex
> to define in XSLT and is rarely reusable across different pagination
> needs. This was the main reason for the creation of a custom component
> for this.
>
> But since we have a full-blown pagesheet language, there are a few other
> things that we can make the Paginator do, most importantly, navigation.
>
> For example, this other pagesheet
>
> <?xml version="1.0"?>
> <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>   <rules>
>     <count type="element" name="b" num="3"/>
>     <link type="unit" num="1"/>
>   </rules>
> </pagesheet>
>
> indicates that the transformer must understand how the page was encoded
> in the given URI and provide a link to the pages +/- 1 position, if they
> are available.
>
> So, using the same environment as before we get
>
> <a>
>   <b/>
>   <b/>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>              current="1"
>              total="3"
>              current-uri="page(1)"
>              clean-uri="page">
>     <page:link page="2" type="next" uri="page(2)"/>
>   </page:page>
> </a>
>
> which indicates
>
> 1) there is no page 0, so no link is created.
> 2) the link goes to page 2, the type is 'next' (useful for
> visualization) and the URI is page(2) (useful for linking without
> XSLT-specific logic).
>
> NOTE: the URI is re-encoded using the same pattern; this paginator
> assumes that the 'round brackets' are used to identify page numbering.
>
> Now, without changing anything, requesting page(2) would yield
>
> <a>
>   <b/>
>   <b/>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>              current="2"
>              total="3"
>              current-uri="page(2)"
>              clean-uri="page">
>     <page:link page="1" type="prev" uri="page(1)"/>
>     <page:link page="3" type="next" uri="page(3)"/>
>   </page:page>
> </a>
>
> while page(3) would yield:
>
> <a>
>   <b/>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>              current="3"
>              total="3"
>              current-uri="page(3)"
>              clean-uri="page">
>     <page:link page="2" type="prev" uri="page(2)"/>
>   </page:page>
> </a>
>
> NOTE: here there is only one <b> because the original document doesn't
> contain enough elements to fill the page entirely. It's the modulo of
> the division.
>
> A real-life example
> -------------------
>
> Here are a few pagesheets which are a little more complex:
>
> Paginating the results from DirectoryGenerator:
>
> <?xml version="1.0"?>
> <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>   <rules>
>     <count type="element" name="file"
>            namespace="http://apache.org/cocoon/directory/2.0" num="16"/>
>     <link type="unit" num="2"/>
>     <link type="range" value="5"/>
>   </rules>
> </pagesheet>
>
> This says:
>
> 1) paginate 16 files per page
> 2) provide me with links to +/- 1 and +/- 2 pages (when available)
> 3) provide me with links to +/- 5 (when available)
>
> So, suppose we have a directory with 300 files and we request page 10,
> the generated page will be
>
> <dir:directory>
>   <dir:file ...>
>
>   [other 15 dir:file]
>
>   <page:page xmlns:page="http://apache.org/cocoon/paginate/1.0"
>              current="10"
>              total="19"
>              current-uri="dir(10)"
>              clean-uri="dir">
>     <page:range-link page="5" type="prev" uri="dir(5)"/>
>     <page:link page="8" type="prev" uri="dir(8)"/>
>     <page:link page="9" type="prev" uri="dir(9)"/>
>     <page:link page="11" type="next" uri="dir(11)"/>
>     <page:link page="12" type="next" uri="dir(12)"/>
>     <page:range-link page="15" type="next" uri="dir(15)"/>
>   </page:page>
> </dir:directory>
>
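
The arithmetic behind 'total' and the links in the examples above can be
sketched roughly like this. It is only a guess at the logic from the output
shown, not the Paginator's actual code; total_pages, page_links and their
parameter names are made up for the illustration.

```python
import math


def total_pages(item_count, per_page):
    """Number of pages needed; the last page may be only partly filled."""
    return math.ceil(item_count / per_page)


def page_links(current, total, unit=0, range_step=0):
    """Return (element, page, type) tuples for the links that exist."""
    links = []
    if range_step and current - range_step >= 1:
        links.append(("range-link", current - range_step, "prev"))
    for page in range(current - unit, current):
        if page >= 1:  # there is no page 0, so no link is created
            links.append(("link", page, "prev"))
    for page in range(current + 1, current + unit + 1):
        if page <= total:  # no link past the last page
            links.append(("link", page, "next"))
    if range_step and current + range_step <= total:
        links.append(("range-link", current + range_step, "next"))
    return links


total = total_pages(300, 16)  # the directory example: 300 files, 16 per page
for element, page, kind in page_links(10, total, unit=2, range_step=5):
    print(element, page, kind)
```

With 300 files and 16 per page this gives 19 pages, and for page 10 the
same six links as in the listing above.
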
> Asymmetric pagination
> ---------------------
>
> We also have the ability to indicate different rules for each page, so:
>
> <pagesheet xmlns="http://apache.org/cocoon/paginate/1.0">
>   <rules page="1">
>     <count type="element" name="b" num="5"/>
>     <link type="unit" num="1"/>
>   </rules>
>   <rules>
>     <count type="element" name="b" num="10"/>
>     <link type="unit" num="2"/>
>   </rules>
> </pagesheet>
>
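
As I read the pagesheet above, page 1 takes 5 elements and every other page
takes 10. A small sketch of how the page sizes would fall out, assuming
that reading is right; page_sizes and its arguments are hypothetical names,
not part of the Paginator.

```python
def page_sizes(item_count, default_num, overrides=None):
    """Items per page, with optional per-page overrides of the default count."""
    overrides = overrides or {}
    sizes, page, remaining = [], 1, item_count
    while remaining > 0:
        num = overrides.get(page, default_num)
        if num < 1:
            raise ValueError("num must be positive")
        taken = min(num, remaining)  # the last page may be only partly filled
        sizes.append(taken)
        remaining -= taken
        page += 1
    return sizes


# <rules page="1"> counts 5 elements, the default <rules> counts 10
print(page_sizes(27, 10, {1: 5}))
```
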
> Count types
> -----------
>
> The paginator works by counting stuff. It's up to you to define what you
> want to use for counting and you do so with the attributes of the
> <count> element in the pagesheet.
>
> This element supports 2 required attributes:
>
> num="" -> a number indicating how many times the thing to count must be
> present in this page.
>
> type="" -> the type of counting that the paginator must perform. Two
> types are currently defined, but only one is implemented.
>
> type="element" -> makes the paginator count the startElement() SAX
> events
>
> type="chars" -> (not currently implemented!) makes the paginator
> count the chars included in the page.
>
> In case type="element" is used, two other attributes become useful:
>
> name="" -> the name of the element (without namespace prefix!)
>
> namespace="" -> the URI of the namespace (if not specified, the default
> NS is used)
>
> - o -
>
> Ok, from now on some RT on the future of this transformer:
>
> Using the paginator for docs
> ----------------------------
>
> I originally wrote the paginator to paginate a directory listing and it
> works great for paginating by counting elements. For docs, it could be
> possible to paginate by counting sections or subsections, but this
> doesn't necessarily yield visually balanced pages (which is the reason
> for web pagination).
>
> This is why I assumed a way to count by chars, even if I didn't go as
> far as implementing it: while paginating by counting elements is
> ok (sounds trivial, but it's not! think of nesting!), paginating by
> counting chars is a real pain, due to the algorithms that must perform
> 'chunking'.
>
> I mean, assume you have a document like this:
>
> <p>this is some <strong>text</strong> that happens
> to be <em>chunked</em></p>
> ^
> |
>
> and suppose that counting the chars leads you to the chunking point
> indicated by the arrow above. Cutting the page there results in XML
> which is not well-formed. Simply 're-well-forming' the XML at that point
> truncates words. So, we must provide a way to 're-well-form' the XML
> until the first 'block-level' element is encountered (p in this case).
> But this means that the pagesheet must contain at least the list of
> 'block-delimiting' elements (and the current pagesheet parser and
> object model don't support this notion).
>
> Result: pagination at the char-level is not trivial and requires a
> little bit of work on the transformer.
>
> Nesting behavior
> ----------------
>
> If counting by chars is a pain, even counting elements is not easy.
> Assume you have this:
>
> <a>
>   <b>
>     <a>
>       <b>
>         <a>
>           <b/>
>         </a>
>       </b>
>     </a>
>   </b>
> </a>
>
> and you want to paginate using one <b> per page, what do the pages look
> like? Ok, I'll give you some space to think about it.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Ok, here is my solution (but I'm not sure it's the best):
>
> page 1:
>
> <a>
>   <b>
>     <a>
>       <a/>
>     </a>
>   </b>
> </a>
>
> page 2:
>
> <a>
>   <a>
>     <b>
>       <a/>
>     </b>
>   </a>
> </a>
>
> page 3:
>
> <a>
>   <a>
>     <a>
>       <b/>
>     </a>
>   </a>
> </a>
>
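
One way to read the three pages above: for page N, keep only the N-th <b>
(in startElement order) and drop the start/end tags of every other <b>
while keeping what is inside them. A minimal sketch of that reading, using
Python's xml.sax as a stand-in for the Java SAX pipeline; OnePerPage and
render_page are made-up names and this is not the Paginator's code.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class OnePerPage(xml.sax.ContentHandler):
    """Forward all events except the tags of counted elements off this page."""

    def __init__(self, out, name, page):
        super().__init__()
        self.out = out      # downstream handler receiving the kept events
        self.name = name    # element name to count
        self.page = page    # 1-based page number to extract
        self.seen = 0       # counted elements started so far
        self.kept = []      # one flag per currently open counted element

    def startElement(self, name, attrs):
        if name == self.name:
            self.seen += 1
            keep = self.seen == self.page
            self.kept.append(keep)
            if not keep:
                return      # drop the start tag, keep the children
        self.out.startElement(name, attrs)

    def endElement(self, name):
        # Drop the end tag only if the matching start tag was dropped.
        if name == self.name and not self.kept.pop():
            return
        self.out.endElement(name)

    def characters(self, content):
        self.out.characters(content)


def render_page(xml_text, name, page):
    out = io.StringIO()
    handler = OnePerPage(XMLGenerator(out), name, page)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return out.getvalue()


DOC = "<a><b><a><b><a><b/></a></b></a></b></a>"
for page in (1, 2, 3):
    print(page, render_page(DOC, "b", page))
```

Because every dropped start tag is paired with its own end tag through the
stack, the output stays well-formed however deep the nesting goes.
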
> I'm pretty sure the current code is buggy someplace because for deep
> nesting like this one, it loses some SAX events someplace and ends up
> making the SAX stream non-well-formed and choking the subsequent
> transformers which are sensitive to well-formedness (such as XSLT).
>
> Note: the above might look like a mental exercise to many, but if you
> think about our Document DTD 1.1, you'll find nested <section> elements,
> and paginating those results in very similar problems. But I'm not sure
> if the solution adopted above is meaningful for real-case pagination. I'm
> open to suggestions on this.
>
> Improving the concept
> ---------------------
>
> One possible way to improve the concept is to count by XPath results,
> that is you might want to count by 'sections included in sections'.
>
> Also, another way to improve the system is providing booleans: you might
> want to count 'sections AND chapters' (probably, XPath helps here as
> well).
>
> Ok, anyway, hope this helps and sorry for taking so long to write this.
>
> --
> Stefano Mazzocchi One must still have chaos in oneself to be
> able to give birth to a dancing star.
> <[EMAIL PROTECTED]> Friedrich Nietzsche
> --------------------------------------------------------------------
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, email: [EMAIL PROTECTED]
>