Hello there. Thought I'd post an update. Admittedly, I feel like I've
found a bit of a catch-22. I completed my code to generate the balanced
page tree on the fly, and it works fine with a single page sequence.
However, this morning I discovered that it does not work for multiple
page sequences in a flow. (With 2 x 101-page sequences, I got pages 1-9,
102, 10-101, then 103-end, in that order...) I guess this is where pages
can come in in a different order after all, and why the current indexing /
nulls system is there. (It also shows that I am still learning the ropes
as I go along...)
So I looked again at generating the page tree after all the pages have
been added to one big flat list. I can do this by calling, in
PDFDocument.outputTrailer(), a method that balances the page tree
before the remaining objects are written out. This way pages can be
attached to nodes, and the tree hierarchy built up to the root node.
On paper this is a more elegant, efficient and easier solution than doing
it on the fly. But I ran into the same problem again: the page objects
have already been written out.
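A minimal sketch of the "balance at the end" idea, assuming the pages are available as a flat, correctly ordered list and a fan-out of 10 as discussed further down the thread. PageTreeSketch, Node, MAX_KIDS, balance() and depth() are hypothetical names for illustration only, not FOP classes, and plain integers stand in for the page objects:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not FOP's PDFPages/PDFPage classes. */
class PageTreeSketch {

    /** Assumed maximum number of kids per node (the hardcoded 10). */
    static final int MAX_KIDS = 10;

    /** A tree node: leaf-level nodes hold pages, upper nodes hold child nodes. */
    static final class Node {
        final List<Integer> pages = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
        int pageCount; // pages below this node, like the PDF /Count entry
    }

    /** Builds the tree bottom-up from a flat list of pages in document order. */
    static Node balance(List<Integer> pagesInOrder) {
        List<Node> level = new ArrayList<>();
        // Leaf level: chunk the flat list into groups of at most MAX_KIDS pages.
        for (int i = 0; i < pagesInOrder.size(); i += MAX_KIDS) {
            int end = Math.min(i + MAX_KIDS, pagesInOrder.size());
            Node leaf = new Node();
            leaf.pages.addAll(pagesInOrder.subList(i, end));
            leaf.pageCount = end - i;
            level.add(leaf);
        }
        if (level.isEmpty()) {
            return new Node(); // empty document: a root with no kids
        }
        // Upper levels: keep grouping nodes until a single root remains.
        while (level.size() > 1) {
            List<Node> parents = new ArrayList<>();
            for (int i = 0; i < level.size(); i += MAX_KIDS) {
                int end = Math.min(i + MAX_KIDS, level.size());
                Node parent = new Node();
                for (Node child : level.subList(i, end)) {
                    parent.children.add(child);
                    parent.pageCount += child.pageCount;
                }
                parents.add(parent);
            }
            level = parents;
        }
        return level.get(0);
    }

    /** Number of node levels from this node down to the leaf-level nodes. */
    static int depth(Node node) {
        return node.children.isEmpty() ? 1 : 1 + depth(node.children.get(0));
    }
}
```

Because the grouping only ever looks at list position, page order is preserved regardless of which page sequence a page came from. The catch described above still applies: each page needs to know its parent node, which is only known once balance() has run.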
Looking at the code, I see the pages get written out / flushed as soon as
they are created. One page gets written out before the next page is
started. So moving pages from one node to another is impossible without
breaking the PDF. The only way around this currently is to assign pages
to nodes as they get created, but that breaks the ordering system in the
notifyKidRegistered() method, which needs a flat list. Hence the
catch-22.
My current questions are:
- Why are the page objects flushed straight away? (Memory constraints?)
- Is it safe and wise to delay flushing the page objects until the end?
- If so, then how do (or should) I do this? (Can I flush the page contents
  but not the page object itself to minimise memory usage?)
- If not, then how can I fix pages into individual nodes at creation time
  without breaking it for multiple page sequences?
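To make the third question concrete, here is a rough sketch of "flush the contents, hold back the page object". It relies only on the fact that PDF indirect objects may appear in the file in any order, since the cross-reference table records each object's offset. Everything named here (DeferredPageWriter, PendingPage, endPage(), writePendingPages()) is hypothetical and is not how FOP's PDFDocument or PDFDocumentHandler are actually structured; the page dictionary is also stripped down to the entries relevant to the argument, and the content is assumed to be ASCII:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names are FOP classes or methods. */
class DeferredPageWriter {

    /** The small per-page record that stays in memory until the end. */
    static final class PendingPage {
        final int objectNumber;
        final int contentsObjectNumber;
        int parentObjectNumber = -1; // unknown until the tree is balanced

        PendingPage(int objectNumber, int contentsObjectNumber) {
            this.objectNumber = objectNumber;
            this.contentsObjectNumber = contentsObjectNumber;
        }
    }

    private final StringBuilder out; // stands in for the real output stream
    private final List<PendingPage> pending = new ArrayList<>();
    private int nextObjectNumber = 1;

    DeferredPageWriter(StringBuilder out) {
        this.out = out;
    }

    /** Writes the (large) content stream now; keeps only the page record. */
    PendingPage endPage(String content) {
        int contentsNumber = nextObjectNumber++;
        int pageNumber = nextObjectNumber++;
        out.append(contentsNumber).append(" 0 obj\n<< /Length ")
           .append(content.length()).append(" >>\nstream\n")
           .append(content).append("\nendstream\nendobj\n");
        PendingPage page = new PendingPage(pageNumber, contentsNumber);
        pending.add(page);
        return page;
    }

    /** Writes the (small) page dictionaries once every parent is known. */
    void writePendingPages() {
        for (PendingPage page : pending) {
            if (page.parentObjectNumber < 0) {
                throw new IllegalStateException(
                        "Page " + page.objectNumber + " has no parent yet");
            }
            out.append(page.objectNumber).append(" 0 obj\n<< /Type /Page /Parent ")
               .append(page.parentObjectNumber).append(" 0 R /Contents ")
               .append(page.contentsObjectNumber).append(" 0 R >>\nendobj\n");
        }
        pending.clear();
    }
}
```

In this model the memory held per page is a few integers rather than the rendered content, which is the trade-off the question is asking about. Whether the real page objects can be split this way is exactly what I don't know.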
PDFDocumentHandler.endPage() is where 'flushPDFDoc()' is called as part
of the page generation process. The next page isn't added until after
this point.
The only workaround I can think of at the moment, having spoken to my
colleagues, is to add each page to its own individual page tree node,
then sort and arrange those nodes into a balanced tree. However, this is
less than ideal, with twice as many nodes as needed. (Although my manager
seems happy to go with this.) I haven't yet finished testing this
permutation (still debugging), but I'm happy to ditch it if I can work out
how to delay writing out the page objects until I have re-arranged them
as in the 2nd paragraph. (It would be nice to maintain potential support
for out-of-order pages after all...)
Thanks a lot for your time!
-Mike
On 06/06/11 19:48, Andreas L. Delmelle wrote:
> On 06 Jun 2011, at 10:59, Michael Rubin wrote:
>
> Hi Mike
>
>> Thanks for your reply Andreas.
>>
>> Currently it is hardcoded to 10 nodes or leaves, but adding an xconf setting
>> perhaps should be pretty easy and quick to do. However, having spoken to my
>> manager, there isn't the business requirement currently to make it
>> configurable, and given the current large array of options already available,
>> the preference is to just keep it hardcoded for now. At the very least I'll
>> make sure the maximum leaves / subnodes value is stored in a constant so if it
>> is made configurable then only the constant needs to be paid attention to
>> rather than multiple locations in the class.
>
> OK, sounds good. I must admit, I was playing devil's advocate here, and did not
> see any immediate reason to be able to change it either, but you can probably
> bet your life that _someone_ is going to come up with this requirement as soon
> as the feature is discovered... :-)
>
> <snip />
>
>> ... So for a 10,000 page doc there are going to be a lot of nulls in the page
>> tree. For now setting the toPDFString() to ignore the nulls rather than throw
>> an exception gets round this and allows the document to be correctly generated.
>> In my tests all the pages are produced in the correct order. I was wondering
>> though if there are any cases where the pages might not be passed in in the
>> correct order (and hence might possibly explain why the notifyKidsRegistered()
>> method was written in the way it is), and if so if that has any implications on
>> the way I have written the balanced page tree code updates.
>
> I think the original idea was that PDF would, in the long run, also be able to
> do out-of-order rendering (i.e. if page N in a document would be completely
> resolved, and thus could be rendered, before page N-1 --in that case, the null
> reference would be needed as a placeholder for the not-yet-finished page).
> At any rate, AFAIR, this was never actually implemented for PDF, so that
> explains why you see all pages in the correct order every time.
>
> If it is cleaner to alter notifyKidRegistered() and avoid those nulls from
> being inserted in the first place, I would prefer that over just skipping them
> in toPDFString(). Not a must, though...
>
> Regards
>
> Andreas
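For reference, the null-placeholder behaviour discussed in the quoted exchange can be modelled roughly as follows. This is a simplified stand-in written from the description in this thread, not a copy of the FOP source; KidsListSketch, register() and toKidsArray() are hypothetical names, and object references are plain strings:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of an index-based kids list with null placeholders. */
class KidsListSketch {

    private final List<String> kids = new ArrayList<>();

    /** Stores a page reference at its page index, padding gaps with nulls. */
    void register(int pageIndex, String reference) {
        while (pageIndex > kids.size() - 1) {
            kids.add(null); // placeholder for a page that has not arrived yet
        }
        if (kids.get(pageIndex) != null) {
            throw new IllegalStateException(
                    "A page already exists at index " + pageIndex);
        }
        kids.set(pageIndex, reference);
    }

    /** Renders the /Kids array, skipping placeholders instead of failing. */
    String toKidsArray() {
        StringBuilder sb = new StringBuilder("[");
        for (String kid : kids) {
            if (kid == null) {
                continue; // the "ignore the nulls" variant discussed above
            }
            if (sb.length() > 1) {
                sb.append(' ');
            }
            sb.append(kid);
        }
        return sb.append(']').toString();
    }
}
```

With a single flat list the placeholders are filled in as later pages arrive. Once a node only holds a slice of the document, any index outside that slice pads the list with nulls that never get filled, which is why skipping them at output time works but avoiding them at registration time would be cleaner.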
---
Michael Rubin
Developer