Thanks for your reply Andreas.
Currently it is hardcoded to 10 nodes or leaves. Adding an xconf
setting should be fairly quick and easy to do. However, having
spoken to my manager, there is currently no business requirement to
make it configurable, and given the large array of options
already available, the preference is to keep it hardcoded for now.
At the very least I'll make sure the maximum leaves / subnodes value is
stored in a constant, so that if it is made configurable later only the
constant needs to change rather than multiple locations in
the class.
As far as I can tell the page objects are kept alive anyway by the
references in the document object itself (at least until the trailer is
written). So my keeping references in the page tree object should not
extend their life in any way.
Currently, if I take a 20-page document, there are two sets of 10
pages, one in each node, both nodes being children of the root node. For
the first 10 pages the kids list is something like {1 0 R, 2 0 R, 3 0 R,
4 0 R, 5 0 R, 6 0 R, 7 0 R, 8 0 R, 9 0 R, 10 0 R} (object numbers not
intended to be realistic for this example). But for the second 10 pages
the kids list is {null, null, null, null, null, null, null, null, null,
null, 11 0 R, 12 0 R, 13 0 R, 14 0 R, 15 0 R, 16 0 R, 17 0 R, 18 0 R, 19
0 R, 20 0 R} since the page index (which is zero based) makes the page
get placed in that index position on the tree, any previous unused
indexes being filled with null. So for a 10,000 page doc there are going
to be a lot of nulls in the page tree. For now setting the toPDFString()
to ignore the nulls rather than throw an exception gets round this and
allows the document to be correctly generated. In my tests all the pages
are produced in the correct order. I was wondering though if there are
any cases where the pages might not be passed in in the correct order
(and hence might possibly explain why the notifyKidsRegistered() method
was written in the way it is), and if so if that has any implications on
the way I have written the balanced page tree code updates.
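
To make the null padding concrete, here is a minimal standalone sketch.
It is not the FOP code: KidsListSketch, registerAt(), toKidsString() and
MAX_KIDS are hypothetical names that only mimic the behaviour described
above. It registers pages 11 to 20 (zero-based indexes 10 to 19) in the
second node twice, once at the global page index and once at a
node-local index (index modulo the node size).

```java
import java.util.ArrayList;
import java.util.List;

class KidsListSketch {
    // Hypothetical constant; stands in for the hardcoded limit of 10.
    static final int MAX_KIDS = 10;

    /** Index-based placement: pads the list with nulls up to the index. */
    static void registerAt(List<String> kids, int index, String ref) {
        while (kids.size() <= index) {
            kids.add(null);
        }
        kids.set(index, ref);
    }

    /** Writes the kids array, skipping null placeholders. */
    static String toKidsString(List<String> kids) {
        StringBuilder sb = new StringBuilder("[");
        for (String kid : kids) {
            if (kid == null) {
                continue;
            }
            if (sb.length() > 1) {
                sb.append(' ');
            }
            sb.append(kid);
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<String> padded = new ArrayList<>(); // global page index
        List<String> local = new ArrayList<>();  // index within the node
        for (int pageIndex = 10; pageIndex < 20; pageIndex++) {
            String ref = (pageIndex + 1) + " 0 R";
            registerAt(padded, pageIndex, ref);
            registerAt(local, pageIndex % MAX_KIDS, ref);
        }
        System.out.println(padded.size() + " entries: " + toKidsString(padded));
        System.out.println(local.size() + " entries: " + toKidsString(local));
    }
}
```

Both lists serialise to the same kids array, but only the first carries
the null padding. The modulo variant assumes every leaf node covers an
aligned run of MAX_KIDS consecutive pages; I have not checked whether
that always holds in the real code. Within a node it still tolerates
pages arriving out of order, which may be the point of the index-based
placement.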
Thanks.
-Mike
On 03/06/11 22:38, Andreas L. Delmelle wrote:
On 03 Jun 2011, at 10:54, Michael Rubin wrote:
Hi Mike
Thanks a lot for your reply last week Andreas. Sorry for the delay. Been away
and offline... FYI to follow up on the work I was doing:
<snip />
So for example a 101 page document will have a root PDFPages node with two
sub-nodes underneath. The first will contain a count of 100, and have 10
sub-nodes, each containing 10 pages. The second will simply contain 1 page.
More new pages will get added to the second sub-node (moving pages down to new
sub-nodes to avoid more than 10 pages per node) until its count reaches 100
too, then another node is created. Once 10 nodes under the root exist (at 1000
pages) they will get moved down below a new root level sub-node with a count of
1000, and a new root level sub-node created, and so on.
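
The scheme described above can be sketched roughly as follows. This is
one possible reading of the description, not the actual patch: the
BalancedPageTreeSketch and Node classes, the pushDown() helper and the
MAX_KIDS constant are hypothetical, and page references are plain
strings.

```java
import java.util.ArrayList;
import java.util.List;

class BalancedPageTreeSketch {
    static final int MAX_KIDS = 10; // assumed fan-out, per the description

    /** A page tree node holding either page references or child nodes. */
    static class Node {
        final List<String> pages = new ArrayList<>();
        final List<Node> nodes = new ArrayList<>();
        int count; // total number of pages beneath this node

        void add(String pageRef) {
            if (nodes.isEmpty() && pages.size() < MAX_KIDS) {
                pages.add(pageRef); // still room for a direct page
            } else {
                if (nodes.isEmpty()) {
                    pushDown(); // 10 direct pages: move them into a sub-node
                }
                Node last = nodes.get(nodes.size() - 1);
                // The first sibling is always full, so its count is the
                // capacity of every sibling at this level.
                if (last.count >= nodes.get(0).count) {
                    if (nodes.size() == MAX_KIDS) {
                        pushDown(); // 10 full siblings: move them down a level
                    }
                    last = new Node();
                    nodes.add(last);
                }
                last.add(pageRef);
            }
            count++;
        }

        /** Moves all current kids into a single new child node. */
        private void pushDown() {
            Node child = new Node();
            child.pages.addAll(pages);
            child.nodes.addAll(nodes);
            child.count = count;
            pages.clear();
            nodes.clear();
            nodes.add(child);
        }
    }

    /** Builds a tree for the given number of pages. */
    static Node build(int pageCount) {
        Node root = new Node();
        for (int i = 1; i <= pageCount; i++) {
            root.add(i + " 0 R");
        }
        return root;
    }

    public static void main(String[] args) {
        Node root = build(101);
        for (Node kid : root.nodes) {
            System.out.println("sub-node count: " + kid.count);
        }
    }
}
```

With build(101) the root ends up with two sub-nodes, the first holding
100 pages in 10 sub-nodes of 10 and the second holding a single page,
which matches the 101-page example above.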
Cool! Impressive work. Will the number of pages per node be configurable?
Next task is to write a JUnit test since one appears not to exist... I guess
remaining thoughts currently are:
- Wondering if keeping references to a page tree object's sub-nodes or leaves
is the best way or can I improve it further? (Bearing in mind memory usage and
performance.)
It depends a bit on whether you are thereby keeping PDFPage objects alive
longer than necessary. The current design only stores the pages' referencePDF,
so that seems safe.
- Was wondering if the trailer objects list is the right place to write the new
sub-node PDFPages objects. (But if writing an object to the objects list -
addObject() instead of addTrailerObject() - it gets written out too soon before
I have added all the pages.) But given how it writes the objects out before
writing the xref and trailer it seems OK and parses and shows fine in
PDFBox/PDFDebugger and the evince PDF Reader in ubuntu.
I would think that that is the correct place, although I must admit, I would
have to check the PDF Spec to be certain.
- When registering the pages themselves via notifyKidsRegistered() method it
extracts the page index number and puts the reference at that index in the kids
list, filling empty spaces ahead of it with nulls. So when counting kids and
writing out the pdf code text I had to ignore nulls and 'gaps' in the kids list
since not all the kids are in the same list any more (spread across multiple
page tree nodes). I was wondering why this method was written like this, and
doesn't simply append new pages to the end of the list all the time.
AFAICT, what it is designed to do is make sure that the page is entered at the
correct index in the list of kids. It would only create null entries if the
list is not yet large enough. I have a feeling this is just by design, taking
into account a single page tree node only (see the javadoc of the PDFPages
class...)
Regards
Andreas
---
Michael Rubin
Developer
T: +44 20 8238 7400
F: +44 20 8238 7401