Thanks for your reply, Andreas.

Currently it is hardcoded to 10 nodes or leaves. Adding an xconf setting should probably be fairly quick and easy to do. However, having spoken to my manager, there is currently no business requirement to make it configurable, and given the large array of options already available, the preference is to keep it hardcoded for now. At the very least I'll make sure the maximum leaves / sub-nodes value is stored in a constant, so that if it is made configurable later, only the constant needs to change rather than multiple locations in the class.
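
To make the constant idea concrete, here is a rough sketch. The class, field and method names are made up for illustration and are not the actual FOP code; the point is only that the limit of 10 appears in one place.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a page tree node; not the real PDFPages class. */
class PageTreeNodeSketch {
    /** The only place the fan-out is defined; hardcoded to 10 for now. */
    static final int MAX_KIDS = 10;

    /** Kid references kept as plain strings such as "1 0 R" for this sketch. */
    private final List<String> kids = new ArrayList<>();

    /** True once the node holds MAX_KIDS leaves or sub-nodes. */
    boolean isFull() {
        return kids.size() >= MAX_KIDS;
    }

    /** Appends a kid reference, refusing to exceed the limit. */
    void addKid(String ref) {
        if (isFull()) {
            throw new IllegalStateException("node already holds " + MAX_KIDS + " kids");
        }
        kids.add(ref);
    }

    int size() {
        return kids.size();
    }
}
```

If the value ever becomes configurable, only MAX_KIDS would need to be replaced by whatever is read from the configuration.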

As far as I can tell, the page objects are kept alive anyway by the references in the document object itself (at least until the trailer is written). So keeping references in the page tree object should not extend their lifetime in any way.

Currently, a 20 page document ends up as two sets of 10 pages, one in each node, with both nodes being children of the root node. For the first 10 pages the kids list is something like {1 0 R, 2 0 R, 3 0 R, 4 0 R, 5 0 R, 6 0 R, 7 0 R, 8 0 R, 9 0 R, 10 0 R} (object numbers not intended to be realistic for this example). But for the second 10 pages the kids list is {null, null, null, null, null, null, null, null, null, null, 11 0 R, 12 0 R, 13 0 R, 14 0 R, 15 0 R, 16 0 R, 17 0 R, 18 0 R, 19 0 R, 20 0 R}. This is because the page index (which is zero based) is used as the position in the node's kids list, and any unused positions before it are filled with null. So for a 10,000 page doc there are going to be a lot of nulls in the page tree.

For now, making toPDFString() ignore the nulls rather than throw an exception gets round this and allows the document to be generated correctly. In my tests all the pages are produced in the correct order.

I was wondering though whether there are any cases where the pages might not be passed in the correct order (which might explain why the notifyKidsRegistered() method was written the way it is), and if so, whether that has any implications for the way I have written the balanced page tree code updates.
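
To illustrate the null padding described above, here is a small self-contained sketch. It is not the real notifyKidsRegistered() or toPDFString() code; the class and method names are hypothetical and the references are plain strings, but it mimics placing a page at its document-wide index and then skipping nulls on output.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only; hypothetical names, not the real PDFPages code. */
class KidsListSketch {
    /** Places ref at the given index, padding any gap before it with nulls. */
    static void setAtIndex(List<String> kids, int index, String ref) {
        while (kids.size() <= index) {
            kids.add(null);
        }
        kids.set(index, ref);
    }

    /** Counts only the real entries, ignoring null placeholders. */
    static int countKids(List<String> kids) {
        int count = 0;
        for (String kid : kids) {
            if (kid != null) {
                count++;
            }
        }
        return count;
    }

    /** Writes the kids array, skipping nulls instead of failing on them. */
    static String kidsToString(List<String> kids) {
        StringBuilder sb = new StringBuilder("[");
        for (String kid : kids) {
            if (kid == null) {
                continue;
            }
            if (sb.length() > 1) {
                sb.append(' ');
            }
            sb.append(kid);
        }
        return sb.append(']').toString();
    }
}
```

Registering pages with indexes 10 to 19 in an empty list this way leaves 20 entries, of which the first 10 are null, while countKids() reports 10 and kidsToString() only writes the ten real references.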

Thanks.

-Mike

On 03/06/11 22:38, Andreas L. Delmelle wrote:
On 03 Jun 2011, at 10:54, Michael Rubin wrote:

Hi Mike

Thanks a lot for your reply last week Andreas. Sorry for the delay. Been away 
and offline... FYI to follow up on the work I was doing:
<snip />
So for example a 101 page document will have a root PDFPages node with two 
sub-nodes underneath. The first will contain a count of 100, and have 10 
sub-nodes, each containing 10 pages. The second will simply contain 1 page. 
More new pages will get added to the second sub-node (moving pages down to new 
sub-nodes to avoid more than 10 pages per node) until its count reaches 100 
too, then another node created. Once 10 nodes under the root exist (at 1000 
pages) they will get moved down below a new root level sub-node with a count of 
1000, and a new root level sub-node created, and so on.
Cool! Impressive work. Will the number of pages per node be configurable?

Next task is to write a JUnit test since one appears not to exist... I guess 
remaining thoughts currently are:

- Wondering if keeping references to a page tree object's sub-nodes or leaves 
is the best way or can I improve it further? (Bearing in mind memory usage and 
performance.)
It depends a bit on whether you are thereby keeping PDFPage objects alive 
longer than necessary. The current design only stores the pages' referencePDF, 
so that seems safe.

- Was wondering if the trailer objects list is the right place to write the new 
sub-node PDFPages objects. (But if writing an object to the objects list - 
addObject() instead of addTrailerObject() - it gets written out too soon before 
I have added all the pages.) But given how it writes the objects out before 
writing the xref and trailer it seems OK and parses and shows fine in 
PDFBox/PDFDebugger and the evince PDF Reader in ubuntu.
I would think that that is the correct place, although I must admit, I would 
have to check the PDF Spec to be certain.

- When registering the pages themselves via notifyKidsRegistered() method it 
extracts the page index number and puts the reference at that index in the kids 
list, filling empty spaces ahead of it with nulls. So when counting kids and 
writing out the pdf code text I had to ignore nulls and 'gaps' in the kids list 
since not all the kids are in the same list any more (spread across multiple 
page tree nodes). I was wondering why this method was written like this, and 
doesn't simply append new pages to the end of the list all the time.
AFAICT, what it is designed to do is make sure that the page is entered at the 
correct index in the list of kids. It would only create null entries if the 
list is not yet large enough. I have a feeling this is just by design, taking 
into account a single page tree node only (see the javadoc of the PDFPages 
class...)


Regards

Andreas
---





Michael Rubin
Developer

T: +44 20 8238 7400
F: +44 20 8238 7401

[email protected]



