On 03 Jun 2011, at 10:54, Michael Rubin wrote:

Hi Mike

> Thanks a lot for your reply last week Andreas. Sorry for the delay. Been away 
> and offline... FYI to follow up on the work I was doing:

<snip />
> So for example a 101 page document will have a root PDFPages node with two 
> sub-nodes underneath. The first will contain a count of 100, and have 10 
> sub-nodes, each containing 10 pages. The second will simply contain 1 page. 
> More new pages will get added to the second sub-node (moving pages down to 
> new sub-nodes to avoid more than 10 pages per node) until its count reaches 
> 100 too, then another node created. Once 10 nodes under the root exist (at 
> 1000 pages) they will get moved down below a new root level sub-node with a 
> count of 1000, and a new root level sub-node created, and so on.

Cool! Impressive work. Will the number of pages per node be configurable?
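
To make the question concrete, here is a rough sketch of a balanced page tree 
with a configurable fan-out. It is not your patch and not FOP's PDFPages class: 
the PageTreeSketch and Node names and the maxKids parameter are made up for 
illustration, and it builds the tree in one pass from a known page count 
rather than incrementally as pages arrive. One difference from your 
description: the last second-level node here holds an intermediate node with 
one page, not the page directly.

```java
import java.util.ArrayList;
import java.util.List;

public class PageTreeSketch {

    /** Hypothetical node type for illustration; not FOP's PDFPages. */
    public static final class Node {
        public final List<Node> kids = new ArrayList<>();
        public int count; // number of leaf pages below this node
    }

    /**
     * Builds a balanced tree bottom-up. Lowest-level nodes hold up to
     * maxKids pages each (pages are only counted, not modelled as objects);
     * every higher level groups up to maxKids nodes under one parent.
     */
    public static Node build(int pageCount, int maxKids) {
        if (pageCount < 1 || maxKids < 2) {
            throw new IllegalArgumentException("need pageCount >= 1 and maxKids >= 2");
        }
        // Lowest level: one node per run of up to maxKids pages.
        List<Node> level = new ArrayList<>();
        for (int start = 0; start < pageCount; start += maxKids) {
            Node node = new Node();
            node.count = Math.min(maxKids, pageCount - start);
            level.add(node);
        }
        // Group nodes under parents until a single root remains.
        while (level.size() > 1) {
            List<Node> parents = new ArrayList<>();
            for (int start = 0; start < level.size(); start += maxKids) {
                Node parent = new Node();
                int end = Math.min(start + maxKids, level.size());
                for (Node kid : level.subList(start, end)) {
                    parent.kids.add(kid);
                    parent.count += kid.count;
                }
                parents.add(parent);
            }
            level = parents;
        }
        return level.get(0);
    }

    public static void main(String[] args) {
        Node root = build(101, 10);
        System.out.println("root count: " + root.count);
        System.out.println("root kids: " + root.kids.size());
        System.out.println("first kid count: " + root.kids.get(0).count);
    }
}
```

With 101 pages and maxKids set to 10, the root ends up with two kids whose 
counts are 100 and 1, which matches the counts in your example.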

> Next task is to write a JUnit test since one appears not to exist... I guess 
> remaining thoughts currently are:
> - Wondering if keeping references to a page tree object's sub-nodes or leaves 
> is the best way or can I improve it further? (Bearing in mind memory usage 
> and performance.)

It depends a bit on whether you are thereby keeping PDFPage objects alive 
longer than necessary. The current design only stores the pages' referencePDF, 
so that seems safe.

> - Was wondering if the trailer objects list is the right place to write the 
> new sub-node PDFPages objects. (But if writing an object to the objects list 
> - addObject() instead of addTrailerObject() - it gets written out too soon 
> before I have added all the pages.) But given how it writes the objects out 
> before writing the xref and trailer it seems OK and parses and shows fine in 
> PDFBox/PDFDebugger and the Evince PDF reader in Ubuntu.

I would think that that is the correct place, although I must admit, I would 
have to check the PDF Spec to be certain.

> - When registering the pages themselves via notifyKidsRegistered() method it 
> extracts the page index number and puts the reference at that index in the 
> kids list, filling empty spaces ahead of it with nulls. So when counting kids 
> and writing out the pdf code text I had to ignore nulls and 'gaps' in the 
> kids list since not all the kids are in the same list any more (spread across 
> multiple page tree nodes). I was wondering why this method was written like 
> this, and doesn't simply append new pages to the end of the list all the time.

AFAICT, what it is designed to do is make sure that the page is entered at the 
correct index in the list of kids. It would only create null entries if the 
list is not yet large enough. I have a feeling this is just by design, taking 
into account a single page tree node only (see the javadoc of the PDFPages 
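
As a minimal sketch of the behaviour discussed here (entering a page reference 
at its index, padding with nulls, and skipping the gaps when counting and 
writing out): the code below is not the actual notifyKidsRegistered() 
implementation. The KidsListSketch class, its method names, and the use of 
plain strings for references are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

public class KidsListSketch {

    // Slot i holds the reference of the page with index i, or null if
    // that page has not been registered in this node.
    private final List<String> kids = new ArrayList<>();

    /** Enters the reference at its page index, padding with nulls if needed. */
    public void register(int pageIndex, String reference) {
        while (kids.size() <= pageIndex) {
            kids.add(null);
        }
        kids.set(pageIndex, reference);
    }

    /** Counts only the slots that actually hold a reference. */
    public int count() {
        int n = 0;
        for (String kid : kids) {
            if (kid != null) {
                n++;
            }
        }
        return n;
    }

    /** Number of slots, including the null gaps. */
    public int slots() {
        return kids.size();
    }

    /** Writes the kids as a PDF-style array, skipping the gaps. */
    public String toKidsArray() {
        StringJoiner out = new StringJoiner(" ", "[", "]");
        for (String kid : kids) {
            if (kid != null) {
                out.add(kid);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        KidsListSketch node = new KidsListSketch();
        node.register(2, "7 0 R"); // registered out of order
        node.register(0, "3 0 R");
        System.out.println(node.count() + " of " + node.slots() + " slots filled");
        System.out.println(node.toKidsArray());
    }
}
```

Registering by index keeps the kids in page order even when pages are 
registered out of order, which plain appending would not guarantee.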


