Re: Retrieving Objects question

Michael Rubin Wed, 08 Jun 2011 08:16:51 -0700

Hello there. Thought I'd post an update. Admittedly I feel like I've found a bit of a catch-22 situation. I successfully completed my code to generate the balanced page tree on the fly, and it works fine with a single page sequence. However, this morning I discovered that this code does not appear to work for multiple page sequences in a flow. (With 2x 101-page sequences, I got pages 1-9, 102, 10-101, then 103-end, in that order...) I guess this is where pages can come in in a different order after all, and why the current indexing / nulls system is there. (And it shows that I am still learning the ropes as I go along...)

So I re-examined generating the page tree after the pages have been added into one big flat list. I can do this by calling a method in PDFDocument.outputTrailer() to balance the page tree before all the remaining objects are written out. This way pages can be attached to nodes, and the tree hierarchy built up to the root node. On paper this is a more elegant, efficient and easier solution than doing it on the fly. But I ran into the same problem again: the page objects are already written out.
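For illustration, here is a minimal sketch of the "balance afterwards" idea: take the flat list of pages and group it bottom-up into nodes of at most 10 kids until a single root remains. The class and method names (PageTreeSketch, Node, balance, collect) are hypothetical stand-ins, not FOP's PDFPages / PDFPage classes, and pages are represented as plain strings. It only shows the tree-building step and does not address the problem that the page objects have already been written out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; these are not FOP's page tree classes. */
class PageTreeSketch {
    /** Maximum kids per node, kept in one constant (10, as in the thread). */
    static final int MAX_KIDS = 10;

    /** A tree node whose kids are either page names (String) or child Nodes. */
    static final class Node {
        final List<Object> kids = new ArrayList<>();
        int count; // number of leaf pages below this node
    }

    /** Groups a flat, ordered page list bottom-up into a balanced tree. */
    static Node balance(List<String> pages) {
        if (pages.isEmpty()) {
            return new Node();
        }
        List<Object> level = new ArrayList<>(pages);
        while (true) {
            List<Object> parents = new ArrayList<>();
            for (int i = 0; i < level.size(); i += MAX_KIDS) {
                Node parent = new Node();
                int end = Math.min(i + MAX_KIDS, level.size());
                for (Object kid : level.subList(i, end)) {
                    parent.kids.add(kid);
                    parent.count += (kid instanceof Node) ? ((Node) kid).count : 1;
                }
                parents.add(parent);
            }
            if (parents.size() == 1) {
                return (Node) parents.get(0); // single node left: the root
            }
            level = parents; // build the next level up
        }
    }

    /** Depth-first walk that recovers the pages in document order. */
    static void collect(Node node, List<String> out) {
        for (Object kid : node.kids) {
            if (kid instanceof Node) {
                collect((Node) kid, out);
            } else {
                out.add((String) kid);
            }
        }
    }
}
```

With 202 pages (two 101-page sequences) this gives 21 leaf-level nodes, then 3 intermediate nodes, then one root, and a depth-first walk returns the pages in their original order.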

Looking at the code I see the pages get written out / flushed as soon as they are created. One page gets written out before the next page is started. So moving pages from one node to another is impossible without breaking the PDF. The only way round this currently is to assign pages to nodes as they get created, but then this breaks the ordering system in the notifyKidsRegistered() method which needs a flat list. Hence the catch-22.


My current questions are:

- Why are the page objects flushed straight away? (Memory constraints?)
- Is it safe and wise to delay flushing the page objects until the end?
- If so, then how do (or should) I do this? (Can I flush the page contents but not the page object itself to minimise memory usage?)
- If not, then how can I fix pages into individual nodes at creation time without breaking it for multiple page sequences?

PDFDocumentHandler.endPage() is where 'flushPDFDoc()' is called as part of the page generation process. The next page isn't added until after this point.

The only workaround I can think of at the moment, having spoken to my colleagues, is to add pages to their own individual page tree nodes, then sort and arrange the nodes into a balanced tree. However, this is less than ideal, with twice as many nodes as needed. (Although my manager seems happy to go with this.) I haven't yet finished testing this permutation (still debugging), but I'm happy to ditch it if I can work out how to delay writing out the page objects until I have re-arranged them as in the 2nd paragraph. (It would be nice to maintain potential support for out-of-order pages after all...)


Thanks a lot for your time!

-Mike

On 06/06/11 19:48, Andreas L. Delmelle wrote:

On 06 Jun 2011, at 10:59, Michael Rubin wrote:

Hi Mike

Thanks for your reply Andreas.

Currently it is hardcoded to 10 nodes or leaves, but adding an xconf setting 
perhaps should be pretty easy and quick to do. However, having spoken to my 
manager, there isn't the business requirement currently to make it 
configurable, and given the current large array of options already available, 
the preference is to just keep it hardcoded for now. At the very least I'll 
make sure the maximum leaves / subnodes value is stored in a constant so if it 
is made configurable then only the constant needs to be paid attention to 
rather than multiple locations in the class.

OK, sounds good. I must admit, I was playing devil's advocate here, and did not 
see any immediate reason to be able to change it either, but you can probably 
bet your life that _someone_ is going to come up with this requirement as soon 
as the feature is discovered... :-)

<snip />

... So for a 10,000 page doc there are going to be a lot of nulls in the page 
tree. For now setting the toPDFString() to ignore the nulls rather than throw 
an exception gets round this and allows the document to be correctly generated. 
In my tests all the pages are produced in the correct order. I was wondering 
though if there are any cases where the pages might not be passed in in the 
correct order (and hence might possibly explain why the notifyKidsRegistered() 
method was written in the way it is), and if so if that has any implications on 
the way I have written the balanced page tree code updates.

I think the original idea was that PDF would, in the long run, also be able to 
do out-of-order rendering (i.e. if page N in a document would be completely 
resolved, and thus could be rendered, before page N-1 --in that case, the null 
reference would be needed as a placeholder for the not-yet-finished page).
At any rate, AFAIR, this was never actually implemented for PDF, so that 
explains why you see all pages in the correct order every time.

If it is cleaner to alter notifyKidRegistered() and avoid those nulls from 
being inserted in the first place, I would prefer that over just skipping them 
in toPDFString(). Not a must, though...
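To make the placeholder idea concrete, here is a small sketch of an index-based kids list: a page registered ahead of its predecessors leaves null gaps, and the output step can skip them. The class and its methods (KidsListSketch, register, hasGaps, toKidsString) are hypothetical and only loosely modelled on what the thread describes; they are not FOP's actual notifyKidRegistered() or toPDFString() code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a kids list with null placeholders; not FOP code. */
class KidsListSketch {
    private final List<String> kids = new ArrayList<>();

    /** Stores a page reference at its page index, padding with nulls if needed. */
    void register(int pageIndex, String ref) {
        while (kids.size() <= pageIndex) {
            kids.add(null); // placeholder for a page that has not arrived yet
        }
        kids.set(pageIndex, ref);
    }

    /** True while at least one placeholder is still unfilled. */
    boolean hasGaps() {
        return kids.contains(null);
    }

    /** Writes the kids array, skipping any remaining null placeholders. */
    String toKidsString() {
        StringBuilder sb = new StringBuilder("[");
        for (String ref : kids) {
            if (ref == null) {
                continue;
            }
            if (sb.length() > 1) {
                sb.append(' ');
            }
            sb.append(ref);
        }
        return sb.append(']').toString();
    }
}
```

Skipping nulls at output time keeps out-of-order registration possible; not inserting them in the first place would be simpler but assumes pages always arrive in order.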



Regards

Andreas
---






