
[iText-questions] More fun with Structure

Mark Storer
Thu, 26 Jun 2008 18:20:00 -0700

The Good News:  I have my 1.0 version of structure up and running, with what
looks an awful lot like good output.  Acrobat 7 opens my files without
complaint, though I haven't tried a screen reader yet.

The Bad News: It'll be pretty broken outside my Franken-iText, and will require
some rewriting to merge into the trunk.

Several different objects maintain their own indirect reference (PdfAnnotation
& PdfStructureElement are the two I've been working with recently).  In my code
base, this behavior resides in PdfObject.  Objects that have access to a
PdfWriter generate a reference when one is requested, so PdfAnnotation &
PdfStructureElement both override "getIndRef()" in PdfObject.  The base
implementation returns null if a reference hasn't been assigned already (a
PdfObject isn't associated with any particular writer, so there's no way to
generate a valid reference).
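
A minimal sketch of that arrangement, using hypothetical stand-in classes
(SketchRef, SketchWriter, BaseObject, WriterBoundObject) rather than the actual
iText or Franken-iText code.  Only the getIndRef() name comes from the
description above; everything else is an assumption made for illustration:

```java
// Hypothetical stand-ins; these are not the real iText classes.
class SketchRef {
    final int number;
    SketchRef(int number) { this.number = number; }
}

class SketchWriter {
    private int next = 1;
    // Hands out a fresh object number each time it is called.
    SketchRef newRef() { return new SketchRef(next++); }
}

class BaseObject {
    protected SketchRef indRef; // null until someone assigns one

    // No writer is available here, so the base class can only report
    // a reference that was assigned earlier (or null).
    SketchRef getIndRef() { return indRef; }
    void setIndRef(SketchRef ref) { indRef = ref; }
}

class WriterBoundObject extends BaseObject {
    private final SketchWriter writer;
    WriterBoundObject(SketchWriter writer) { this.writer = writer; }

    // With a writer at hand, create the reference lazily on first request
    // and keep returning the same one afterwards.
    @Override
    SketchRef getIndRef() {
        if (indRef == null) {
            indRef = writer.newRef();
        }
        return indRef;
    }
}
```

Asking a WriterBoundObject twice yields the same reference, while a plain
BaseObject keeps answering null until setIndRef() is called.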

My PdfIndirectReference also contains a pointer to the original object.  I
suspect this will do Savage and Indefensible things to PdfCopy's memory usage
(which I don't use: meh).  Looking up an object by its reference is fine when
you have a PdfReader to work with, but that doesn't work so well when you've
just created the object from within a PdfStamper... 
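
As a rough illustration of why a back-pointer helps here, the sketch below
resolves a reference through its target first and falls back to a number-keyed
table only for objects that were loaded from an existing file.  All names
(TargetedRef, RefTableSketch, ResolverSketch) are hypothetical and not part of
iText:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reference that also remembers the object it names.
class TargetedRef {
    final int number;
    final Object target; // back-pointer; null when the object is not in memory
    TargetedRef(int number, Object target) {
        this.number = number;
        this.target = target;
    }
}

// Stand-in for the "look it up by number" path a reader would offer.
class RefTableSketch {
    private final Map<Integer, Object> byNumber = new HashMap<>();
    void put(int number, Object value) { byNumber.put(number, value); }
    Object lookup(int number) { return byNumber.get(number); }
}

class ResolverSketch {
    // Prefer the back-pointer; fall back to the table otherwise.
    static Object resolve(TargetedRef ref, RefTableSketch table) {
        if (ref.target != null) {
            return ref.target;
        }
        return table.lookup(ref.number);
    }
}
```

The cost is the one noted above: every reference keeps its target reachable,
so nothing referenced this way can be garbage collected early.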

I ran into that case while trying to flatten a field I had created for a dynamic
table.  Buckets of fun.  The alternative was to write out the entire PDF with
the dynamic fields added, reload the form, and flatten it.  I felt that this
would be an unacceptable performance hit.

To support all that, I had to make some changes to PdfBody & PdfWriter. 
Generating a second reference if the PdfObject already has one is Not Okay... I
didn't want a bunch of PdfNulls floating around in the xref table, particularly
when someone referencing it thinks that null is actually a PdfStructureElement.
If a given object didn't already have an indirect reference, I store the newly
generated one on the object in case I need it later.
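
The rule described above might look something like the following sketch.  It
is not the actual PdfBody/PdfWriter change; XrefSketch, NumberedObject, and
their methods are invented for illustration, and 0 is assumed to mean "no
reference yet":

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical object that may or may not carry an object number already.
class NumberedObject {
    int number; // 0 means no reference has been assigned
}

class XrefSketch {
    private final Map<Integer, NumberedObject> entries = new HashMap<>();
    private int nextNumber = 1;

    // Reserve an object number without writing anything yet.
    int reserve() { return nextNumber++; }

    // Add an object, reusing its existing number if it has one.
    // A number is allocated only for objects that arrive without one,
    // and it is recorded on the object for later use.
    int add(NumberedObject obj) {
        if (obj.number == 0) {
            obj.number = reserve();
        }
        entries.put(obj.number, obj);
        return obj.number;
    }

    // How many object numbers have been handed out so far.
    int allocated() { return nextNumber - 1; }
}
```

Adding an object that already holds a reserved number leaves the allocation
count unchanged, so no orphaned entry ends up in the table.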

I managed to decouple structure creation order from reading order, but there
were more shenanigans involved there as well.  In particular, I ended up
mapping marked content IDs in the parent tree to indirect references for
PdfStructureElements that hadn't even been "new"ed yet.
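
One way to picture this is to reserve an object number per element before the
element exists, then hand the same number to the element when it is finally
constructed.  The sketch below assumes elements can be identified by a string
key ahead of time; that key, and the names StructureTreeSketch and
StructElemSketch, are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical structure element that receives a pre-reserved number.
class StructElemSketch {
    final String key;
    final int refNumber;
    StructElemSketch(String key, int refNumber) {
        this.key = key;
        this.refNumber = refNumber;
    }
}

class StructureTreeSketch {
    private final Map<String, Integer> reservedRefs = new HashMap<>();
    private final Map<Integer, Integer> parentTree = new HashMap<>(); // MCID -> object number
    private int nextNumber = 1;

    // Return the number reserved for this element, reserving one on first use.
    private int refFor(String elementKey) {
        Integer ref = reservedRefs.get(elementKey);
        if (ref == null) {
            ref = nextNumber++;
            reservedRefs.put(elementKey, ref);
        }
        return ref;
    }

    // Record that a piece of marked content belongs to an element,
    // whether or not that element has been created yet.
    void markContent(int mcid, String elementKey) {
        parentTree.put(mcid, refFor(elementKey));
    }

    // Create the element later; it picks up the number reserved earlier.
    StructElemSketch create(String elementKey) {
        return new StructElemSketch(elementKey, refFor(elementKey));
    }

    Integer parentOf(int mcid) { return parentTree.get(mcid); }
}
```

Content can then be marked in reading order while the elements themselves are
created in whatever order is convenient.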

And just to add some icing on this particular cake, I also make use of my
PdfDictionary.getAs[Array|Dictionary|Number|etc], and PdfArray.getAt[*].  They
resolve indirect references, and return null if the resolved object is of the
wrong type.
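
The behavior of such accessors can be sketched with a single generic method.
This is an illustration only, not the signature of the getAs* methods
mentioned above; DictSketch and IndirectRefSketch are hypothetical, and the
reference is assumed to hold its target directly:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical indirect reference that holds its target directly.
class IndirectRefSketch {
    final Object target;
    IndirectRefSketch(Object target) { this.target = target; }
}

class DictSketch {
    private final Map<String, Object> entries = new HashMap<>();

    void put(String key, Object value) { entries.put(key, value); }

    // Follow an indirect reference if there is one, then return the value
    // only when it has the requested type.  Missing keys and wrong types
    // both come back as null.
    <T> T getAs(String key, Class<T> type) {
        Object value = entries.get(key);
        if (value instanceof IndirectRefSketch) {
            value = ((IndirectRefSketch) value).target;
        }
        return type.isInstance(value) ? type.cast(value) : null;
    }
}
```

Callers get one null check in place of a lookup, a dereference, an instanceof
test, and a cast.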

(and I'm pretty sure I use some Java 1.5 implicit Integer<->int conversions and
an "assert" or two)

So yes, my Franken-iText has lots of changes, and this code depends on quite a
few of them.  I'd love to get it all into the trunk, but I get the feeling some
of this stuff won't fly.

Oh, and it's all written to iText-paulo-155.  Ouch.

--Mark Storer
  Fearless iText Hacker
  Cardiff.com

"Bravery isn't the absence of fear, but the ability to act despite being
terrified.  People who don't feel fear are just stupid." -- Me, just now.

