Re: [iText-questions] PDF Structure

Mark Storer Wed, 18 Jun 2008 11:25:39 -0700

Bruno Lowagie <bruno <at> lowagie.com> writes:

> Currently, if you want tagged PDF, it's mainly
> a manual process as is done here:
> http://www.1t3xt.info/examples/browse/?page=example&id=274
> http://www.1t3xt.info/examples/browse/?page=example&id=292
> That's not an ideal situation.


Thanks for the pointers.  I hadn't even found that... come to think of it, I was
talking to my mom last night and she mentioned that Google now has a source-code
search.  Worth looking into...

Cool stuff:  http://www.google.com/codesearch?hl=en

Searching for PdfStructureElement, it turned up its use in iText, plus:
http://itext.ugent.be/itext-in-action/examples/chapterX/ReadOutLoud.java
and
http://downloads.sourceforge.net/itextpdf/examples-155.zip

The latter is Quite Interesting (if Aandi will forgive the pun).  It searched a
zip file, turned up "StructTest.java", and extracted it for me with highlights.

Spiffy!



> 
> > I see some support for BDC/EMC & MCID in PdfContentByte, role mapping, 
> > and that's about it.  Do I have that much work ahead of me, or am I missing
> > something?
> 
> I once introduced MarkedObject which is an object that
> can hold any other iText Element (except for Chapter/Section,
> those would be wrapped in a MarkedSection object).
> The idea would be that you could use MarkedObject to add
> attributes to the 'basic building blocks', then PdfDocument
> could create BDC/EMC & MCID statements automatically.
> However, I never finished that work due to lack of time.
> Maybe another pair of eyes can have a look at it and
> evaluate if that's a viable solution.

I'm looking at lower-level support than that:

1) Decouple mark index generation from PdfContentByte call order... Z order
isn't necessarily reading order.  LiquidOffice Designer is funky that way.  

Heck, we maintain separate tab order and reading order.  Can you think of a
single valid case where you'd want those two to be different?

2) Support in BaseField and PdfAnnotation for marked object references (current
implementation only supports marked content in streams).
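
To make item 1 concrete, here is a rough sketch in plain Java.  It makes no
iText calls; MarkPlanner and its methods are hypothetical names, made up for
illustration.  MCIDs are handed out in whatever order things get drawn, and
reading order is kept as a separate list that decides how those MCIDs would be
ordered under their structure parent.  The BDC/EMC text follows the
marked-content syntax in the PDF Reference.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of iText: keeps drawing order and
// reading order apart.
class MarkPlanner {
    private final List<String> readingOrder = new ArrayList<>();
    private final Map<String, Integer> mcidByItem = new LinkedHashMap<>();

    // readingOrder: item names in the order a screen reader should visit them.
    MarkPlanner(List<String> readingOrder) {
        this.readingOrder.addAll(readingOrder);
    }

    // Called at draw time, in Z order.  Assigns the next free MCID and
    // returns the operators wrapped in a marked-content sequence.
    String draw(String item, String tag, String operators) {
        if (!readingOrder.contains(item)) {
            throw new IllegalArgumentException("unknown item: " + item);
        }
        int mcid = mcidByItem.size();
        if (mcidByItem.putIfAbsent(item, mcid) != null) {
            throw new IllegalStateException("already drawn: " + item);
        }
        return "/" + tag + " <</MCID " + mcid + ">> BDC\n" + operators + "\nEMC\n";
    }

    // MCIDs of the items drawn so far, listed in reading order, i.e. the
    // order they would take as kids of their structure parent.
    List<Integer> kidsInReadingOrder() {
        List<Integer> kids = new ArrayList<>();
        for (String item : readingOrder) {
            Integer mcid = mcidByItem.get(item);
            if (mcid != null) {
                kids.add(mcid);
            }
        }
        return kids;
    }
}
```

Draw a logo first, then a title and body text, and the logo gets MCID 0 even
though it comes last in the reading-order list.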

My text layout, graphics, tables, and so forth are all written through
PdfContentBytes, either from PdfWriter.getContent() or appearances stuffed into
read-only, LAYOUT_ICON_ONLY PushbuttonFields (far more often than not for
various reasons).  It's all laid out just so for me, and I don't think I could
get the precision we're after through Element & its kin.

When I was building our initial attempt at structure in LiquidOffice Designer, I
didn't have any reference files to go by.  Trying to get your output right with
just the PDF Reference to go by (which isn't tagged even now... would have been
nice) was Painful.

These days, LifeCycle Designer builds structure into its output.  Handy, if
more complicated than strictly necessary:
Document
  Page (which is mapped to "Part")
    Sect
      Sect
        Div
          P (is for paragraph)

Fields get deeper, and I haven't even tried looking at a dynamic table... I
feared for my sanity.  :p
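
For anyone following along, the nesting and the Page-to-Part role mapping can
be modelled in a few lines of plain Java.  This is only a sketch: StructNode is
a hypothetical class, not an iText type, and the role map lookup is a single
step, whereas a real role map may need to be followed through more than one
mapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a structure element, NOT an iText class.
class StructNode {
    final String type;
    final List<StructNode> kids = new ArrayList<>();

    StructNode(String type) {
        this.type = type;
    }

    // Adds a kid and returns it, so calls can be chained to nest downward.
    StructNode add(StructNode kid) {
        kids.add(kid);
        return kid;
    }

    // Looks the type up in the role map; unmapped types pass through as-is.
    static String resolve(String type, Map<String, String> roleMap) {
        String mapped = roleMap.get(type);
        return mapped == null ? type : mapped;
    }

    // Renders the subtree as an indented outline of resolved types.
    String outline(Map<String, String> roleMap) {
        StringBuilder sb = new StringBuilder();
        outline(roleMap, 0, sb);
        return sb.toString();
    }

    private void outline(Map<String, String> roleMap, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(resolve(type, roleMap)).append('\n');
        for (StructNode kid : kids) {
            kid.outline(roleMap, depth + 1, sb);
        }
    }
}
```

Build Document, Page, Sect, Sect, Div, P with a role map of Page to Part, and
the outline shows Part where the tree says Page.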

They don't use structure attributes (PDF Reference 10.7.4, "Standard Structure
Attributes") at all, but I think I can figure out what I need without too much
trouble.

--Mark Storer
  Senior Software Engineer
  Cardiff.com

#include <disclaimer>
typedef std::Disclaimer<Cardiff> DisCard;

