Bruno Lowagie <bruno <at> lowagie.com> writes:
> Currently, if you want tagged PDF, it's mainly
> a manual process as is done here:
> http://www.1t3xt.info/examples/browse/?page=example&id=274
> http://www.1t3xt.info/examples/browse/?page=example&id=292
> That's not an ideal situation.
Thanks for the pointers. I hadn't even found that... come to think of it, I was talking to my mom last night and she mentioned that Google now has a source-code search. Worth looking into...

Cool stuff: http://www.google.com/codesearch?hl=en

Searching for PdfStructureElement, it turned up its use in iText, plus:
http://itext.ugent.be/itext-in-action/examples/chapterX/ReadOutLoud.java
and
http://downloads.sourceforge.net/itextpdf/examples-155.zip

The latter is Quite Interesting (if Aandi will forgive the pun). It searched a zip file, turned up "StructTest.java", and extracted it for me with highlights. Spiffy!

>
> > I see some support for BDC/EMC & MCID in PdfContentByte, role mapping,
> > and that's about it. Do I have that much work ahead of me, or am I missing
> > something?
>
> I once introduced MarkedObject which is an object that
> can hold any other iText Element (except for Chapter/Section,
> those would be wrapped in a MarkedSection object).
> The idea would be that you could use MarkedObject to add
> attributes to the 'basic building blocks', then PdfDocument
> could create BDC/EMC & MCID statements automatically.
> However, I never finished that work due to lack of time.
> Maybe another pair of eyes can have a look at it and
> evaluate if that's a viable solution.

I'm looking at lower level support than that:

1) Decouple mark index generation from PdfContentByte call order... Z order isn't necessarily reading order. LiquidOffice Designer is funky that way. Heck, we maintain separate tab order and reading order. Can you think of a single valid case where you'd want those two to be different?

2) Support in BaseField and PdfAnnotation for marked object references (current implementation only supports marked content in streams).
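A minimal sketch of what point 1 could look like, written in plain Java with no iText dependency. MarkedItem and ReadingOrderMarks are hypothetical names, not iText classes, and the content strings are placeholder operators. The assumption is that every item knows both its reading-order position and its Z-order position up front, so marked-content IDs can be handed out by reading order while the stream is still written in paint order:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these types exist in iText.
final class MarkedItem {
    final String tag;        // structure type, e.g. "P" or "H1"
    final int readingOrder;  // position in the logical reading order
    final int zOrder;        // position in the paint (Z) order
    final String content;    // raw content-stream operators for this item
    int mcid = -1;           // assigned later, independent of paint order

    MarkedItem(String tag, int readingOrder, int zOrder, String content) {
        this.tag = tag;
        this.readingOrder = readingOrder;
        this.zOrder = zOrder;
        this.content = content;
    }
}

final class ReadingOrderMarks {
    // Hand out MCIDs by reading order, ignoring insertion and paint order.
    static void assignMcids(List<MarkedItem> items) {
        List<MarkedItem> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingInt((MarkedItem i) -> i.readingOrder));
        for (int n = 0; n < sorted.size(); n++) {
            sorted.get(n).mcid = n;
        }
    }

    // Write the items in Z order, each wrapped in a BDC/EMC pair.
    static String emit(List<MarkedItem> items) {
        List<MarkedItem> paint = new ArrayList<>(items);
        paint.sort(Comparator.comparingInt((MarkedItem i) -> i.zOrder));
        StringBuilder sb = new StringBuilder();
        for (MarkedItem i : paint) {
            sb.append('/').append(i.tag)
              .append(" <</MCID ").append(i.mcid).append(">> BDC\n")
              .append(i.content).append('\n')
              .append("EMC\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<MarkedItem> items = new ArrayList<>();
        // The title is read first but painted last (it sits on top).
        items.add(new MarkedItem("H1", 0, 1, "BT (Title) Tj ET"));
        items.add(new MarkedItem("P", 1, 0, "BT (Body) Tj ET"));
        assignMcids(items);
        System.out.print(emit(items));
    }
}
```

In this sketch the body paragraph is painted first but carries MCID 1, and the title is painted second with MCID 0. A real implementation would also have to register each MCID with its structure element and the page's parent tree, which is left out here.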
My text layout, graphics, tables, and so forth are all written through PdfContentBytes, either from PdfWriter.getContent() or appearances stuffed into read-only, LAYOUT_ICON_ONLY PushbuttonFields (far more often than not for various reasons). It's all laid out just so for me, and I don't think I could get the precision we're after through Element & its kin.

When I was building our initial attempt at structure in LiquidOffice Designer, I didn't have any reference files to go by. Trying to get your output right with just the PDF Reference to go by (which isn't tagged even now... would have been nice) was Painful.

These days, LiveCycle Designer builds structure into its output. Handy, if more complicated than strictly necessary:

Document
  Page (which is mapped to "Part")
    Sect
      Sect
        Div
          P (is for paragraph)

Fields get deeper, and I haven't even tried looking at a dynamic table... I feared for my sanity. :p

They don't use structure attributes (PDF Reference 10.7.4, "Standard Structure Attributes") at all, but I think I can figure out what I need without too much trouble.

--Mark Storer
  Senior Software Engineer
  Cardiff.com

#include <disclaimer>
typedef std::Disclaimer<Cardiff> DisCard;
