On 12/02/13 04:30, Neha Jain wrote:
> Hi Team,
>
> I have a requirement of converting a PDF to XML, i.e. the contents of
> the PDF to XML.
>
> I have tried using TaggedPdfReaderTool but I get the following exception:
>
> Exception in thread "main" java.io.IOException: No StructTreeRoot
> found, this probably isn't a tagged PDF document!
>
> I understand that the PDF is unstructured (no tags to identify headings,
> title, table, image etc.) and so it cannot convert the document to XML.
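As far as I can tell, that exception means the document catalog has no /StructTreeRoot entry. Here is a rough pre-check that uses only the JDK, not iText. The class name StructTreeRootCheck and the method looksTagged are made up for this sketch. It only scans the raw bytes for the name, so it can miss a catalog stored inside a compressed object stream, and it can be fooled if the same text happens to occur elsewhere in the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of iText: a crude byte-level check for tagging.
class StructTreeRootCheck {

    // Name of the catalog entry that a tagged PDF is expected to carry.
    private static final String KEY = "/StructTreeRoot";

    // Heuristic only: true if the literal name appears anywhere in the raw bytes.
    static boolean looksTagged(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data
        // survives the conversion and the ASCII name can still be found.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains(KEY);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StructTreeRootCheck <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(looksTagged(bytes)
                ? "Found /StructTreeRoot: probably a tagged PDF"
                : "No /StructTreeRoot in the plain bytes: probably untagged, or the catalog is compressed");
    }
}
```

If this reports no /StructTreeRoot, it is probably not worth calling TaggedPdfReaderTool on that file; treat the result as a hint, not as proof.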
A PDF file can either be tagged or not; however, tags in this context are
not the tags of an HTML or XML context. Chapter 13 of the iText book:

  http://itextpdf.com/book/chapter.php?id=13

explains on page 423 what a tagged PDF file is.

>
> Please confirm my understanding.
>
> I have tried using the PdfReader class, which helps me get the entire
> content of the PDF, but I am not able to find out which is the heading,
> title, table in the PDF content. My requirement is to create an XML doc
> with headings in the PDF as tags and content in the PDF as tag-element
> contents.
>
> Please let me know how this can be achieved using iText. It's urgent.

I don't know how to do this without a tagged PDF. With a tagged PDF,
TaggedPdfReaderTool works; however, there's no formatting information
(no style sheet to specify exactly where the XML elements are to appear
on the page); hence, you can see how it would look in a web browser.
I'm trying to figure out how to extract this formatting information now.

>
> Thanks in advance
>
> Regards,
>
> Neha
>

HTH.

-regards,
Larry

_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions