On 12/02/13 11:56, Larry Evans wrote:
> On 12/02/13 04:30, Neha Jain wrote:
>> Hi Team,
>>
>> I have a requirement of converting a PDF to XML, i.e. the contents of
>> the PDF to XML.
>>
>> I have tried using TaggedPdfReaderTool but I get the following exception:
>>
>> Exception in thread "main" java.io.IOException: No StructTreeRoot
>> found, this probably isn't a tagged PDF document!
>>
>> I understand that the PDF is unstructured (no tags to identify headings,
>> title, table, image etc.) and so it cannot convert the document to XML.
>
> A PDF file can either be tagged or not; however, tags in this context
> are not the tags of an HTML or XML context.
> Chapter 13 of the iText book:
>
> http://itextpdf.com/book/chapter.php?id=13
>
> on page 423 explains what a tagged PDF file is.
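
For what it's worth, the "No StructTreeRoot found" exception quoted above points at the document catalog lacking a /StructTreeRoot entry, which is where a tagged PDF keeps its structure tree. Below is a rough sketch of a pre-check that needs nothing but the JDK. It does not use iText; the class name TaggedPdfCheck and the method mentionsStructTreeRoot are made up for illustration. It only scans the raw bytes for the literal name, so treat it as a heuristic: it can miss the key when the catalog sits in a compressed object stream, and it can report a hit when the name merely appears somewhere else in the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of iText.
class TaggedPdfCheck {

    // Heuristic: returns true if the literal name "/StructTreeRoot" occurs
    // anywhere in the raw file bytes. This is NOT a real PDF parse, so both
    // false negatives (compressed object streams) and false positives are possible.
    static boolean mentionsStructTreeRoot(byte[] pdfBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so binary content
        // survives the conversion and a plain substring search works.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/StructTreeRoot");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TaggedPdfCheck.java file.pdf");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        if (mentionsStructTreeRoot(bytes)) {
            System.out.println("/StructTreeRoot found: the file may be tagged");
        } else {
            System.out.println("no /StructTreeRoot seen: probably untagged");
        }
    }
}
```

With a Java 11 or later JDK this can be run directly as "java TaggedPdfCheck.java some.pdf". A negative answer only suggests that TaggedPdfReaderTool is likely to throw the same exception; the tool itself remains the authority.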
Page 514 of the book says that TaggedPdfReaderTool:

  won't work for PDF documents that don't have any structure...
  but it will work for most tagged PDF files.

So I guess you're out of luck with an untagged PDF document.

>
>>
>> Please confirm my understanding.
>>
>> I have tried using the PdfReader class, which helps me get the entire
>> content of the PDF, but I am not able to find out which part is the
>> heading, title, or table in the PDF content. My requirement is to create
>> an XML doc with the headings in the PDF as tags and the content in the
>> PDF as tag-element contents.
>>
>> Please let me know how this can be achieved using iText. It's urgent.
>
> I don't know how to do this without a tagged PDF. With a tagged PDF,
> TaggedPdfReaderTool works;
[snip]

There is another tool:

  http://www.mobipocket.com/dev/pdf2xml/

However, it doesn't handle fields, or at least it doesn't show any fields
when run on:

  http://www.irs.gov/pub/irs-pdf/f1040.pdf

Instead, it just puts the text in XML elements.

HTH.

-regards,
Larry

_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions