On 12/02/13 11:56, Larry Evans wrote:
> On 12/02/13 04:30, Neha Jain wrote:
>> Hi Team,
>>
>> I have a requirement of converting a PDF to XML i.e contents of PDF to XML
>>
>> I have tried using TaggedPdfReaderTool but I get the following exception:
>>
>> Exception in thread "main" java.io.IOException: No StructTreeRoot
>> found, this probably isn't a tagged PDF document!
>>
>> I understand that the PDF is unstructured (no tags to identify headings,
>> title, table, image etc.) and so the tool cannot convert the document to XML.
>
> A PDF file can either be tagged or not; however, "tags" in this context
> are not tags in the HTML or XML sense.
> Chapter 13 of the iText book:
>
> http://itextpdf.com/book/chapter.php?id=13
>
> on page 423 explains what a tagged PDF file is.

Page 514 of the book says the TaggedPdfReaderTool:

   won't work for PDF documents that don't have any structure...
   but it will work for most tagged PDF files.

So I guess you're out of luck with an untagged PDF document.
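Before running TaggedPdfReaderTool you can check whether a file is tagged at all: a tagged PDF has a /StructTreeRoot entry in its document catalog, which is what the "No StructTreeRoot found" exception is complaining about. With iText you would normally look for that key in the catalog returned by PdfReader's getCatalog(). The class below is only a rough JDK-only sketch (TaggedPdfCheck is a made-up name, not part of iText): it searches the raw bytes and any Flate-compressed streams for the /StructTreeRoot key, so it assumes an unencrypted file and may give false negatives when other filters are used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Rough heuristic, not iText's API: does the file contain a /StructTreeRoot key anywhere?
public class TaggedPdfCheck {

    private static final String MARKER = "/StructTreeRoot";

    public static boolean looksTagged(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so string indexes equal byte offsets.
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        if (raw.contains(MARKER)) {
            return true; // catalog stored uncompressed
        }
        // PDF 1.5+ may keep the catalog in a compressed object stream,
        // so inflate every "stream ... endstream" body and search that too.
        int pos = 0;
        while (true) {
            int start = raw.indexOf("stream", pos);
            if (start < 0) {
                return false;
            }
            if (start >= 3 && raw.startsWith("end", start - 3)) {
                pos = start + 6; // tail of an "endstream" keyword, skip it
                continue;
            }
            int dataStart = start + 6;
            if (raw.startsWith("\r\n", dataStart)) {
                dataStart += 2;
            } else if (raw.startsWith("\n", dataStart)) {
                dataStart += 1;
            } else {
                pos = dataStart; // not followed by an EOL, so not a stream keyword
                continue;
            }
            int end = raw.indexOf("endstream", dataStart);
            if (end < 0) {
                return false;
            }
            String body = inflate(pdf, dataStart, end - dataStart);
            if (body != null && body.contains(MARKER)) {
                return true;
            }
            pos = end + "endstream".length();
        }
    }

    // Returns the Flate-decoded data as a string, or null if it is not valid zlib data.
    private static String inflate(byte[] data, int off, int len) {
        Inflater inflater = new Inflater();
        inflater.setInput(data, off, len);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0) {
                    break; // needs more input or a preset dictionary: give up on this stream
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            return null; // some other filter (or an image), ignore it
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(looksTagged(pdf) ? "looks tagged" : "no /StructTreeRoot found");
    }
}
```

Usage: java TaggedPdfCheck some.pdf. If it reports no /StructTreeRoot, TaggedPdfReaderTool will most likely fail on that file too.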

>
>>
>> Please confirm my understanding.
>>
>> I have tried using the PdfReader class, which gets me the entire content of
>> the PDF, but I am not able to find out which part is the heading, title or
>> table in the PDF content. My requirement is to create an XML document with the
>> headings in the PDF as tags and the PDF content as the element contents.
>>
>> Please let me know how this can be achieved using iText. It's urgent.
>
> I don't know how to do this without a tagged PDF. With a tagged PDF,
> TaggedPdfReaderTool works;
[snip]
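Without tags the structure simply isn't in the file, so anything that finds headings is guesswork. One common heuristic is to treat lines set in a larger font as headings. The sketch below assumes the text has already been extracted line by line together with a font size (for example through iText's text-extraction listener API); the TextLine class, the HeadingXmlSketch name and the size threshold are all hypothetical. It builds the XML with the JDK's DOM API:

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Heuristic sketch: large-font lines become <section heading="...">, other lines become <p>.
public class HeadingXmlSketch {

    // Hypothetical input type: one extracted line and the font size it was drawn in.
    public static final class TextLine {
        final String text;
        final float fontSize;

        public TextLine(String text, float fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    public static String toXml(List<TextLine> lines, float headingMinSize) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("document");
            doc.appendChild(root);
            Element section = null; // the section currently being filled
            for (TextLine line : lines) {
                if (line.fontSize >= headingMinSize) {
                    // Guess: a line in a large font starts a new section.
                    section = doc.createElement("section");
                    section.setAttribute("heading", line.text);
                    root.appendChild(section);
                } else {
                    Element p = doc.createElement("p");
                    p.setTextContent(line.text);
                    // Lines before the first heading go directly under <document>.
                    (section != null ? section : root).appendChild(p);
                }
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }
}
```

The heading text goes into an attribute rather than becoming the element name, because headings usually contain spaces or other characters that are not legal in XML names. Tables and images are not recognized this way at all.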
There is another tool:

http://www.mobipocket.com/dev/pdf2xml/

However, it either doesn't handle form fields or at least doesn't show
any fields when run on:

   http://www.irs.gov/pub/irs-pdf/f1040.pdf

Instead, it just puts the text in XML elements.

HTH.

-regards,
Larry



_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference 
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: 
http://itextpdf.com/themes/keywords.php
