On 12/02/13 04:30, Neha Jain wrote:
> Hi Team,
>
> I have a requirement of converting a PDF to XML i.e contents of PDF to XML
>
> I have tried using TaggedPdfReaderTool, but I get the following exception:
>
> Exception in thread "main" java.io.IOException: No StructTreeRoot
> found, this probably isn't a tagged PDF document!
>
> I understand that a PDF is unstructured (no tags to identify headings,
> title, table, image etc.), and so it cannot convert the document to XML.

A PDF file can either be tagged or not; however, "tags" in this context
are not the tags of an HTML or XML context.
Chapter 13 of the iText book:

http://itextpdf.com/book/chapter.php?id=13

on page 423, explains what a tagged PDF file is.
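For a quick check before running TaggedPdfReaderTool, the JDK-only sketch below looks for the /StructTreeRoot key that the exception complains about. It is a rough heuristic, not iText's method: the class and method names are made up for illustration, and because PDF 1.5+ files may keep the catalog inside a compressed object stream, a miss does not prove the file is untagged. A real PDF parser reading the document catalog gives the definitive answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TaggedPdfCheck {

    // Hypothetical helper: a tagged PDF's catalog carries a /StructTreeRoot entry.
    // ISO-8859-1 maps every byte to exactly one char, so binary data decodes safely.
    // Caveat: if the catalog sits in a compressed object stream (PDF 1.5+),
    // the key is not visible in the raw bytes and this returns false.
    static boolean mentionsStructTreeRoot(byte[] pdfBytes) {
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/StructTreeRoot");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TaggedPdfCheck file.pdf");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(mentionsStructTreeRoot(bytes)
                ? "StructTreeRoot found: probably tagged"
                : "No StructTreeRoot seen: probably untagged (or catalog is compressed)");
    }
}
```

Run it as: java TaggedPdfCheck file.pdf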

>
> Please confirm my understanding.
>
> I have tried using the PdfReader class, which helps me get the entire content
> of the PDF, but I am not able to find out which part is the heading, title or
> table in the PDF content. My requirement is to create an XML document with the
> PDF's headings as tags and the PDF's content as those elements' contents.
>
> Please let me know how this can be achieved using iText. It's urgent.

I don't know how to do this without a tagged PDF. With a tagged PDF,
TaggedPdfReaderTool works; however, there's no formatting information
(no style sheet to specify exactly where the XML elements are to appear
on the page); hence, you can see how it would look in a web browser.
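To make the tagged/untagged distinction concrete, here is a JDK-only sketch of the idea behind converting a tagged PDF to XML: walk the structure tree depth-first and emit each element's type (H1, P, Table, ...) as an XML element name, with its text as content. StructNode is a hypothetical stand-in for the PDF's structure elements, not an iText class, and TaggedPdfReaderTool's real output may differ in detail. Notice that only structure and text survive; no positions, fonts or page layout are written, which matches the missing formatting information described above.

```java
import java.util.ArrayList;
import java.util.List;

public class StructTreeToXml {

    // Hypothetical, simplified structure element: a type such as "H1" or "P",
    // optional text content, and child elements.
    static final class StructNode {
        final String type;
        final String text;
        final List<StructNode> children = new ArrayList<>();

        StructNode(String type, String text) {
            this.type = type;
            this.text = text;
        }

        // Appends a child and returns this node so calls can be chained.
        StructNode add(StructNode child) {
            children.add(child);
            return this;
        }
    }

    // Escape characters that may not appear verbatim in XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Depth-first walk: each structure type becomes an XML element name.
    // Nothing about coordinates, fonts or page layout is emitted.
    static void toXml(StructNode node, StringBuilder out) {
        out.append('<').append(node.type).append('>');
        if (node.text != null) {
            out.append(escape(node.text));
        }
        for (StructNode child : node.children) {
            toXml(child, out);
        }
        out.append("</").append(node.type).append('>');
    }

    public static void main(String[] args) {
        StructNode doc = new StructNode("Document", null)
                .add(new StructNode("H1", "Chapter 13"))
                .add(new StructNode("P", "Body text & more"));
        StringBuilder out = new StringBuilder();
        toXml(doc, out);
        System.out.println(out);
    }
}
```

Running main prints <Document><H1>Chapter 13</H1><P>Body text &amp; more</P></Document>. An untagged PDF has no such tree to walk, which is why the tool gives up with "No StructTreeRoot found".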

I'm trying to figure out how to extract this formatting information
now.

>
> Thanks in advance
>
> Regards,
>
> Neha
>
HTH.
-regards,
Larry




_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference 
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: 
http://itextpdf.com/themes/keywords.php
