RuhongCai created PDFBOX-4337:
---------------------------------

             Summary: Extract all elements (text, images, tables, etc.) 
dynamically and in sequence from a PDF file 
                 Key: PDFBOX-4337
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4337
             Project: PDFBox
          Issue Type: Wish
            Reporter: RuhongCai
         Attachments: sample_pdf.pdf

We are trying to compare two PDF files at runtime and detect the insertions,
deletions, and modifications between them.

PDFBox works well for extracting the text of the two files, but that is not enough for us.

Is there any API in PDFBox, or any workaround, to read/extract all
components (tables, images, text, etc.) from a PDF file in sequence and return
useful related information?
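
To make the comparison step concrete: assuming each file has already been turned into an ordered list of element descriptions by some means, the insertion/deletion detection could be sketched as below. This is only an illustration in plain Java, not a PDFBox API. The class name `SequenceDiff` and the element labels such as `TEXT:...` and `IMAGE:...` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch: diff two ordered lists of element descriptions using a
 * longest-common-subsequence (LCS) table. The element strings themselves
 * (e.g. "TEXT:Title", "IMAGE:logo") are hypothetical labels, not PDFBox output.
 */
public class SequenceDiff {

    /**
     * Returns an edit script turning oldItems into newItems. Each entry is
     * prefixed with "  " (unchanged), "- " (deleted) or "+ " (inserted).
     */
    public static List<String> diff(List<String> oldItems, List<String> newItems) {
        int n = oldItems.size();
        int m = newItems.size();

        // lcs[i][j] = length of the LCS of oldItems[i..] and newItems[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (oldItems.get(i).equals(newItems.get(j))) {
                    lcs[i][j] = lcs[i + 1][j + 1] + 1;
                } else {
                    lcs[i][j] = Math.max(lcs[i + 1][j], lcs[i][j + 1]);
                }
            }
        }

        // Walk both lists from the start, emitting one entry per element.
        List<String> script = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (oldItems.get(i).equals(newItems.get(j))) {
                script.add("  " + oldItems.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                script.add("- " + oldItems.get(i));
                i++;
            } else {
                script.add("+ " + newItems.get(j));
                j++;
            }
        }
        // Whatever remains on either side is a pure deletion or insertion.
        while (i < n) {
            script.add("- " + oldItems.get(i++));
        }
        while (j < m) {
            script.add("+ " + newItems.get(j++));
        }
        return script;
    }
}
```

In this sketch a "modification" is not reported separately. It shows up as a "- " entry immediately followed by a "+ " entry, which a caller could pair up if needed.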

A sample file is attached. As you can see, it contains text, tables, images, and
content that is not well formatted.

[^sample_pdf.pdf]

Many thanks!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

