RuhongCai created PDFBOX-4337:
---------------------------------
Summary: Could extract all elements(Text, Image, Table, etc)
dynamically in sequence from pdf file
Key: PDFBOX-4337
URL: https://issues.apache.org/jira/browse/PDFBOX-4337
Project: PDFBox
Issue Type: Wish
Reporter: RuhongCai
Attachments: sample_pdf.pdf
We are trying to compare two PDF files at run time and detect insertions,
deletions, and modifications between them.
PDFBox works well for extracting the text of the two files, but that is not
enough for us. Is there any API in PDFBox, or any workaround, to read/extract
all components (tables, images, text, etc.) from a PDF file in sequence and
return some useful related information about each one?
The attached sample file contains text, a table, an image, and content that is
not well formatted.
[^sample_pdf.pdf]
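To make the goal concrete, here is a minimal sketch of the comparison step only.
It assumes each PDF has already been reduced to an ordered list of element
descriptors; how to obtain that list is exactly what this request asks about.
The descriptor strings ("TEXT:...", "TABLE:...", "IMAGE:...") and the class
name ElementDiff are hypothetical and not part of PDFBox. The diff uses a
standard longest-common-subsequence table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a PDFBox class: diffs two ordered element lists.
class ElementDiff {

    // Returns operations that turn list a into list b, in document order.
    static List<String> diff(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> ops = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n || j < m) {
            if (i < n && j < m && a.get(i).equals(b.get(j))) {
                i++; j++;                       // unchanged element
            } else if (i < n && j < m
                    && lcs[i + 1][j + 1] >= lcs[i + 1][j]
                    && lcs[i + 1][j + 1] >= lcs[i][j + 1]) {
                // Skipping both elements loses no common element: report a modification.
                ops.add("MODIFY " + a.get(i) + " -> " + b.get(j));
                i++; j++;
            } else if (j < m && (i == n || lcs[i][j + 1] >= lcs[i + 1][j])) {
                ops.add("INSERT " + b.get(j));  // element exists only in b
                j++;
            } else {
                ops.add("DELETE " + a.get(i));  // element exists only in a
                i++;
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("TEXT:Intro", "TABLE:2x3", "IMAGE:logo", "TEXT:End");
        List<String> after = Arrays.asList("TEXT:Intro", "TABLE:2x4", "TEXT:New", "IMAGE:logo");
        for (String op : diff(before, after)) {
            System.out.println(op);
        }
    }
}
```

With such a list per file, the three kinds of change named above fall out of
one pass. The sketch compares descriptors by string equality only, so a real
comparison would need descriptors that capture whatever "same element" should
mean for text, tables, and images.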
Many thanks!