Re: inline images – EI operator

Tilman Hausherr Wed, 22 Apr 2015 09:56:53 -0700

Hi Lukas,

Thanks for your detailed analysis. It's my fault. (See https://issues.apache.org/jira/browse/PDFBOX-1794 ). I think that the 2nd solution you suggested is the better one. I've opened https://issues.apache.org/jira/browse/PDFBOX-2772 and will work on this soon.


Tilman



On 22.04.2015 at 17:26, Lukas Schober wrote:

Dear pdfbox-devs,
A co-worker and I are currently developing a service for searching and replacing content in PDF documents based on pdfbox. We started our project with version 1.8.2 of pdfbox and recently tried to migrate to 1.8.8.
After changing to version 1.8.8 we ran into trouble with PDF content containing inline images. Our study of the code differences between those versions of pdfbox led us to the handling of the EI operator as the cause of our trouble.
In version 1.8.2 the method parseNextToken() of org.apache.pdfbox.pdfparser.PDFStreamParser does an unread of the EI token on inline images. In newer versions this unread of the EI token no longer exists, with the following comment: “// the EI operator isn't unread, as it won't be processed anyway”.
As a consequence, the token sets that the PDFStreamParser delivers for a document containing an inline image can't be used to (re)render a valid PDF document with the ContentStreamWriter. The reason is the missing token for the EI operator. It may be that the EI token doesn't trigger any further processing, but it is still necessary to represent the delimiter in the token sequence.
Conversely, if an inline image is to be part of a PDF page and is inserted manually as a token set, the EI token must also be present in the token set so that the ContentStreamWriter is able to create a correct PDF document.
From our point of view there are two simple approaches to get a more consistent internal representation of PDF documents with pdfbox concerning inline images. Either represent the EI operator explicitly as a token (revert to the handling in version 1.8.2) or extend the writeObject method in the ContentStreamWriter to append the EI operator implicitly.
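The two approaches can be compared in a small toy model. This is only a sketch and not pdfbox code: tokens are plain strings, and the class InlineImageTokenSketch with all of its methods is a hypothetical stand-in for PDFStreamParser and ContentStreamWriter. It assumes that the token directly after ID is the image data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a content-stream token list; hypothetical, not pdfbox code. */
class InlineImageTokenSketch {

    /** Approach 1: the parser keeps EI as an explicit token (as described for 1.8.2). */
    static List<String> parseKeepingEi(List<String> rawTokens) {
        return new ArrayList<>(rawTokens);
    }

    /** Models the behaviour described for 1.8.8: the parser drops the EI token. */
    static List<String> parseDroppingEi(List<String> rawTokens) {
        List<String> tokens = new ArrayList<>();
        for (String token : rawTokens) {
            if (!"EI".equals(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Plain writer: emits exactly the tokens it is given. */
    static String write(List<String> tokens) {
        return String.join(" ", tokens);
    }

    /** Approach 2: the writer appends EI after the image data if it is missing. */
    static String writeWithImplicitEi(List<String> tokens) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(tokens.get(i));
            // Assumption of this sketch: the token right after ID is the image data.
            boolean isImageData = i > 0 && "ID".equals(tokens.get(i - 1));
            boolean eiFollows = i + 1 < tokens.size() && "EI".equals(tokens.get(i + 1));
            if (isImageData && !eiFollows) {
                out.append(" EI");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "q", "BI", "/W", "1", "/H", "1", "ID", "<data>", "EI", "Q");
        System.out.println(write(parseKeepingEi(raw)));                // approach 1
        System.out.println(write(parseDroppingEi(raw)));               // EI is lost
        System.out.println(writeWithImplicitEi(parseDroppingEi(raw))); // approach 2
    }
}
```

In this model the second line printed by main lacks the EI delimiter, while the first and third are identical. The implicit writer also leaves a token list that already contains EI unchanged, so manually built token sets with an explicit EI would not end up with a duplicate.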
Furthermore, in our specialization of the PDFTextStripper, our limited access to the base-class properties was a constraint. Are there reasons why the properties
org.apache.pdfbox.util.PDFTextStripper::startBookmarkPageNumber
org.apache.pdfbox.util.PDFTextStripper::endBookmarkPageNumber
org.apache.pdfbox.util.PDFTextStripper::pageArticles
org.apache.pdfbox.util.PDFTextStripper::characterListMapping
org.apache.pdfbox.util.PDFStreamEngine::streamResourcesStack
org.apache.pdfbox.util.PDFStreamEngine::page
really have to be private, or would it be restrictive enough to make them protected so that they can be accessed in derived classes?
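The difference in question can be shown with a minimal sketch. The classes BaseStripperSketch and DerivedStripperSketch and their fields are hypothetical stand-ins, not the actual pdfbox classes; they only illustrate what a derived class can and cannot reach.

```java
/** Hypothetical stand-in for a library base class; not pdfbox code. */
class BaseStripperSketch {
    private int privatePageNumber = 3;      // not visible in derived classes
    protected int protectedPageNumber = 3;  // visible in derived classes

    int getPrivatePageNumber() {
        return privatePageNumber;
    }
}

/** Hypothetical specialization that needs the base-class state. */
class DerivedStripperSketch extends BaseStripperSketch {
    int nextPage() {
        // return privatePageNumber + 1;  // would not compile: private access
        return protectedPageNumber + 1;   // compiles: protected access
    }
}
```

With private fields, a derived class depends on whatever accessors the base class chooses to offer; with protected fields it can read and modify the state directly.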
Best regards,
Lukas Schober


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


