On 18.09.23 at 14:00, Arno Dietsche wrote:
Hi,
We are using PDFBox 3.0.0 as part of our project, which aims at finding
discrepancies between two similar documents created by external services. One
thing we use it for is to render the pages of those documents to images and
compare the rendered images. Those documents can be very large, and therefore we
are trying to optimize our resource usage. So we want to parallelize the page
rendering if possible. This leads to my question regarding the PDFRenderer
class (v3.0.0):
PDFBox is officially not thread safe, but we removed some of the
limitations and tried to make new features thread safe.
In the past we observed problems with this multithreaded approach. I
understand that PDDocument is not thread safe, but what if I get all the PDPage
objects first and then render them in multiple threads? Essentially, if the method
PDFRenderer.renderImage(int pageIndex, float scale, ImageType imageType,
RenderDestination destination) were passed the PDPage object directly instead of
the pageIndex, it would no longer need to get the PDPage object from the
PDPageTree. Do you know of possible limitations regarding multithreading the
remainder of this renderImage method?
I guess that passing the PDPage instance to that method won't change
much, as 3.0.0 uses an on-demand parser and most likely the related PDPage
objects aren't fully loaded, so the parser has to dereference most of
the objects in question during rendering. The good news is that this part
should be thread safe.
Our own debugger is multithreaded, and at the beginning of the
implementation of the on-demand parser I stumbled upon that and had to make
the new IO classes thread safe.
That said, I'd like to encourage you to give it a try, but no
guarantee from our side ;-)
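
For such an experiment, here is a minimal sketch of the threading part only,
using nothing but the JDK. The names ParallelPageRendering, PageTask and
renderAll are hypothetical and not part of PDFBox. The idea is to submit one
task per page index to a fixed thread pool and collect the results in page
order. With PDFBox the task could be something like
i -> renderer.renderImage(i, scale), but whether one PDFRenderer and one
PDDocument can safely be shared between threads is exactly the open question
above, so please treat this as a test harness and not as a recommendation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of PDFBox.
final class ParallelPageRendering {

    // One unit of work per page. It may throw, e.g. the IOException
    // declared by PDFRenderer.renderImage(int pageIndex, float scale).
    interface PageTask<T> {
        T render(int pageIndex) throws Exception;
    }

    private ParallelPageRendering() {
    }

    // Runs task.render(i) for i = 0..pageCount-1 on a fixed thread pool and
    // returns the results in page order. Failures are rethrown unchecked.
    static <T> List<T> renderAll(int pageCount, int threads, PageTask<T> task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>(pageCount);
            for (int i = 0; i < pageCount; i++) {
                final int pageIndex = i;
                Callable<T> job = () -> task.render(pageIndex);
                futures.add(pool.submit(job));
            }
            List<T> results = new ArrayList<>(pageCount);
            for (Future<T> future : futures) {
                results.add(future.get()); // blocks until that page is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while rendering", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Rendering a page failed", e.getCause());
        } finally {
            pool.shutdownNow(); // also cancels pending pages after a failure
        }
    }
}
```

Comparing the output of such a run against a single-threaded run over the
same document should show quickly whether the multithreaded rendering
misbehaves for your files.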
Andreas
To clarify, I am currently testing this with a subclass of PDFRenderer so that I
could add this method: renderImage(PDPage page, float scale, ImageType
imageType, RenderDestination destination)
Thank you very much for your time and help
Best regards
Arno Dietsche
brainsphere informationworks GmbH