[ 
https://issues.apache.org/jira/browse/PDFBOX-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587694#comment-16587694
 ] 

Tilman Hausherr commented on PDFBOX-4296:
-----------------------------------------

I remember that we had a problem where images were read twice, but I think this 
was fixed with the subsampling change. Which comment about transparency do you 
mean? In any case, it won't apply to you: transparency is only relevant when 
rendering the whole PDF. There is no backlog re performance. Can you share a PDF 
where image extraction is very slow? Just "slower than poppler" doesn't 
mean much: poppler is (AFAIK) written in C++, while PDFBox is written in Java. 
We also depend on the image handling libraries.

You can speed up JPEG extraction by writing out the embedded JPEG data directly 
instead of decoding it into an image first; see the ExtractImages tool.
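
A minimal sketch of the idea, using only the JDK: a JPEG embedded in a PDF is 
stored in a stream whose filter is DCTDecode, so when that is the only filter 
the stored bytes are already a JPEG file and can be copied out without decoding 
and re-encoding. The class and method names below (DirectJpegSketch, isRawJpeg, 
copy) are hypothetical and are not PDFBox API; the ExtractImages tool remains 
the reference for how PDFBox itself does this.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of PDFBox.
final class DirectJpegSketch {
    // PDF filter names for JPEG data: the full name and the abbreviation
    // that may appear in inline images.
    static final List<String> JPEG_FILTERS = Arrays.asList("DCTDecode", "DCT");

    // True when the stream's only filter is a JPEG filter, i.e. the stored
    // bytes can be written out unchanged as a .jpg file.
    static boolean isRawJpeg(List<String> filters) {
        return filters.size() == 1 && JPEG_FILTERS.contains(filters.get(0));
    }

    // Copies the stored bytes as-is and returns the number of bytes written.
    // No image decoding happens here, which is where the time is saved.
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

A caller would check isRawJpeg on the image stream's filter list and, if it 
returns true, pass the still-encoded stream to copy; otherwise it would fall 
back to the normal decode path. Images in color spaces that plain JPEG viewers 
handle poorly (for example CMYK) may still need the decode path.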

> Question: Performance
> ---------------------
>
>                 Key: PDFBOX-4296
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4296
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Rendering
>    Affects Versions: 2.0.11
>            Reporter: Daniel Persson
>            Priority: Trivial
>              Labels: performance
>
> Hi Team.
> We use a tool we built using PDFBox to extract text for about 10k pages per 
> day. Then we have another tool to extract images using Poppler.
> We want to use PDFBox for both tasks but sadly we see a performance hit using 
> PDFBox in the order of 3 times.
> Do you have any backlog / technical debt / ideas on how to improve 
> performance?
> We have tried -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true 
> and that made image generation much slower.
> We have set System.setProperty("sun.java2d.cmm", 
> "sun.java2d.cmm.kcms.KcmsServiceProvider") in code.
> We use image libraries from twelvemonkeys, pdfbox and the standard jai 
> project.
> I've read in the code that we do double writes for images using transparency 
> which might be a culprit.
> I have been allowed to put some time into the project if we have some solid 
> leads or a roadmap to reach better performance.
> Hope it's okay to track this issue here instead of a question on the mailing 
> list.
> Best regards
> Daniel


