[jira] [Commented] (PDFBOX-4649) High CPU load an memory usage, when converting PDF to Image

Tilman Hausherr (Jira) Wed, 11 Sep 2019 12:03:16 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927909#comment-16927909
 ]


Tilman Hausherr commented on PDFBOX-4649:
-----------------------------------------

With PDFDebugger I can display these files at 72dpi in 5 seconds with -Xmx4g. 
If I set the CPU to "ridiculous speed" mode, it goes down to 2 seconds. 
Additional time will be needed to save the files.

It will be slower at higher dpi. At 400% zoom, which is about 288 dpi, it went 
up to about 3 seconds.
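
The 400% and 288 dpi figures are related through the 72 units per inch of PDF user space: 100% zoom is 72 dpi, so 400% is 4 x 72 = 288 dpi. The sketch below works through that arithmetic and gives a rough raster size for one page. The class and method names are hypothetical, the A4 page size (595 x 842 pt) used in the note is an assumption, and the 4 bytes per pixel figure is an assumption for a 32-bit raster; the memory PDFBox actually uses while rendering may differ.

```java
/** Back-of-the-envelope numbers for rendering one PDF page to a raster image. */
final class RenderSizeEstimate {

    /** PDF user space has 72 units per inch, so 100% zoom corresponds to 72 dpi. */
    static final float BASE_DPI = 72f;

    private RenderSizeEstimate() {}

    /** Converts a viewer zoom percentage into the equivalent rendering dpi. */
    static float dpiForZoomPercent(float zoomPercent) {
        return BASE_DPI * zoomPercent / 100f;
    }

    /** Number of pixels along a page side that is sizePt points long. */
    static long pixels(float sizePt, float dpi) {
        return Math.round((double) sizePt * dpi / BASE_DPI);
    }

    /** Rough raster size, assuming 4 bytes per pixel (an assumption, see above). */
    static long bytesAt32Bit(float widthPt, float heightPt, float dpi) {
        return pixels(widthPt, dpi) * pixels(heightPt, dpi) * 4L;
    }
}
```

For an assumed A4 page, `bytesAt32Bit(595f, 842f, 72f)` is about 2 MB and `bytesAt32Bit(595f, 842f, 288f)` is about 32 MB: quadrupling the dpi gives 16 times as many pixels, which is consistent with rendering getting slower at higher dpi.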

You can increase speed slightly by changing
{code}
PDDocument.load(Files.newInputStream(filePath, StandardOpenOption.READ))
{code}
to
{code}
PDDocument.load(filePath.toFile())
{code}


> High CPU load an memory usage, when converting PDF to Image
> -----------------------------------------------------------
>
>                 Key: PDFBOX-4649
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4649
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.16
>            Reporter: Willie Chieukam
>            Priority: Critical
>         Attachments: 331577-5_b_19ez1.pdf, 332699-5_c_19ez7.pdf, 
> 335520-5_c_19ezb.pdf, 335521-5_c_19ezd.pdf
>
>
> Hello!
> We are running a business web application that uses PDFBox to convert
> PDF files to images using pdfRenderer.renderImageWithDPI(parameters).
> When we try to convert the attached PDF, the CPU load of Tomcat, running in a 
> Docker container on OpenShift, rises and the process appears to 
> hang. The Tomcat process becomes unresponsive and we get a memory 
> overflow. The server load is also very high during this time.
> We are using
> + org.apache.pdfbox:pdfbox:2.0.16
> + org.apache.pdfbox:pdfbox-tools:2.0.16
> + org.apache.pdfbox:jbig2-imageio:3.0.2
> Our code looks like this:
> {code:java}
>     public void saveImageFromPDF(Path filePath, Path imagePath, Integer IMAGE_DPI, Float IMAGE_QUALITY) {
>         try (PDDocument pddocument = PDDocument.load(Files.newInputStream(filePath, StandardOpenOption.READ))) {
>             PDFRenderer pdfRenderer = new PDFRenderer(pddocument);
>             for (Integer i = 0; i < pddocument.getNumberOfPages(); i++) {
>                 try (OutputStream outputStream = documentServiceUtility
>                         .getFileOutputStream(imagePath.resolve(Integer.toString(i) + "." + IMAGE_FILE_EXTENSION))) {
>                     BufferedImage bufferedImage = pdfRenderer.renderImageWithDPI(i, IMAGE_DPI, ImageType.BINARY);
>                     ImageIOUtil.writeImage(bufferedImage, IMAGE_FILE_EXTENSION, outputStream, IMAGE_DPI, IMAGE_QUALITY);
>                     LOG.debug("Image of document {} successfully saved.",
>                             imagePath.resolve(Integer.toString(i) + "." + IMAGE_FILE_EXTENSION));
>                 } catch (Throwable ex) {
>                     throw new NiehoffPDDocumentHanderException(filePath, ex);
>                 }
>             }
>         } catch (Exception e) {
>             throw new NiehoffPDDocumentHanderException(filePath, e);
>         }
>     }
> {code}
> Line throwing the exception:
> *{color:#FF0000}BufferedImage bufferedImage = pdfRenderer.renderImageWithDPI(i, IMAGE_DPI, ImageType.BINARY);{color}*
>
> Do you have any idea how to prevent this?
> Thank you very much and best regards,
>  Willie



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

