[ 
https://issues.apache.org/jira/browse/PDFBOX-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16073904#comment-16073904
 ] 

Tilman Hausherr edited comment on PDFBOX-3852 at 7/5/17 4:42 PM:
-----------------------------------------------------------------

{quote}
Great! I'm glad it is going to be applied!
{quote}
No, that's not what I wrote. Your patch looks incorrect, and I asked you to 
correct it. But ignore that for now. I think I understand your ideas.

What I tried to do is:
- understand your text
- reproduce the problem you mention
- understand your patch. Apparently it sets a scratch file for the watermark 
document and uses a map for the watermark documents.

Re "understand your text": what do you mean by "happens based on jetty 
running time memory setting, and pdf file size"? Do you mean that it depends 
on memory and file size?

Please attach the file watermarked.pdf. I tried with the file McAfee.pdf and it 
worked without the patch; however, the result file had a size of 91 MB! That 
makes me doubt whether the Overlay class should be used this way at all, i.e. 
shouldn't identical files use the same internal document?

So my proposal is different and uses only one of your ideas (the map). Please 
try it in your project, without the other changes:
{code}
    public PDDocument overlay(Map<Integer, String> specificPageOverlayFile)
            throws IOException
    {
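        // overlay documents already loaded, keyed by file name
        HashMap<String,PDDocument> loadedDocuments = new HashMap<String,PDDocument>();
        // layout page computed once per loaded overlay document
        HashMap<PDDocument,LayoutPage> layouts = new HashMap<PDDocument,LayoutPage>();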
        loadPDFs();
        for (Map.Entry<Integer, String> e : specificPageOverlayFile.entrySet())
        {
            PDDocument doc = loadedDocuments.get(e.getValue());
            if (doc == null)
            {
                doc = loadPDF(e.getValue());
                loadedDocuments.put(e.getValue(), doc);
                layouts.put(doc,getLayoutPage(doc));
            }
            specificPageOverlay.put(e.getKey(), doc);
            specificPageOverlayPage.put(e.getKey(), layouts.get(doc));
        }
        processPages(inputPDFDocument);
        return inputPDFDocument;
    }
{code}

Re patch: it should be against the trunk. But test the code above first.

-I also doubt that your change in createStream helps much-, these are tiny 
streams. Also, there is no need to close the scratch files separately.

So IMHO we can also use your scratch file proposal as a setting, but only for 
the input document.
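
To illustrate the idea behind the map, here is a minimal standalone sketch. It is not the actual Overlay code: FakeDocument and load() are hypothetical stand-ins for PDDocument and loadPDF(), used only so that the example runs without PDFBox. It shows that when all 750 pages name the same overlay file, the file is loaded once and every page shares the same document object.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class OverlayCacheSketch
{
    /** Hypothetical stand-in for a loaded overlay document (not a PDFBox class). */
    static final class FakeDocument
    {
        final String fileName;

        FakeDocument(String fileName)
        {
            this.fileName = fileName;
        }
    }

    /** Counts how often the (hypothetical) loader was really invoked. */
    static int loadCount = 0;

    /** Hypothetical loader; in the real class this would parse a PDF file. */
    static FakeDocument load(String fileName)
    {
        loadCount++;
        return new FakeDocument(fileName);
    }

    /** Maps each page number to a document, loading every distinct file name only once. */
    static Map<Integer, FakeDocument> resolve(Map<Integer, String> pageToFile)
    {
        Map<String, FakeDocument> loaded = new HashMap<String, FakeDocument>();
        Map<Integer, FakeDocument> result = new LinkedHashMap<Integer, FakeDocument>();
        for (Map.Entry<Integer, String> e : pageToFile.entrySet())
        {
            FakeDocument doc = loaded.get(e.getValue());
            if (doc == null)
            {
                // first time this file name is seen: load it and remember it
                doc = load(e.getValue());
                loaded.put(e.getValue(), doc);
            }
            result.put(e.getKey(), doc);
        }
        return result;
    }

    public static void main(String[] args)
    {
        Map<Integer, String> guide = new HashMap<Integer, String>();
        for (int page = 1; page <= 750; page++)
        {
            guide.put(page, "watermarked.pdf");
        }
        Map<Integer, FakeDocument> resolved = resolve(guide);
        // prints "pages: 750, loads: 1"
        System.out.println("pages: " + resolved.size() + ", loads: " + loadCount);
    }
}
```

Without the map, the same loop would call the loader 750 times, which is presumably what drives up both memory use and the size of the result file.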



> Overlay a pdf file which is 750 pages ends up in OutOfMemoryError
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-3852
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3852
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.6
>         Environment: Ubuntu, jetty
>            Reporter: ryuukei
>            Assignee: Tilman Hausherr
>              Labels: Overlay
>         Attachments: 750-pages.pdf, McAfee.pdf, Overlay.patch, watermarked.pdf
>
>
> We found an issue and a solution to fix it. You might be interested in 
> having a look to see whether it is worth applying the attached patch to 
> benefit more pdfbox users. :-) A bit more detail: this error happens based 
> on jetty running time memory setting, and pdf file size.
> * Application platform:
> Ubuntu, jetty
> * The test case to produce this issue:
> Add a simple overlay to all pages (in this case, 750 pages). The 
> processPages function eats up the JVM memory while applying the overlay to 
> the file.
> * sample code for using pdfbox overlay:
> {code}
>  PDDocument document = PDDocument.load( pdf );
>  HashMap<Integer, String> overlayGuide = new HashMap<Integer, String>();
>  for (int i = 0; i < pagenunber; i++)
>  {
>    // "watermarked.pdf" is meant to be a file that contains the watermarks for the page
>    overlayGuide.put(i+1, "watermarked.pdf");
>  }
>  Overlay overlay = new Overlay();
>  overlay.setInputPDF( document );
>  overlay.setOverlayPosition( Overlay.Position.FOREGROUND );
>  PDDocument overlayResult = overlay.overlay( overlayGuide );
> {code}
> * Error log:
> {code}
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | java.lang.OutOfMemoryError: Java heap space
> STATUS | wrapper  | main    | 2017/07/03 13:06:23 | Filter trigger matched.  Restarting JVM.
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 |   at org.apache.pdfbox.io.ScratchFile.<init>(ScratchFile.java:128)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 |   at org.apache.pdfbox.io.ScratchFile.getMainMemoryOnlyInstance(ScratchFile.java:143)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 |   at org.apache.pdfbox.cos.COSStream.<init>(COSStream.java:55)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 |   at org.apache.pdfbox.multipdf.Overlay.createStream(Overlay.java:***)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 |   at org.apache.pdfbox.multipdf.Overlay.processPages(Overlay.java:364)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 |   at org.apache.pdfbox.multipdf.Overlay.overlay(Overlay.java:128)
> {code}
> * Solution
> Apply MemoryUsageSetting to Overlay, which allows Overlay to use a file as 
> temporary output.
> * Update for the Overlay usage:
> {code}
>  PDDocument document = PDDocument.load( pdf );
>  HashMap<Integer, String> overlayGuide = new HashMap<Integer, String>();
>  for (int i = 0; i < pagenunber; i++)
>  {
>    overlayGuide.put(i+1, "watermarked.pdf");
>  }
>  Overlay overlay = new Overlay();
>  overlay.setInputPDF( document );
>  overlay.setOverlayPosition( Overlay.Position.FOREGROUND );
>  // set overlay to use a temp file as output rather than memory
>  MemoryUsageSetting memoryUsageSetting = MemoryUsageSetting.setupTempFileOnly();
>  memoryUsageSetting.setTempDir( new File ( "someTempWorkingDir" ) );
>  overlay.setMemoryUsageSetting( memoryUsageSetting );
>  PDDocument overlayResult = overlay.overlay( overlayGuide );
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
