[ 
https://issues.apache.org/jira/browse/PDFBOX-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063640#comment-14063640
 ] 

Tilman Hausherr commented on PDFBOX-2212:
-----------------------------------------

This code in MemoryTTFDataStream looks suspicious to me:
{code}
    public int read() throws IOException
    {
        int retval = -1;
        if( currentPosition < data.length )
        {
            retval = data[currentPosition];
        }
        currentPosition++;
        return (retval+256)%256;
    }
{code}
On EOF it returns 255 instead of -1, because (-1+256)%256 is 255. Because of that, this method:
{code}
    public int readUnsignedShort() throws IOException
    {
        int ch1 = this.read();
        int ch2 = this.read();
        if ((ch1 | ch2) < 0)
        {
            throw new EOFException();
        }
        return (ch1 << 8) + (ch2 << 0);
    }
{code}
never throws an EOFException, because neither byte is ever negative. Try 
building from the sources and make this change in 
MemoryTTFDataStream:
{code}
    public int read() throws IOException
    {
        if (currentPosition >= data.length)
        {
            return -1;
        }
        int retval = data[currentPosition];
        currentPosition++;
        return (retval+256)%256;
    }
{code}
This is just a theory that I can't test myself, and I might be wrong, so you 
should verify it by changing the code on your system and then testing it 
against that file. I would need the file to be sure. And no, we haven't seen 
this effect before. We did have a similar problem a year ago with the same 
cause (EOF), but that was fixed (PDFBOX-1668).
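
A minimal, self-contained sketch of the theory above, for anyone who wants to see the effect without the original file. It is not the FontBox code: the class EofDemo, its Stream stand-in, the cap parameter maxComponents, and the value 0x0020 for MORE_COMPONENTS are assumptions made for illustration, and the loop reads only the flags field, whereas the real GlyfCompositeComp constructor reads more data per component.

```java
import java.io.EOFException;
import java.io.IOException;

final class EofDemo
{
    // Assumed value of GlyfCompositeComp.MORE_COMPONENTS (illustration only).
    static final int MORE_COMPONENTS = 0x0020;

    /** Hypothetical stand-in for MemoryTTFDataStream; 'fixed' selects the proposed read(). */
    static final class Stream
    {
        private final byte[] data;
        private final boolean fixed;
        private int currentPosition = 0;

        Stream(byte[] data, boolean fixed)
        {
            this.data = data;
            this.fixed = fixed;
        }

        int read()
        {
            if (fixed && currentPosition >= data.length)
            {
                return -1; // proposed behaviour: report EOF to the caller
            }
            int retval = -1;
            if (currentPosition < data.length)
            {
                retval = data[currentPosition];
            }
            currentPosition++;
            return (retval + 256) % 256; // maps -1 to 255, which hides EOF
        }

        int readUnsignedShort() throws IOException
        {
            int ch1 = read();
            int ch2 = read();
            if ((ch1 | ch2) < 0)
            {
                throw new EOFException();
            }
            return (ch1 << 8) + ch2;
        }

        short readSignedShort() throws IOException
        {
            return (short) readUnsignedShort(); // 0xFFFF becomes -1
        }
    }

    /** Mimics the do/while loop from the report; the cap only keeps the demo finite. */
    static int countComponents(Stream in, int maxComponents) throws IOException
    {
        int count = 0;
        int flags;
        do
        {
            flags = in.readSignedShort();
            count++;
        } while ((flags & MORE_COMPONENTS) != 0 && count < maxComponents);
        return count;
    }

    public static void main(String[] args) throws IOException
    {
        // One flags value with MORE_COMPONENTS set, then the data ends.
        byte[] truncated = { 0x00, 0x20 };

        // Old read(): every read past the end yields 255, so flags is -1 forever.
        System.out.println(countComponents(new Stream(truncated, false), 1000)); // prints 1000

        // Proposed read(): the second component hits EOF and the loop stops.
        try
        {
            countComponents(new Stream(truncated, true), 1000);
        }
        catch (EOFException e)
        {
            System.out.println("EOFException"); // prints EOFException
        }
    }
}
```

With the old read(), only the artificial cap ends the loop, which matches the reported behaviour of objects being created until the heap is exhausted. With the proposed read(), the truncated data surfaces as an EOFException that the caller can handle.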

> OutOfMemoryError in GlyfCompositeDescrip
> ----------------------------------------
>
>                 Key: PDFBOX-2212
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2212
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox, Preflight
>    Affects Versions: 1.8.6
>         Environment: Windows 7, JDK6
>            Reporter: Valdis Andersons
>         Attachments: adobe_error1.jpg, adobe_error2.jpg
>
>
> Hi All,
>  
> The application I’m working on is a web service that accepts PDF documents 
> and combines them into a single larger PDF: a client submits a batch of PDFs 
> and we create one PDF out of them. In some rare cases one of the submitted 
> PDF documents has a glitch that causes Adobe Reader to throw errors when 
> viewing the final document (screenshots attached).
> When I tried to check the buggy PDF with the approach outlined here:
>  
> https://pdfbox.apache.org/cookbook/pdfavalidation.html
>  
> I got an OutOfMemoryError in the GlyfCompositeDescript class. Here is 
> the full stack trace:
>  
> java.lang.OutOfMemoryError: Java heap space
>                 at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:58)
>                 at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:62)
>                 at org.apache.fontbox.ttf.GlyphTable.initData(GlyphTable.java:69)
>                 at org.apache.fontbox.ttf.TrueTypeFont.initializeTable(TrueTypeFont.java:280)
>                 at org.apache.fontbox.ttf.AbstractTTFParser.parseTables(AbstractTTFParser.java:128)
>                 at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:80)
>                 at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:109)
>                 at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
>                 at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:84)
>                 at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
>                 at org.apache.pdfbox.preflight.font.descriptor.TrueTypeDescriptorHelper.processFontFile(TrueTypeDescriptorHelper.java:84)
>                 at org.apache.pdfbox.preflight.font.descriptor.FontDescriptorHelper.validate(FontDescriptorHelper.java:97)
>                 at org.apache.pdfbox.preflight.font.SimpleFontValidator.processFontDescriptorValidation(SimpleFontValidator.java:82)
>                 at org.apache.pdfbox.preflight.font.SimpleFontValidator.validate(SimpleFontValidator.java:55)
>                 at org.apache.pdfbox.preflight.process.reflect.FontValidationProcess.validate(FontValidationProcess.java:69)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
>                 at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateFonts(ResourcesValidationProcess.java:96)
>                 at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:74)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
>                 at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validateXObjectResources(XObjFormValidator.java:178)
>                 at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:75)
>                 at org.apache.pdfbox.preflight.process.reflect.GraphicObjectPageValidationProcess.validate(GraphicObjectPageValidationProcess.java:77)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
>                 at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateXObjects(ResourcesValidationProcess.java:191)
>                 at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:78)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
>                 at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validateXObjectResources(XObjFormValidator.java:178)
>                 at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:75)
>  
> While I can’t send the PDF in question because its contents are sensitive, 
> I did a bit of digging and debugging to find out why this is happening.
> In the constructor of the GlyfCompositeDescript class there is a do … while 
> loop that constructs GlyfCompositeComp objects and adds them to the 
> components list of GlyfCompositeDescript. The GlyfCompositeComp constructor 
> reads a signed short from the TTFDataStream into the flags field. That field 
> is in turn used in the GlyfCompositeDescript constructor to check whether 
> any more components remain to be read. Here is the code in question:
>  
> public GlyfCompositeDescript(TTFDataStream bais, GlyphTable glyphTable) throws IOException
>     {
> …
>         do
>         {
>             comp = new GlyfCompositeComp(bais); // This is where the OutOfMemoryError happens
>             components.add(comp);
>         } while ((comp.getFlags() & GlyfCompositeComp.MORE_COMPONENTS) != 0); // here the flags are used to check if more components are there
> …
>     }
>  
> protected GlyfCompositeComp(TTFDataStream bais) throws IOException
>     {
>         flags = bais.readSignedShort();
> …
>     }
>  
> In the case of the corrupted PDF, which we get from time to time, the 
> bais.readSignedShort() call in GlyfCompositeComp returns -1. Once it returns 
> that value, the condition in the loop of the GlyfCompositeDescript 
> constructor always evaluates to 32 (!= 0), because -1 has every bit set. 
> Basically, it ends up in an infinite loop and keeps constructing 
> GlyfCompositeComp objects until the memory runs out.
>  
> The main question is: has anyone ever encountered a PDF corruption that 
> causes this behaviour, and how would one go about checking a PDF document 
> for this sort of corruption without causing the application to run out of 
> memory?
>  
> We’re not required to fix the document, just check if it’s valid. If it’s not 
> valid then we just reject the document. Ideally I’d also like to know what 
> the corruption could be so that I can at least give a hint to the client 
> software as to what is causing this document to be rejected (I do understand 
> that without the actual PDF that’s causing this it might be impossible to 
> tell that).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
