[ https://issues.apache.org/jira/browse/PDFBOX-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-2212.
----------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0
                   1.8.7

[~vandersons] Thanks for the fast response. Set to resolved.

> OutOfMemoryError in GlyfCompositeDescrip
> ----------------------------------------
>
>                 Key: PDFBOX-2212
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2212
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox, Preflight
>    Affects Versions: 1.8.6
>         Environment: Windows 7, JDK6
>            Reporter: Valdis Andersons
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.8.7, 2.0.0
>
>         Attachments: adobe_error1.jpg, adobe_error2.jpg
>
>
> Hi All,
>
> The application I’m working on is a web service that accepts PDF documents
> and combines them into a single larger PDF. A client submits a batch of PDFs
> and we create a single PDF out of them. In some rare cases one of the
> submitted PDF documents has a glitch in it that causes Adobe Reader to throw
> errors when viewing the final document (screenshots attached).
> When I tried to check the buggy PDF with the approach outlined here:
>
> https://pdfbox.apache.org/cookbook/pdfavalidation.html
>
> I got an OutOfMemoryError in the GlyfCompositeDescript class. Here is the
> full stack trace:
>
> java.lang.OutOfMemoryError: Java heap space
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:58)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:62)
>     at org.apache.fontbox.ttf.GlyphTable.initData(GlyphTable.java:69)
>     at org.apache.fontbox.ttf.TrueTypeFont.initializeTable(TrueTypeFont.java:280)
>     at org.apache.fontbox.ttf.AbstractTTFParser.parseTables(AbstractTTFParser.java:128)
>     at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:80)
>     at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:109)
>     at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
>     at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:84)
>     at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
>     at org.apache.pdfbox.preflight.font.descriptor.TrueTypeDescriptorHelper.processFontFile(TrueTypeDescriptorHelper.java:84)
>     at org.apache.pdfbox.preflight.font.descriptor.FontDescriptorHelper.validate(FontDescriptorHelper.java:97)
>     at org.apache.pdfbox.preflight.font.SimpleFontValidator.processFontDescriptorValidation(SimpleFontValidator.java:82)
>     at org.apache.pdfbox.preflight.font.SimpleFontValidator.validate(SimpleFontValidator.java:55)
>     at org.apache.pdfbox.preflight.process.reflect.FontValidationProcess.validate(FontValidationProcess.java:69)
>     at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
>     at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
>     at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateFonts(ResourcesValidationProcess.java:96)
>     at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:74)
>     at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
>     at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
>     at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validateXObjectResources(XObjFormValidator.java:178)
>     at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:75)
>     at org.apache.pdfbox.preflight.process.reflect.GraphicObjectPageValidationProcess.validate(GraphicObjectPageValidationProcess.java:77)
>     at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
>     at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
>     at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateXObjects(ResourcesValidationProcess.java:191)
>     at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:78)
>     at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
>     at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
>     at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validateXObjectResources(XObjFormValidator.java:178)
>     at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:75)
>
> While I can’t send the PDF in question due to the sensitivity of its
> contents, I did a bit of digging and debugging to find out why this is
> happening.
> In the GlyfCompositeDescript class’s constructor there is a do … while loop
> that constructs GlyfCompositeComp objects and adds them to the components
> list of GlyfCompositeDescript. In the constructor of GlyfCompositeComp a
> signed short is read from the TTFDataStream into the flags field. That field
> in turn is used in the GlyfCompositeDescript constructor to check whether
> there are any more components to be read. Here is the code in question:
>
> public GlyfCompositeDescript(TTFDataStream bais, GlyphTable glyphTable)
>         throws IOException
> {
>     …
>     do
>     {
>         // This is where the OutOfMemoryError happens
>         comp = new GlyfCompositeComp(bais);
>         components.add(comp);
>     // here the flags are used to check if more components are there
>     } while ((comp.getFlags() & GlyfCompositeComp.MORE_COMPONENTS) != 0);
>     …
> }
>
> protected GlyfCompositeComp(TTFDataStream bais) throws IOException
> {
>     flags = bais.readSignedShort();
>     …
> }
>
> In the case of the corrupted PDF that we get from time to time, the
> bais.readSignedShort() call in GlyfCompositeComp returns -1, and once it
> hits that value the masked flags in the GlyfCompositeDescript constructor’s
> loop condition always evaluate to 32 (!= 0), so the condition is always true.
> Basically, it ends up in an infinite loop and keeps constructing
> GlyfCompositeComp objects until the memory runs out.
>
> The main question here is: has anyone ever encountered a PDF corruption that
> causes this behaviour, and how would one go about checking a PDF document for
> this sort of corruption without causing the application to run out of
> memory?
>
> We’re not required to fix the document, just to check whether it’s valid. If
> it’s not valid, we simply reject the document. Ideally I’d also like to know
> what the corruption could be, so that I can at least give a hint to the
> client software as to why the document is being rejected (I do understand
> that without the actual PDF that’s causing this it might be impossible to
> tell).
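
The flag arithmetic described in the report can be reproduced outside FontBox. What follows is a minimal standalone sketch under stated assumptions, not the change that was committed for PDFBOX-2212. The class CompositeFlagsSketch, the helper readFlagWords and the MAX_COMPONENTS cap are hypothetical names introduced for illustration. The sketch reads only the 16-bit flag words (a real composite glyph record carries further fields after the flags) and uses java.io.DataInputStream in place of TTFDataStream. The value 0x0020 for MORE_COMPONENTS matches the 32 observed in the report.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Standalone illustration; it does not use or modify any FontBox class.
class CompositeFlagsSketch
{
    // Bit 5 of the flags word; the report saw the masked value 32.
    static final int MORE_COMPONENTS = 0x0020;

    // Hypothetical safety cap, chosen for illustration only.
    static final int MAX_COMPONENTS = 1024;

    // True if the flags word announces another component record.
    static boolean hasMoreComponents(short flags)
    {
        return (flags & MORE_COMPONENTS) != 0;
    }

    // Reads 16-bit flag words until MORE_COMPONENTS is clear. Fails with an
    // IOException instead of looping forever on truncated or corrupt data.
    static List<Short> readFlagWords(byte[] data) throws IOException
    {
        List<Short> flagWords = new ArrayList<Short>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        short flags;
        do
        {
            if (flagWords.size() >= MAX_COMPONENTS)
            {
                throw new IOException("More than " + MAX_COMPONENTS
                        + " components; glyph data is probably corrupt");
            }
            // readShort() throws EOFException when the data runs out,
            // rather than returning a sentinel such as -1.
            flags = in.readShort();
            flagWords.add(flags);
        } while (hasMoreComponents(flags));
        return flagWords;
    }
}
```

A flags value of -1 is 0xFFFF as a 16-bit word, so every bit, including MORE_COMPONENTS, appears set and -1 & 0x0020 is 32. With a guard of this kind, input that keeps yielding 0xFFFF ends in an IOException that a validator can report as an invalid font, instead of exhausting the heap.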