Christian Kohlschütter created PDFBOX-1653:
----------------------------------------------

             Summary: Fix pdfbox eating up big chunks of memory for identical mappings
                 Key: PDFBOX-1653
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1653
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox
    Affects Versions: 1.8.2, 1.8.1, 2.0.0
            Reporter: Christian Kohlschütter
            Priority: Critical
         Attachments: PDFBOX-1653.patch

pdfbox currently handles the PDF beginbfrange command (which creates a 
character mapping for a range of CIDs to Unicode characters) in a very 
inefficient way.

If a PDF document contains a range of CID 0 to CID 65535 with a mapping offset 
of 0 (which translates to "CID values map 1:1 to Unicode characters"), pdfbox 
nevertheless creates an explicit mapping entry for each and every CID.

There apparently are PDFs with a lot of these 0-65535 mappings, and a single 
such PDF may cause an OutOfMemoryError.

This patch detects zero-offset ranges and simply leaves them out of the 
explicit mapping.
There is some special handling for the space character included in the patch, 
which might or might not be relevant.
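
To illustrate the idea (this is not the code from the attached patch), here is 
a minimal sketch of how a zero-offset range could be stored as a pair of 
bounds instead of one map entry per CID. The class name ZeroOffsetRangeSketch 
and its methods addBfRange, lookup and explicitSize are hypothetical and are 
not part of the FontBox API. The sketch assumes all codes fit into a single 
UTF-16 code unit, and it leaves out the special handling of the space 
character mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not the FontBox CMap class and not the attached patch. */
class ZeroOffsetRangeSketch {
    // One entry per CID, used only for ranges with a non-zero offset.
    private final Map<Integer, String> explicit = new HashMap<>();
    // Zero-offset ranges kept as {firstCid, lastCid} pairs: two ints per range.
    private final List<int[]> identityRanges = new ArrayList<>();

    /** Registers a bfrange mapping firstCid..lastCid to firstUnicode, firstUnicode+1, ... */
    void addBfRange(int firstCid, int lastCid, int firstUnicode) {
        if (firstCid == firstUnicode) {
            // Offset 0: every CID maps to itself, so only remember the bounds.
            identityRanges.add(new int[] {firstCid, lastCid});
            return;
        }
        // Non-zero offset: expand the range entry by entry.
        for (int cid = firstCid; cid <= lastCid; cid++) {
            char mapped = (char) (firstUnicode + (cid - firstCid));
            explicit.put(cid, String.valueOf(mapped));
        }
    }

    /** Returns the Unicode string for a CID, or null if no range covers it. */
    String lookup(int cid) {
        String mapped = explicit.get(cid);
        if (mapped != null) {
            return mapped;
        }
        for (int[] range : identityRanges) {
            if (cid >= range[0] && cid <= range[1]) {
                return String.valueOf((char) cid);
            }
        }
        return null;
    }

    /** Number of explicitly stored entries (a rough proxy for memory use). */
    int explicitSize() {
        return explicit.size();
    }

    public static void main(String[] args) {
        ZeroOffsetRangeSketch map = new ZeroOffsetRangeSketch();
        map.addBfRange(0, 65535, 0); // the problematic 1:1 range
        System.out.println(map.explicitSize());
        System.out.println(map.lookup(65));
    }
}
```

In this sketch explicit entries take precedence over identity ranges on 
lookup. Whether that matches the precedence rules pdfbox applies to 
overlapping ranges is an assumption.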


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
