[ 
https://issues.apache.org/jira/browse/PDFBOX-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kohlschütter updated PDFBOX-1653:
-------------------------------------------

    Attachment: PDFBOX-1653.patch
    
> Fix pdfbox eating up big chunks of memory for identical mappings
> ----------------------------------------------------------------
>
>                 Key: PDFBOX-1653
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1653
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 1.8.1, 1.8.2, 2.0.0
>            Reporter: Christian Kohlschütter
>            Priority: Critical
>              Labels: PatchAvailable
>         Attachments: PDFBOX-1653.patch
>
>
> pdfbox currently handles the PDF beginbfrange command (which creates a 
> character mapping for a range of CIDs to Unicode characters) in a very 
> inefficient way.
> If a PDF document contains a range from CID 0 to CID 65535 with a mapping 
> offset of 0 (which translates to "CID values map 1:1 to Unicode characters"), 
> pdfbox nevertheless creates an explicit mapping entry for each and every CID.
> There apparently are PDFs with a lot of these 0-65535 mappings, and such a 
> single PDF may cause an OutOfMemoryError.
> This patch detects zero-offset ranges and omits them from the explicit 
> mapping instead of storing one entry per CID.
> There is some special handling for the space character included in the patch, 
> which might or might not be relevant.
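
A minimal sketch of the idea described above, not the code from the attached patch: a zero-offset range is recorded as a single interval instead of being expanded into one map entry per CID. The class IdentityAwareCMap and its methods addBfRange, lookup, and explicitSize are hypothetical names chosen for illustration and are not part of the FontBox API. The sketch also assumes that explicit entries take precedence over identity ranges, and it leaves out the special handling of the space character mentioned in the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class for illustration only; this is not FontBox's CMap.
final class IdentityAwareCMap {
    // Explicit CID -> Unicode string entries (one per CID).
    private final Map<Integer, String> explicit = new HashMap<>();
    // Zero-offset ranges, stored as {startCid, endCid} pairs.
    private final List<int[]> identityRanges = new ArrayList<>();

    // Registers a bfrange from startCid to endCid (inclusive) whose first
    // CID maps to startCodePoint.
    void addBfRange(int startCid, int endCid, int startCodePoint) {
        if (startCid == startCodePoint) {
            // Zero offset: every CID maps to itself, so remember only the
            // interval instead of creating endCid - startCid + 1 entries.
            identityRanges.add(new int[] {startCid, endCid});
            return;
        }
        for (int cid = startCid; cid <= endCid; cid++) {
            int codePoint = startCodePoint + (cid - startCid);
            explicit.put(cid, new String(Character.toChars(codePoint)));
        }
    }

    // Returns the mapped string, or null if the CID is not covered.
    // Assumption: explicit entries win over identity ranges.
    String lookup(int cid) {
        String mapped = explicit.get(cid);
        if (mapped != null) {
            return mapped;
        }
        for (int[] range : identityRanges) {
            if (cid >= range[0] && cid <= range[1]) {
                return new String(Character.toChars(cid));
            }
        }
        return null;
    }

    // Number of explicitly stored entries, useful for checking memory use.
    int explicitSize() {
        return explicit.size();
    }
}
```

With this layout, a document that declares the 0-65535 identity range many times costs one small array per declaration rather than 65536 map entries each. The linear scan in lookup is a simplification; a real implementation would likely merge overlapping ranges.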

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
