Ryan Jackson created PDFBOX-5387:
------------------------------------

             Summary: ToUnicodeWriter.writeTo allows byte overflow in bfrange operator
                 Key: PDFBOX-5387
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5387
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 2.0.25
            Reporter: Ryan Jackson


The {{writeTo}} method of {{ToUnicodeWriter}} allows overflow in the low-order 
byte when writing the {{(begin/end)bfrange}} operator.

As far as I can tell it is used only with the {{PDCIDFontType2Embedder}} class. 
I believe the bug exists in both the main trunk and in the 2.x branch. The code 
in question may be found 
[here|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/ToUnicodeWriter.java#L133-L136].

The portion of the PDF specification (version 1.7) that bears upon this code is 
Section 5.9, Example 5.16.

The existing code attempts to limit each range to a span of at most 255 code 
points, but it does not stop a range from crossing a low-order byte boundary. 
For example, it allows the following:

[srcCode1 srcCode2 dstString]
03FF 0400 0036

The overflow between srcCode1 and srcCode2 (the low-order byte wraps from FF to 
00 while the high-order byte changes from 03 to 04) is not allowed by the 
specification, and any text extraction will fail. The glyphs themselves render 
fine, so the problem is not obvious until one examines the text in the Content 
Panel or copies and pastes it from Acrobat (Pro) into another document. By 
contrast, the following bfrange operator allows text extraction to work as 
intended:

[srcCode1 srcCode2 dstString]
03FE 03FF 0035

Notice that no overflow exists, and as such the requirements of the 
specification are met.
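
To illustrate the condition described above, here is a minimal sketch of one way 
to detect and avoid the overflow. It is not the code in {{ToUnicodeWriter}} and 
not the fix in the pull request mentioned below; the class {{BfRangeSketch}} and 
both method names are hypothetical. It assumes two-byte source codes and only 
looks at srcCode1 and srcCode2, so it does not address the dstString.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of PDFBox. */
public final class BfRangeSketch
{
    private BfRangeSketch()
    {
    }

    /** Returns true if both two-byte source codes share the same high-order byte. */
    public static boolean sameHighByte(int srcCode1, int srcCode2)
    {
        return (srcCode1 >> 8) == (srcCode2 >> 8);
    }

    /**
     * Splits the inclusive range [first, last] into sub-ranges that never
     * cross a low-order byte boundary. Each element is {start, end}.
     */
    public static List<int[]> splitAtHighByte(int first, int last)
    {
        List<int[]> ranges = new ArrayList<>();
        int start = first;
        while (start <= last)
        {
            // start | 0xFF is the last code that has the same high-order byte as start
            int end = Math.min(last, start | 0xFF);
            ranges.add(new int[] { start, end });
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args)
    {
        // Prints one "<srcCode1> <srcCode2>" pair per line for the split range
        for (int[] r : splitAtHighByte(0x03FE, 0x0401))
        {
            System.out.printf("<%04X> <%04X>%n", r[0], r[1]);
        }
    }
}
```

With this sketch, {{sameHighByte(0x03FF, 0x0400)}} is false, which flags the 
failing range above, while {{sameHighByte(0x03FE, 0x03FF)}} is true. A run of 
codes from 03FE to 0401 would be written as two ranges, 03FE-03FF and 0400-0401, 
instead of one.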

I have put together a proposed solution 
[here|https://github.com/ryanjackson-wf/pdfbox/pull/1] in my fork of the PDFBox 
GH mirror.

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
