Tilman Hausherr created PDFBOX-4811:
---------------------------------------

             Summary: Glyphs getting lost when rendering
                 Key: PDFBOX-4811
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4811
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox
    Affects Versions: 2.0.19
            Reporter: Tilman Hausherr
            Assignee: Tilman Hausherr
             Fix For: 2.0.20, 3.0.0 PDFBox


I missed a rendering change (sorry) that happened in PDFBOX-4810 for the PDF 
from the linked PDF.js issue. It is not a regression, but rather a difference 
in how bad input is displayed, caused by having different data.

The CMap has these ranges:
{code:java}
4 begincodespacerange
<00><7f>
<c080><dfbf>
<e08080><efbfbf>
<f0808080><f7bfbfbf>
endcodespacerange
{code}
The content stream has segments like
{code:java}
(Check\340up Date:2020/ 3/ 4  11:46) Tj
{code}
\340 (octal 340) is 0xE0. The current code in CMap.readCode() reads bytes until 
a range fits, which means it reads 4 bytes before it notices that the match has 
failed. After the failure it doesn't reposition, so this is displayed as 
"Check -Date" instead of "Check -up Date", i.e. input is lost. The "-" is 
itself a default glyph.
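
To make the failure concrete, here is a minimal standalone sketch that checks 
the bytes of the sample string against the four ranges above and never goes 
back after a failed match. It is not the PDFBox code: the class and method 
names are hypothetical, the per-byte range check is my assumption of how a 
codespace match works, and '?' merely stands in for the default glyph.

```java
import java.nio.charset.StandardCharsets;

class CodespaceSketch {
    // The four codespace ranges from the CMap, each as {low bytes, high bytes}.
    static final int[][][] RANGES = {
        {{0x00}, {0x7f}},
        {{0xc0, 0x80}, {0xdf, 0xbf}},
        {{0xe0, 0x80, 0x80}, {0xef, 0xbf, 0xbf}},
        {{0xf0, 0x80, 0x80, 0x80}, {0xf7, 0xbf, 0xbf, 0xbf}},
    };

    // True if the first len bytes of code lie, byte by byte, inside a range of that length.
    static boolean fits(int[] code, int len) {
        for (int[][] range : RANGES) {
            if (range[0].length != len) {
                continue;
            }
            boolean ok = true;
            for (int i = 0; i < len; i++) {
                if (code[i] < range[0][i] || code[i] > range[1][i]) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    // Reads up to 4 bytes per code and never repositions after a failure.
    // Matched single-byte codes are echoed, longer matches print '#', failures print '?'.
    static String decodeWithoutReposition(int[] data) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < data.length) {
            int[] code = new int[4];
            int len = 0;
            boolean matched = false;
            while (len < 4 && pos < data.length) {
                code[len++] = data[pos++];
                if (fits(code, len)) {
                    matched = true;
                    break;
                }
            }
            out.append(matched ? (len == 1 ? (char) code[0] : '#') : '?');
        }
        return out.toString();
    }

    // Converts a Latin-1 string to unsigned byte values.
    static int[] bytesOf(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xff;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints "Check?Date": the bytes 'u', 'p' and ' ' are swallowed with 0xE0
        System.out.println(decodeWithoutReposition(bytesOf("Check\340up Date")));
    }
}
```

0xE0 alone is outside <00><7f>, E0 75 fails the 2-byte range on its first 
byte, E0 75 70 fails the 3-byte range on its second byte (0x75 < 0x80), and 
E0 75 70 20 fails the 4-byte range on its first byte, so all four bytes are 
consumed for a single failed code.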

The solution is to remember the position before reading and to reposition 
there after a failure. I'm using mark() and reset(), which, surprisingly, work 
both when loading in memory and when loading with a temp file.
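
The idea can be sketched roughly as follows. This is not the committed patch, 
only an illustration under stated assumptions: MarkResetSketch, readCode(), 
decode(), EOF and NO_MATCH are hypothetical names, the codespace check is 
passed in as a predicate, and skipping exactly one byte after reset() is my 
choice for the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.BiPredicate;

class MarkResetSketch {
    static final int MAX_CODE_LENGTH = 4;
    static final long EOF = -1;      // hypothetical sentinel: end of stream
    static final long NO_MATCH = -2; // hypothetical sentinel: no codespace range fits

    // Reads one code. On failure the stream is repositioned to the marked start
    // and only the first byte is skipped (an assumption of this sketch).
    // The stream must support mark()/reset(), otherwise reset() throws.
    static long readCode(InputStream in, BiPredicate<byte[], Integer> fits) throws IOException {
        byte[] code = new byte[MAX_CODE_LENGTH];
        long value = 0;
        in.mark(MAX_CODE_LENGTH); // remember where this code starts
        for (int len = 1; len <= MAX_CODE_LENGTH; len++) {
            int b = in.read();
            if (b < 0) {
                if (len == 1) {
                    return EOF;
                }
                break;
            }
            code[len - 1] = (byte) b;
            value = (value << 8) | b;
            if (fits.test(code, len)) {
                return value;
            }
        }
        in.reset(); // go back to the start of the failed code
        in.read();  // consume only the offending first byte
        return NO_MATCH;
    }

    // Decodes a whole buffer; single-byte codes are echoed, failures become '?'.
    static String decode(byte[] data, BiPredicate<byte[], Integer> fits) {
        InputStream in = new ByteArrayInputStream(data); // supports mark/reset
        StringBuilder out = new StringBuilder();
        try {
            long code;
            while ((code = readCode(in, fits)) != EOF) {
                out.append(code == NO_MATCH ? '?' : (char) code);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in predicate covering only the <00><7f> range, enough for this sample.
        BiPredicate<byte[], Integer> asciiOnly = (code, len) -> len == 1 && code[0] >= 0;
        byte[] data = "Check\340up Date".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(data, asciiOnly)); // prints "Check?up Date"
    }
}
```

With the reposition in place, only the 0xE0 byte is replaced and the "up " 
that follows it survives.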



