[ 
https://issues.apache.org/jira/browse/PDFBOX-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4811:
------------------------------------
    Description: 
I missed a rendering change (sorry) in the file from the linked PDF.js issue. 
The change appeared with PDFBOX-4810, but it is not a regression. It is a 
difference in how a bad input is displayed, caused by having different data.

The CMap has these ranges:
{code:java}
4 begincodespacerange
<00><7f>
<c080><dfbf>
<e08080><efbfbf>
<f0808080><f7bfbfbf>
endcodespacerange
{code}
The content stream has segments like
{code:java}
(Check\340up Date:2020/ 3/ 4  11:46) Tj
{code}
\340 (octal) is 0xE0. The current code in CMap.readCode() reads bytes until a 
codespace range fits, so here it reads 4 bytes before it notices that no range 
matches. After the failure it doesn't reposition, so the bytes it consumed are 
lost. The text is displayed as "Check ·Date" instead of "Check \-up Date". The 
"·" is the default glyph.

The solution is to remember the position before reading and to reposition 
there after a failure. I'm using mark() and reset() which, surprisingly, work 
both when loading in memory and when loading with a temp file.
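
A minimal sketch of the mark()/reset() idea, not the actual patch. The class 
CodeReaderSketch and its methods are hypothetical, the codespace ranges are 
hard-coded from the CMap above, and falling back to a one-byte code after 
reset() is an assumption made for illustration. It also assumes the stream 
supports mark(), which ByteArrayInputStream does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and structure are not taken from PDFBox.
class CodeReaderSketch {
    // Codespace ranges from the CMap above, as {low bytes, high bytes}.
    static final int[][][] RANGES = {
        {{0x00}, {0x7f}},
        {{0xc0, 0x80}, {0xdf, 0xbf}},
        {{0xe0, 0x80, 0x80}, {0xef, 0xbf, 0xbf}},
        {{0xf0, 0x80, 0x80, 0x80}, {0xf7, 0xbf, 0xbf, 0xbf}},
    };
    static final int MAX_LEN = 4;

    // True if the first len bytes of code lie inside a range of that length.
    static boolean matches(int[] code, int len) {
        for (int[][] range : RANGES) {
            if (range[0].length != len) {
                continue;
            }
            boolean inside = true;
            for (int i = 0; i < len && inside; i++) {
                inside = code[i] >= range[0][i] && code[i] <= range[1][i];
            }
            if (inside) {
                return true;
            }
        }
        return false;
    }

    // Reads one code; returns -1 at end of stream.
    static long readCode(InputStream in) throws IOException {
        in.mark(MAX_LEN); // remember where this code starts
        int[] buf = new int[MAX_LEN];
        int len = 0;
        int b;
        while (len < MAX_LEN && (b = in.read()) >= 0) {
            buf[len++] = b;
            if (matches(buf, len)) {
                long code = 0;
                for (int i = 0; i < len; i++) {
                    code = (code << 8) | buf[i];
                }
                return code;
            }
        }
        if (len == 0) {
            return -1; // nothing left to read
        }
        // No range fits: go back and consume only the first byte, so the
        // bytes that were read ahead are decoded again as their own codes.
        in.reset();
        return in.read();
    }

    // Convenience wrapper: decodes a whole byte string into codes.
    static List<Long> decode(byte[] data) {
        List<Long> codes = new ArrayList<>();
        try (InputStream in = new ByteArrayInputStream(data)) {
            long code;
            while ((code = readCode(in)) >= 0) {
                codes.add(code);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return codes;
    }
}
```

With the bytes of "Check\340up Date" this yields 13 codes: 0xE0 comes out as a 
single unmatched byte, and "u" and "p" follow as separate codes instead of 
being swallowed.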

  was:
I missed a rendering change (sorry) in the linked PDF.js issue that happened in 
PDFBOX-4810 but it is not a regression, rather a difference in displaying a bad 
input due to having different data.

The CMap has these ranges:
{code:java}
4 begincodespacerange
<00><7f>
<c080><dfbf>
<e08080><efbfbf>
<f0808080><f7bfbfbf>
endcodespacerange
{code}
The content stream has segments like
{code:java}
(Check\340up Date:2020/ 3/ 4  11:46) Tj
{code}
0340 is 0xE0. The current code at CMap.readCode() reads bytes until a range 
fits, and this means it reads 4 bytes until it noticed that this has failed. 
After the failure it doesn't reposition. So this is displayed as "Check \-Date" 
instead of "Check \-up Date", i.e. input is lost. The "-" is also a default 
glyph.

The solution is to remember the position and to reposition there. I'm using 
mark() and reset() which, surprisingly, works both when loading in memory and 
when loading with temp file.


> Glyphs getting lost when rendering
> ----------------------------------
>
>                 Key: PDFBOX-4811
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4811
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.19
>            Reporter: Tilman Hausherr
>            Assignee: Tilman Hausherr
>            Priority: Major
>             Fix For: 2.0.20, 3.0.0 PDFBox
>
>         Attachments: PDFJS-11768.pdf-1-after.png, PDFJS-11768.pdf-1-before.png
>


