Tilman Hausherr created PDFBOX-3864:
---------------------------------------
Summary: UTF16 encoded string to PDFDocEncoding
Key: PDFBOX-3864
URL: https://issues.apache.org/jira/browse/PDFBOX-3864
Project: PDFBox
Issue Type: Bug
Components: PDModel
Affects Versions: 2.0.6
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
Fix For: 2.0.7, 3.0.0
From [~torakiki] on the mailing list:
{quote}
Hi, we came across this case where we are basically cloning outline items
where the original outline title is a UTF16BE encoded text string
containing the value 00A0 (non break space). We later use the string to
assign the title in a new outline item and the A0 is recognised as a € sign.
Here is a simple test:
{code}
// "Chapter" followed by U+00A0, as a UTF-16BE text string with BOM (FEFF)
COSString victim = COSString.parseHex("FEFF004300680061007000740065007200A0");
PDOutlineItem node = new PDOutlineItem();
node.setTitle(victim.getString());
{code}
If you look at the node dictionary you'll see that the title value is
Chapter€
{quote}
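To make the reported input concrete, here is a minimal JDK-only sketch that decodes the hex string from the test and shows that its last character is U+00A0. The class and method names (OutlineTitleHexSketch, hexToBytes, decodeTextString) are hypothetical and are not part of PDFBox; the sketch only handles the UTF-16BE-with-BOM case used in the report.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not PDFBox code: decodes a hex-encoded PDF text string
// that starts with the UTF-16BE byte order mark FE FF.
class OutlineTitleHexSketch {

    // Convert pairs of hex digits into bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Decode the bytes after the BOM as UTF-16BE; other encodings are out of scope here.
    static String decodeTextString(String hex) {
        byte[] bytes = hexToBytes(hex);
        if (bytes.length >= 2 && (bytes[0] & 0xFF) == 0xFE && (bytes[1] & 0xFF) == 0xFF) {
            return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
        }
        throw new IllegalArgumentException("sketch only handles UTF-16BE strings with a BOM");
    }

    public static void main(String[] args) {
        String title = decodeTextString("FEFF004300680061007000740065007200A0");
        System.out.println(title.length());                       // prints 8
        System.out.println(Integer.toHexString(title.charAt(7))); // prints a0
    }
}
```

The decoded title is "Chapter" plus a non-breaking space, so the input itself is fine; the problem only appears when the string is encoded again.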
The cause is that the initialization of PDFDocEncoding does not account for the
"holes" in the 0..255 sequence, such as code A0, which does not stand for
U+00A0. I'll fix that and add a test.
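The effect of such a hole can be illustrated with a small sketch. This is not the PDFBox source: EncodingHolesSketch, isHole, set, encode and decode are hypothetical names, and only the single deviation mentioned in the report (code A0 is the euro sign) is modelled. It assumes the encoding keeps a code-to-Unicode table and a reverse Unicode-to-code map that are both filled by an identity loop over 0..255 before the deviations are applied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a two-way single-byte encoding table (not PDFBox code).
class EncodingHolesSketch {
    private final char[] codeToUni = new char[256];
    private final Map<Character, Integer> uniToCode = new HashMap<>();

    // skipHoles = false reproduces the reported behaviour, true shows the idea of a fix.
    EncodingHolesSketch(boolean skipHoles) {
        // Start from an identity mapping: code i <-> U+00i.
        for (int i = 0; i < 256; i++) {
            if (skipHoles && isHole(i)) {
                continue; // do not register an identity mapping for a hole
            }
            set(i, (char) i);
        }
        // Deviation from the identity mapping: code A0 is the euro sign U+20AC.
        set(0xA0, '\u20AC');
    }

    // Only one hole is modelled in this sketch.
    static boolean isHole(int code) {
        return code == 0xA0;
    }

    private void set(int code, char unicode) {
        codeToUni[code] = unicode;
        uniToCode.put(unicode, code);
    }

    // Returns the single-byte code, or null if the character is not encodable.
    Integer encode(char unicode) {
        return uniToCode.get(unicode);
    }

    char decode(int code) {
        return codeToUni[code];
    }
}
```

Without skipping the hole, the reverse map still holds the stale entry U+00A0 -> A0 after the euro sign overwrites the forward entry, so a non-breaking space is written as byte A0 and read back as €. With the hole skipped, U+00A0 is reported as not encodable, which in this sketch would signal the caller to fall back to a Unicode encoding of the string.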