Tilman Hausherr created PDFBOX-3864:
---------------------------------------

             Summary: UTF16 encoded string to PDFDocEncoding
                 Key: PDFBOX-3864
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3864
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 2.0.6
            Reporter: Tilman Hausherr
            Assignee: Tilman Hausherr
             Fix For: 2.0.7, 3.0.0


From [~torakiki] on the mailing list:
{quote}
Hi, we came across this case where we are basically cloning outline items
where the original outline title is a UTF16BE encoded text string
containing the value 00A0 (non break space). We later use the string to
assign the title in a new outline item and the A0 is recognised as a € sign.
Here is a simple test:
{code}
        COSString victim = COSString
                .parseHex("FEFF004300680061007000740065007200A0");
        PDOutlineItem node = new PDOutlineItem();
        node.setTitle(victim.getString());
{code}
If you look at the node dictionary you'll see that the title value is
Chapter€
{quote}
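The cause is that the initialization of PDFDocEncoding overlooked the
"holes" in the 0..255 sequence, i.e. code points that do not map to the
byte of the same value. U+00A0 is one of them: in PDFDocEncoding, byte 0xA0
is the Euro sign. I'll fix that and add a test.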
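
To illustrate the effect without depending on PDFBox, here is a minimal
standalone sketch. The class and method names are hypothetical and this is
not the actual PDFBox code; the excluded ranges are an assumption based on my
reading of the PDFDocEncoding table in ISO 32000-1 (Annex D). It contrasts a
naive "every char up to 255 is its own byte" check with one that respects
the holes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only, not the PDFBox implementation.
class PdfDocEncodingHoles {

    // Decodes a hex string that starts with the FEFF byte-order mark as UTF-16BE.
    static String decodeUtf16BeHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        // Skip the two BOM bytes (FE FF) before decoding.
        return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
    }

    // Naive assumption: any char in 0..255 can be written as the same byte value.
    static boolean naiveFits(String s) {
        for (char c : s.toCharArray()) {
            if (c > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Assumed holes: 0x18..0x1F, 0x7F..0xA0 and 0xAD are not identity-mapped.
    static boolean isIdentityMapped(char c) {
        if (c > 0xFF) return false;
        if (c > 0x17 && c < 0x20) return false;
        if (c > 0x7E && c < 0xA1) return false;
        return c != 0xAD;
    }

    // Hole-aware check: true only if every char may be written as its own byte.
    static boolean holeAwareFits(String s) {
        for (char c : s.toCharArray()) {
            if (!isIdentityMapped(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String title = decodeUtf16BeHex("FEFF004300680061007000740065007200A0");
        System.out.println("naive check:      " + naiveFits(title));
        System.out.println("hole-aware check: " + holeAwareFits(title));
    }
}
```

With the title from the report, the naive check accepts the string, so
U+00A0 would be written as byte 0xA0 and read back as the Euro sign. The
hole-aware check rejects it, which signals that the string needs another
representation such as UTF-16.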




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
