[ 
https://issues.apache.org/jira/browse/PDFBOX-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4032:
------------------------------------
    Attachment: Contains_tab_bad_offset-corrected-saved_by_adobe.pdf

{quote}
But still, according to the PDF reference, a conformant creator should replace 
the LF, CR, HT, BS and FF control codes with their escaped versions, octal 
form, or hexadecimal strings.
{quote}
No, the PDF specification only says that these escapes are understood by 
readers; it does not require writers to use them. The only requirement is this:
{quote}
A literal string shall be written as an arbitrary number of characters enclosed 
in parentheses. Any characters may appear in a string except unbalanced 
parentheses (LEFT PARENTHESIS (28h) and RIGHT PARENTHESIS (29h)) and the 
backslash (REVERSE SOLIDUS (5Ch)), which shall be treated specially as 
described in this sub-clause. Balanced pairs of parentheses within a string 
require no special treatment.
{quote}
To prove this, I took the file and saved it with Adobe Reader. This means that 
all structures were written out from whatever internal representation Adobe 
uses. You'll see that the tab byte (hex 09) is still there without an escape.

That's why I asked whether you have reported the problem to Nitro, and whether 
you've tested the corrected file.
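
To illustrate the clause quoted above, here is a minimal sketch of a writer 
that escapes only what a literal string needs. It is not the PDFBox code; the 
class and method names are made up for this example. It escapes every 
parenthesis, which is always allowed and avoids tracking balance, and it also 
escapes CR, on the assumption that round-tripping matters, because a reader 
treats an unescaped end-of-line marker inside a literal string as LF. A tab 
(hex 09) is written as-is.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
public class LiteralStringSketch {

    // Wraps the bytes in parentheses and escapes only the backslash,
    // both parentheses and CR. Every other byte, including HT (0x09),
    // is copied unchanged.
    public static byte[] toLiteralString(byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(content.length + 2);
        out.write('(');
        for (byte b : content) {
            if (b == '(' || b == ')' || b == '\\') {
                out.write('\\');   // prefix the special character
                out.write(b);
            } else if (b == '\r') {
                out.write('\\');   // raw CR would be read back as LF
                out.write('r');
            } else {
                out.write(b);      // tab, LF, etc. need no escape
            }
        }
        out.write(')');
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "a\tb (c) \\d".getBytes(StandardCharsets.ISO_8859_1);
        byte[] pdf = toLiteralString(raw);
        System.out.println(new String(pdf, StandardCharsets.ISO_8859_1));
    }
}
```

With this approach the tab in the attached file would be written as a raw 
byte, which matches what the Adobe-saved file contains.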

> Handle correctly special characters while writing COSString
> -----------------------------------------------------------
>
>                 Key: PDFBOX-4032
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4032
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Writing
>    Affects Versions: 2.0.8
>            Reporter: Ladislav Dudáš
>             Fix For: 2.0.9
>
>         Attachments: Contains_tab_bad.pdf, 
> Contains_tab_bad_offset-corrected-saved_by_adobe.pdf, 
> Contains_tab_bad_offset-corrected.pdf, Contains_tab_ok.pdf
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Regarding PDFBOX-3107: there was a change in COSWriter.java so that if a 
> string contains CR (0x0d) or LF (0x0a), the string is written in hex 
> format. This may be OK, but the PDF specification (7.3.4.2 Literal Strings) 
> explicitly defines more characters that should be handled specially.
> I'm providing another version of the code that handles all special characters 
> without transforming to hex format.
> PR [#41|https://github.com/apache/pdfbox/pull/41]



