[
https://issues.apache.org/jira/browse/PDFBOX-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-4032:
------------------------------------
Attachment: Contains_tab_bad_offset-corrected-saved_by_adobe.pdf
{quote}
But still, according to the PDF reference, a conformant creator should replace
the LF, CR, HT, BS and FF control codes with their escaped versions, octal
form, or hexadecimal strings.
{quote}
No, the PDF specification only says that these escapes are understood. The
only requirement is this:
{quote}
A literal string shall be written as an arbitrary number of characters enclosed
in parentheses. Any characters may appear in a string except unbalanced
parentheses (LEFT PARENTHESIS (28h) and RIGHT PARENTHESIS (29h)) and the
backslash (REVERSE SOLIDUS (5Ch)), which shall be treated specially as
described in this sub-clause. Balanced pairs of parentheses within a string
require no special treatment.
{quote}
To prove this, I took the file and saved it with Adobe Reader. This means that
all structures are written out again from whatever internal representation
Adobe uses. You'll see that the tab byte (0x09) is still there, unescaped.
That's why I asked whether you have reported the problem to Nitro, and whether
you've tested the corrected file.
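
As an illustration of the clause quoted above, here is a minimal Java sketch of
a writer that escapes only the backslash and parentheses and leaves every other
byte, including the tab (0x09), untouched. The class and method names are
hypothetical; this is not the PDFBox COSWriter code. It escapes every
parenthesis instead of tracking balance, which is also permitted. One caveat: a
raw CR or CR LF inside a literal string is read back as a single LF, so a real
writer still has to treat end-of-line bytes with care.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class LiteralStringSketch {

    // Hypothetical helper (not PDFBox API): wraps the bytes in parentheses and
    // escapes only '(', ')' and '\'. All other bytes are copied unchanged.
    static byte[] toLiteral(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length + 2);
        out.write('(');
        for (byte b : raw) {
            if (b == '(' || b == ')' || b == '\\') {
                out.write('\\');
            }
            out.write(b);
        }
        out.write(')');
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] input = "a\tb (c) \\".getBytes(StandardCharsets.ISO_8859_1);
        byte[] literal = toLiteral(input);
        // The tab between 'a' and 'b' is written as a raw 0x09 byte.
        System.out.println(new String(literal, StandardCharsets.ISO_8859_1));
    }
}
```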
> Handle correctly special characters while writing COSString
> -----------------------------------------------------------
>
> Key: PDFBOX-4032
> URL: https://issues.apache.org/jira/browse/PDFBOX-4032
> Project: PDFBox
> Issue Type: Improvement
> Components: Writing
> Affects Versions: 2.0.8
> Reporter: Ladislav Dudáš
> Fix For: 2.0.9
>
> Attachments: Contains_tab_bad.pdf,
> Contains_tab_bad_offset-corrected-saved_by_adobe.pdf,
> Contains_tab_bad_offset-corrected.pdf, Contains_tab_ok.pdf
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Regarding PDFBOX-3107: there was a change in COSWriter.java so that if a
> string contains the characters CR (0x0d) and LF (0x0a), the string is written
> in hex format. This may be OK, but the PDF specification (7.3.4.2 Literal
> Strings) explicitly defines more characters that should be handled specially.
> I'm providing another version of the code that handles all special characters
> without transforming the string to hex format.
> PR [#41|https://github.com/apache/pdfbox/pull/41]
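
For readers following the thread, the approach described in the issue (escaping
control characters instead of switching to hex) might look roughly like the
sketch below. It is an illustration with hypothetical names, not the code from
the pull request and not PDFBox API. It uses the escape sequences \n, \r, \t,
\b and \f listed in 7.3.4.2.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class EscapedLiteralSketch {

    // Hypothetical helper (not PDFBox API): writes a literal string and
    // replaces LF, CR, HT, BS and FF with their backslash escape sequences.
    static byte[] toEscapedLiteral(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length + 2);
        out.write('(');
        for (byte b : raw) {
            int c = b & 0xFF; // treat the byte as an unsigned value
            switch (c) {
                case '\n': out.write('\\'); out.write('n'); break;
                case '\r': out.write('\\'); out.write('r'); break;
                case '\t': out.write('\\'); out.write('t'); break;
                case '\b': out.write('\\'); out.write('b'); break;
                case '\f': out.write('\\'); out.write('f'); break;
                case '(':
                case ')':
                case '\\':
                    // Delimiters and the backslash are escaped with a backslash.
                    out.write('\\');
                    out.write(c);
                    break;
                default:
                    out.write(c);
            }
        }
        out.write(')');
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] input = "a\tb\r\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] literal = toEscapedLiteral(input);
        System.out.println(new String(literal, StandardCharsets.ISO_8859_1));
    }
}
```

Both this form and a raw tab byte are accepted by a conforming reader; the
difference only matters for tools that mishandle one of them.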