[ 
https://issues.apache.org/jira/browse/PDFBOX-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285171#comment-16285171
 ] 

Ladislav Dudáš commented on PDFBOX-4032:
----------------------------------------

It looks like the PDF spec is a little inconsistent on this. In the first paragraph


{panel}
A literal string shall be written as an arbitrary number of characters
enclosed in parentheses. Any characters may appear in a string except
unbalanced parentheses (LEFT PARENTHESIS (28h) and RIGHT PARENTHESIS (29h))
and the backslash (REVERSE SOLIDUS (5Ch)), which shall be treated specially
as described in this sub-clause. Balanced pairs of parentheses within a
string require no special treatment.
{panel}

it says that only parentheses and the backslash shall be treated specially. 
But in the second paragraph

{panel}
Within a literal string, the REVERSE SOLIDUS is used as an escape character.
The character immediately following the REVERSE SOLIDUS determines its
precise interpretation as shown in Table 3. If the character following the
REVERSE SOLIDUS is not one of those shown in Table 3, the REVERSE SOLIDUS
shall be ignored.
{panel}

the important words are "precise interpretation".

So it looks like both possibilities are correct. I'm not sure now whether a 
conforming writer should escape HT, BS and FF, because of that "precise 
interpretation", or not.
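
To make the two readings concrete, here is a minimal sketch (not the code from the PR and not PDFBox API; the class name LiteralStringSketch and the escapeAllTable3 flag are hypothetical). The writer always escapes parentheses, the backslash, CR and LF, and escapes HT, BS and FF only when the flag is set. The reader is simplified: it handles single-character escapes only and omits octal \ddd escapes and line continuations. Under those assumptions both output forms decode to the same bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of PDFBox.
final class LiteralStringSketch {

    // Writes raw bytes as a PDF literal string.
    // escapeAllTable3 == true : HT, BS and FF are written as \t, \b and \f.
    // escapeAllTable3 == false: HT, BS and FF are written as raw bytes.
    static byte[] escape(byte[] raw, boolean escapeAllTable3) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('(');
        for (byte b : raw) {
            int c = b & 0xFF;
            switch (c) {
                case '(': case ')': case '\\':
                    // Escaping every parenthesis is always valid, balanced or not.
                    out.write('\\'); out.write(c); break;
                case '\r': out.write('\\'); out.write('r'); break;
                case '\n': out.write('\\'); out.write('n'); break;
                case '\t': writeControl(out, c, 't', escapeAllTable3); break;
                case '\b': writeControl(out, c, 'b', escapeAllTable3); break;
                case '\f': writeControl(out, c, 'f', escapeAllTable3); break;
                default: out.write(c);
            }
        }
        out.write(')');
        return out.toByteArray();
    }

    private static void writeControl(ByteArrayOutputStream out, int rawByte,
                                     char letter, boolean escaped) {
        if (escaped) {
            out.write('\\');
            out.write(letter);
        } else {
            out.write(rawByte);
        }
    }

    // Simplified reader: single-character escapes only.
    // Octal \ddd escapes and line continuations are NOT handled here.
    static byte[] unescape(byte[] literal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int end = literal.length - 1; // index of the closing parenthesis
        for (int i = 1; i < end; i++) {
            int c = literal[i] & 0xFF;
            if (c != '\\') {
                out.write(c);
                continue;
            }
            if (++i >= end) {
                break; // dangling backslash, nothing follows it
            }
            int e = literal[i] & 0xFF;
            switch (e) {
                case 'n': out.write('\n'); break;
                case 'r': out.write('\r'); break;
                case 't': out.write('\t'); break;
                case 'b': out.write('\b'); break;
                case 'f': out.write('\f'); break;
                default:
                    // Covers '(' , ')' and '\\'; for any other character the
                    // backslash is ignored and the character is kept as-is.
                    out.write(e);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "a\tb(c)".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(escape(raw, true), StandardCharsets.ISO_8859_1));
        System.out.println(new String(escape(raw, false), StandardCharsets.ISO_8859_1));
    }
}
```

With this sketch, a string containing a tab comes out either as (a\tb) or with a raw 0x09 byte between the parentheses, and unescape() returns the original bytes in both cases. CR is escaped in both modes because an unescaped end-of-line marker inside a literal string is read back as 0x0A.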

> Handle correctly special characters while writing COSString
> -----------------------------------------------------------
>
>                 Key: PDFBOX-4032
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4032
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Writing
>    Affects Versions: 2.0.8
>            Reporter: Ladislav Dudáš
>             Fix For: 2.0.9
>
>         Attachments: Contains_tab_bad.pdf, 
> Contains_tab_bad_offset-corrected-saved_by_adobe.pdf, 
> Contains_tab_bad_offset-corrected.pdf, Contains_tab_ok.pdf
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Regarding PDFBOX-3107: COSWriter.java was changed so that a string containing 
> the characters CR (0x0d) and LF (0x0a) is written in hex format. This may be 
> OK, but the PDF specification (7.3.4.2 Literal Strings) explicitly defines 
> more characters that should be handled specially.
> I'm providing another version of the code that handles all special characters 
> without converting the string to hex format.
> PR [#41|https://github.com/apache/pdfbox/pull/41]
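
For comparison, the hex format mentioned in the issue description can be sketched as below. This is only an illustration, not the PDFBox implementation; the class and method names are hypothetical. Every byte becomes two hexadecimal digits between angle brackets, so no escaping is needed, at the cost of readability and size.

```java
// Hypothetical helper for illustration only; not part of PDFBox.
final class HexStringSketch {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Writes raw bytes as a PDF hexadecimal string, e.g. <410D0A>.
    static String toHexString(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length * 2 + 2);
        sb.append('<');
        for (byte b : raw) {
            sb.append(HEX[(b >> 4) & 0x0F]); // high nibble
            sb.append(HEX[b & 0x0F]);        // low nibble
        }
        return sb.append('>').toString();
    }
}
```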


