[ 
https://issues.apache.org/jira/browse/PDFBOX-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657212#comment-16657212
 ] 

ASF subversion and git services commented on PDFBOX-3646:
---------------------------------------------------------

Commit 1844362 from [email protected] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1844362 ]

PDFBOX-3646, PDFBOX-4345: fix problems with missing text and improper handling 
of special characters, by Kai Keggenhoff:
- Instead of traversing the children of an element with the XPath "*" 
expression, simply iterate the children obtained from Node.getChildNodes(), 
process Text and CDATASection nodes directly and call richContentsToString for 
any elements
- escape "<" and "&" in the text read from the node values
- added quoting of " as &quot; in the attribute values to avoid possible corruption
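The traversal and escaping described in the commit message can be sketched with only the JDK's DOM API. This is a minimal illustration, not PDFBox's actual `FDFAnnotation.richContentsToString` code. The class and helper names (`RichContentsSketch`, `escapeText`, `escapeAttr`, `render`) are hypothetical, and details such as attribute handling may differ from the real fix. The sketch iterates `getChildNodes()`, writes text and CDATA nodes directly with `&` and `<` escaped, recurses into child elements, and quotes `"` as `&quot;` inside attribute values.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch of the approach in the commit message, not PDFBox code.
public class RichContentsSketch {

    // Text content: '&' is escaped first, otherwise "&lt;" would become "&amp;lt;".
    static String escapeText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    // Attribute values are written inside double quotes, so '"' is escaped as well.
    static String escapeAttr(String s) {
        return escapeText(s).replace("\"", "&quot;");
    }

    // Rebuild markup by iterating getChildNodes() directly. An XPath "*" expression
    // selects only child elements, so bare text children would never be visited.
    static String richContentsToString(Element e) {
        StringBuilder sb = new StringBuilder("<").append(e.getNodeName());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            sb.append(' ').append(a.getNodeName())
              .append("=\"").append(escapeAttr(a.getNodeValue())).append('"');
        }
        sb.append('>');
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            switch (child.getNodeType()) {
                case Node.ELEMENT_NODE:
                    sb.append(richContentsToString((Element) child)); // recurse into elements
                    break;
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    sb.append(escapeText(child.getNodeValue()));      // emit text directly
                    break;
                default:
                    break; // comments and processing instructions are skipped
            }
        }
        return sb.append("</").append(e.getNodeName()).append('>').toString();
    }

    // Convenience helper: parse a string and serialize its root element.
    static String render(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return richContentsToString(root);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Prints: <p>Tom &amp; Jerry &lt;3 <span style="font-family:&quot;Arial&quot;">x</span>a&amp;b</p>
        System.out.println(render(
            "<p>Tom &amp; Jerry &lt;3 <span style=\"font-family:&quot;Arial&quot;\">x</span><![CDATA[a&b]]></p>"));
    }
}
```

Because the parser has already decoded `&amp;` and `&lt;` in the node values, re-escaping them on output is what keeps the rebuilt markup well-formed.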

> Annotations parsed from XFDF containing ampersand characters are not properly 
> imported
> --------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-3646
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3646
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm, PDModel
>    Affects Versions: 2.0.3, 2.0.4, 2.0.5, 2.0.6
>         Environment: java 1.8.0_112
>            Reporter: Kai Keggenhoff
>            Assignee: Tilman Hausherr
>            Priority: Major
>              Labels: xfdf
>             Fix For: 2.0.13, 3.0.0 PDFBox
>
>         Attachments: MergeTest.java, output1.pdf, output2.pdf, sample.xfdf
>
>
> Annotations containing "&" in their text are displayed incorrectly when 
> parsed unmodified from XFDF (the ampersands are encoded as "&amp;" there) and 
> added to a PDF document.
> This occurs for both "text comment" and "text box" type annotations.
> However, if the XFDF is modified by replacing "&amp;" with "&amp;amp;" prior 
> to parsing, the imported annotations are then displayed correctly.
> The attached code produces two PDF files: the first with the unmodified XFDF 
> imported, the second with the modified XFDF.
> An XFDF file containing both a text box and a text comment annotation is 
> embedded in the source and also attached as a separate file.
> Update 23.03.2017: This problem persists in 2.0.5, and we noticed that the same 
> corruption of merged annotations occurs if the annotation text contains a "<" 
> (encoded as the "&lt;" entity).
> Update 17.10.2018: This corruption is caused by 
> FDFAnnotation.richContentsToString. This method reads "<" and "&" from the 
> parsed values in the document and writes them unchanged into the markup, but 
> these characters must be replaced with their entities.
> I'll add this substitution to my proposed bugfix for PDFBOX-4345; please refer to 
> https://issues.apache.org/jira/projects/PDFBOX/issues/PDFBOX-4345
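The cause described in the updates, and why the "&amp;amp;" workaround appeared to help, can be reproduced with a small round trip through the JDK's XML parser. This is a hypothetical illustration (class and method names are not PDFBox API). Text parsed from XFDF already has `&amp;` decoded to `&`, so writing it back into markup unescaped yields text a strict XML parser rejects; pre-doubling the entity cancels this out by accident. A PDF viewer may display such malformed rich text incorrectly rather than reject it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demonstration of the escaping bug, not PDFBox code.
public class AmpersandRoundTrip {

    // Parse an XML string and return the root element's text content,
    // or null if the string is not well-formed.
    static String parseText(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.setErrorHandler(new DefaultHandler()); // throw on fatal errors without printing to stderr
            return db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                     .getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    // Buggy behaviour: put the parsed text back into markup without escaping.
    static String naiveMarkup(String text) {
        return "<p>" + text + "</p>";
    }

    // Fixed behaviour: escape '&' (first) and '<' before building markup.
    static String escapedMarkup(String text) {
        return "<p>" + text.replace("&", "&amp;").replace("<", "&lt;") + "</p>";
    }

    public static void main(String[] args) {
        String fromXfdf = parseText("<contents>Tom &amp; Jerry</contents>");    // as in the XFDF
        String doubled = parseText("<contents>Tom &amp;amp; Jerry</contents>"); // with the workaround

        System.out.println(fromXfdf);                           // Tom & Jerry
        System.out.println(parseText(naiveMarkup(fromXfdf)));   // null (not well-formed)
        System.out.println(parseText(naiveMarkup(doubled)));    // Tom & Jerry
        System.out.println(parseText(escapedMarkup(fromXfdf))); // Tom & Jerry
    }
}
```

The same reasoning applies to "<": once decoded from "&lt;", it must be escaped again before it is placed into markup.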



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
