[ 
https://issues.apache.org/jira/browse/XALANJ-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811718#comment-17811718
 ] 

Max commented on XALANJ-2725:
-----------------------------

[~kesh...@alum.mit.edu], what do you think about adding a policy for handling 
incomplete surrogate pairs? 
It could be configured via a setter method, via a system property, or both 
chained: use the policy class set via the setter if there is one, otherwise 
check the system property, otherwise default to 
UTF16IncompleteSurrogatePairErrorPolicy (or some name like that).
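
The lookup chain described above could be sketched roughly as follows. This is 
only an illustration: the IncompleteSurrogatePairPolicy interface, the setter 
name, the resolvePolicy() method and the system property name are all 
assumptions made up for this sketch, and none of them exist in the Xalan 
serializer today.

```java
public class PolicyResolution {
    /** Hypothetical policy interface; not part of the Xalan serializer. */
    public interface IncompleteSurrogatePairPolicy {
        /** Returns the text to emit for a lone surrogate, or throws. */
        String handle(char loneSurrogate);
    }

    /** Hypothetical default policy: reject the incomplete pair. */
    public static class UTF16IncompleteSurrogatePairErrorPolicy
            implements IncompleteSurrogatePairPolicy {
        @Override
        public String handle(char loneSurrogate) {
            throw new IllegalStateException(
                "Incomplete surrogate pair: 0x" + Integer.toHexString(loneSurrogate));
        }
    }

    // Hypothetical property name, chosen for illustration only.
    static final String POLICY_PROPERTY =
        "org.apache.xml.serializer.incompleteSurrogatePairPolicy";

    private IncompleteSurrogatePairPolicy policy; // null until set via the setter

    public void setIncompleteSurrogatePairPolicy(IncompleteSurrogatePairPolicy p) {
        this.policy = p;
    }

    /** Setter first, then the system property, then the error policy. */
    public IncompleteSurrogatePairPolicy resolvePolicy() {
        if (policy != null) {
            return policy;
        }
        String className = System.getProperty(POLICY_PROPERTY);
        if (className != null && !className.isEmpty()) {
            try {
                // Instantiate the named class through its no-argument constructor.
                return (IncompleteSurrogatePairPolicy)
                    Class.forName(className).getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalArgumentException(
                    "Cannot load policy class " + className, e);
            }
        }
        return new UTF16IncompleteSurrogatePairErrorPolicy();
    }
}
```

Failing fast on an unloadable class name is one possible choice here; silently 
falling back to the default would be the alternative.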

ToStream already configures one aspect of its behavior via a system property 
here, so this would not be anything new: 
[https://github.com/apache/xalan-java/blob/d83b90e588a5f2499e3eccc7cfcc44708f01494f/serializer/src/main/java/org/apache/xml/serializer/ToStream.java#L111]



This would allow us to add UTF16IncompleteSurrogatePairOutputPolicy, or to 
implement a custom policy that serializes as an individual use case requires.
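
As a rough sketch of what such policies might look like: the interface and the 
serialize() helper below are hypothetical, and the behavior given to 
UTF16IncompleteSurrogatePairOutputPolicy (writing the lone surrogate as a 
decimal numeric character reference) is only one guess at what "output" could 
mean.

```java
public class SurrogatePolicies {
    /** Hypothetical policy interface, assumed for illustration. */
    @FunctionalInterface
    public interface IncompleteSurrogatePairPolicy {
        String handle(char loneSurrogate);
    }

    /** Assumed behavior: emit the lone surrogate as a decimal character reference. */
    public static class UTF16IncompleteSurrogatePairOutputPolicy
            implements IncompleteSurrogatePairPolicy {
        @Override
        public String handle(char loneSurrogate) {
            return "&#" + (int) loneSurrogate + ";";
        }
    }

    /** A custom policy as a lambda: substitute U+FFFD REPLACEMENT CHARACTER. */
    public static final IncompleteSurrogatePairPolicy REPLACE = c -> "\uFFFD";

    /** Applies the policy to each unpaired surrogate; complete pairs pass through. */
    public static String serialize(String in, IncompleteSurrogatePairPolicy policy) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (Character.isHighSurrogate(c) && i + 1 < in.length()
                    && Character.isLowSurrogate(in.charAt(i + 1))) {
                out.append(c).append(in.charAt(++i)); // complete pair, copy both units
            } else if (Character.isSurrogate(c)) {
                out.append(policy.handle(c));         // unpaired surrogate
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Note that a character reference to a surrogate code point is not well-formed 
XML, so a policy like the first one would trade strictness for compatibility.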

I am worried that if we do not provide a facility for catching errors, 
currently working code will start breaking and upgrades will become a nightmare.

> Possible buffer-boundary issue when serializing surrogate pairs
> ---------------------------------------------------------------
>
>                 Key: XALANJ-2725
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2725
>             Project: XalanJ2
>          Issue Type: Improvement
>      Security Level: No security risk; visible to anyone (Ordinary problems in 
> Xalan projects. Anybody can view the issue.) 
>          Components: Serialization
>            Reporter: Joe Kesselman
>            Assignee: Joe Kesselman
>            Priority: Major
>              Labels: Surrogates, escaping, unicode, utf
>         Attachments: astral-chars-split-buffer.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> XALANJ-2419 addressed a case where "astral" Unicode characters, requiring a 
> surrogate pair (two UTF-16 units), were not being serialized correctly. We 
> have a proposed fix for that.
> There is reported to still be an edge case when a surrogate pair which 
> crosses buffer boundaries might not be handled correctly. [~maxfortun] 
> offered what looks like a reasonable proposed fix 
> (https://github.com/maxfortun/xalan-j/blob/a9bd5591d9f8a523548aeec091e886b64c691628/src/org/apache/xml/serializer/ToStream.java#L1607),
>  but in my testing this was not serializing the surrogate pairs correctly, 
> causing regression on the tests XALANJ-2419 introduced. I don't know whether 
> that's because we're taking multiple paths through
> But the edge case does appear to be real, and if so we will need some such 
> solution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
