[ 
https://issues.apache.org/jira/browse/XALANJ-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810437#comment-17810437
 ] 

Max edited comment on XALANJ-2725 at 1/24/24 2:56 PM:
------------------------------------------------------

Hi [~kesh...@alum.mit.edu] , all good points. Considering that I am not an 
"expert" in xalan, and have no deep understanding in its full complex 
functionality, my intention was to minimize the potential blast radius that may 
have been caused by my change. Thus I changed as little code as possible 
solving a very narrow and very specific problem. I agree that a broken 
character sequence SHOULD result in an error and not in masking. Will this 
change break backward compatibility? I am not sure, but it is the right course 
of action. As far as using a Character class I just followed suit from other 
areas in the code. As you suggested, we do not really need that class and can 
get away with a numeric value. I do not think 0 is a valid value, so might as 
well serve in place of null. 

Also, I tested my changes live on my use-cases, I did not find the regression 
tests that you are mentioning as failing. Would you mind pointing me in the 
right direction? I'd like to run them and see where they fail, maybe I'll catch 
what I missed.

Thanks!


was (Author: maxfortun):
Hi [~kesh...@alum.mit.edu] , all good points. Considering that I am not an 
"expert" in xalan, and have no deep understanding in its full complex 
functionality, my intention was to minimize the potential blast radius that may 
have been caused by my change. Thus I changed as little code as possible 
solving a very narrow and very specific problem. I agree that a broken 
character sequence SHOULD result in an error and not in masking. Will this 
change break backward compatibility? I am not sure, but it is the right course 
of action. As far as using a Character class I just followed suit from other 
areas in the code. As you suggested, we do not really need that class and can 
get away with a numeric value, as you suggested. I do not think 0 is a valid 
value, so might as well serve in place of null. 

Also, I tested my changes live on my use-cases, I did not find the regression 
tests that you are mentioning as failing. Would you mind pointing me in the 
right direction? I'd like to run them and see where they fail, maybe I'll catch 
what I missed.

Thanks!

> Possible buffer-boundry issue when serializing surrogate pairs
> --------------------------------------------------------------
>
>                 Key: XALANJ-2725
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2725
>             Project: XalanJ2
>          Issue Type: Improvement
>      Security Level: No security risk; visible to anyone(Ordinary problems in 
> Xalan projects.  Anybody can view the issue.) 
>          Components: Serialization
>            Reporter: Joe Kesselman
>            Assignee: Joe Kesselman
>            Priority: Major
>              Labels: Surrogates, escaping, unicode, utf
>         Attachments: astral-chars-split-buffer.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> XALANJ-2419 addressed a case where "astral" Unicode characters, requiring a 
> surrogate pair (two UTF-16 units), were not being serialized correctly. We 
> have a proposed fix for that.
> There is reported to still be an edge case when a surrogate pair which 
> crosses buffer boundaries might not be handled correctly. [~maxfortun] 
> offered what looks like a reasonable proposed fix 
> (https://github.com/maxfortun/xalan-j/blob/a9bd5591d9f8a523548aeec091e886b64c691628/src/org/apache/xml/serializer/ToStream.java#L1607),
>  but in my testing this was not serializing the surrogate pairs correctly, 
> causing regression on the tests XALANJ-2419 introduced. I don't know whether 
> that's because we're taking multiple paths through
> But the edge case does appear to be real, and if so we will need some such 
> solution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@xalan.apache.org
For additional commands, e-mail: dev-h...@xalan.apache.org

Reply via email to