[ 
https://issues.apache.org/jira/browse/XALANJ-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819804#comment-17819804
 ] 

Joe Kesselman commented on XALANJ-2730:
---------------------------------------

As discussed in XALANJ-2725, some edge cases remain possible even after the 
problem of splitting output across UTF-16 buffer boundaries has been handled.  
I dropped some additional comments into the serializer's ToStream class to 
document my concerns.

If an isolated high or low surrogate somehow gets into the data stream, we 
handle it inconsistently: we may throw an exception, or we may "silently" 
output the surrogate as a Numeric Character Reference (NCR). The NCR is not 
syntactically or semantically correct per either XML or UTF-16, and it doesn't 
warn the user of the problem, though it does attempt to show the problem 
(approximately) in context.
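
For reference, "isolated" here means a high surrogate that is not immediately 
followed by a low surrogate, or a low surrogate that is not immediately 
preceded by a high one. A minimal sketch of that check follows, using only 
java.lang.Character; the class and method names are hypothetical and are not 
taken from the ToStream code:

```java
// Hypothetical helper, not part of Xalan: finds the first unpaired surrogate.
final class IsolatedSurrogateScan {
    /** Returns the index of the first isolated surrogate in s, or -1 if none. */
    static int firstIsolatedSurrogate(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // well-formed pair: skip the low half
                } else {
                    return i; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no high surrogate before it
            }
        }
        return -1;
    }
}
```

This sketch sees the whole sequence at once, so it does not address the case 
where a pair is split across buffer boundaries.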

My _preferred_ fix would be to have malformed UTF-16 input always throw 
exceptions, rather than dancing around the problem by outputting (unusable) 
Numeric Character References for isolated surrogates, especially since the 
remaining edge conditions are particularly ugly ones. But comments in the code 
seem to suggest that we moved away from that for some reason, and I don't 
recall why/how that was justified.
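
The "always throw" policy could look roughly like the sketch below. It is an 
illustration only: the class name, method name, and message text are 
assumptions, and the real serializer would presumably build its message 
through the internationalization layer rather than a hard-coded string.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical sketch, not the ToStream implementation.
final class StrictUtf16Output {
    /** Copies s to out, throwing on any isolated surrogate instead of emitting an NCR. */
    static void write(CharSequence s, Writer out) throws IOException {
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Well-formed pair: pass both halves through unchanged.
                out.write(c);
                out.write(s.charAt(i + 1));
                i += 2;
            } else if (Character.isSurrogate(c)) {
                // Isolated high or low surrogate: fail loudly.
                throw new IOException("Invalid UTF-16 surrogate 0x"
                        + Integer.toHexString(c) + " at index " + i);
            } else {
                out.write(c);
                i++;
            }
        }
    }
}
```

With this policy, any characters before the bad surrogate have already been 
written by the time the exception is thrown, so the caller has to treat the 
output as incomplete.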

If we do stay with fake-NCRs for isolated surrogates, I'm seriously considering 
changing them to be fake-entity-references, which will at least not be 
syntactically incorrect; this could be done by replacing the current output, 
e.g. {{&#55308;}}, with something more like 
{{&ERR_INVALID_UTF16_SURROGATE_55308;}}, using the MsgKey string so we are at 
least in sync with the internationalization layer for clarity.
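
To make the two output forms concrete, here is a small sketch that formats 
both for a given surrogate. The helper names are hypothetical, the key string 
is copied from the text above rather than from the MsgKey class, and the 
decimal NCR form is an assumption based on the decimal value in the example.

```java
// Hypothetical formatter, not part of the Xalan serializer.
final class SurrogatePlaceholder {
    // Copied from the comment text; not checked against the real MsgKey constant.
    static final String KEY = "ERR_INVALID_UTF16_SURROGATE";

    /** Fake-NCR form: decimal code unit value between "&#" and ";". */
    static String fakeNcr(char surrogate) {
        return "&#" + (int) surrogate + ";";
    }

    /** Proposed fake-entity-reference form: the key string plus the decimal value. */
    static String fakeEntityRef(char surrogate) {
        return "&" + KEY + "_" + (int) surrogate + ";";
    }
}
```

For the high surrogate U+D80C (decimal 55308), fakeNcr returns {{&#55308;}} and 
fakeEntityRef returns {{&ERR_INVALID_UTF16_SURROGATE_55308;}}.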

> Review handling of isolated UTF16 surrogate characters in serializer
> --------------------------------------------------------------------
>
>                 Key: XALANJ-2730
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2730
>             Project: XalanJ2
>          Issue Type: Bug
>      Security Level: No security risk; visible to anyone(Ordinary problems in 
> Xalan projects.  Anybody can view the issue.) 
>            Reporter: Joe Kesselman
>            Assignee: Gary D. Gregory
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)