[ https://issues.apache.org/jira/browse/XALANJ-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819804#comment-17819804 ]
Joe Kesselman commented on XALANJ-2730:
---------------------------------------

As discussed in XALANJ-2725, there are still some edge conditions possible even after the problem of splitting output across UTF16 buffer boundaries has been handled. I dropped some additional comments into the serializer ToStream class to document my concerns.

If an isolated High or Low surrogate somehow gets into the data stream, we are inconsistent in how we handle it – it may throw an exception, or it may "silently" output the surrogate as a Numeric Character Reference – which will not be syntactically or semantically correct per either XML or UTF16, and which doesn't warn the user of the problem, but which does attempt to show the problem (approximately) in context.

My _preferred_ fix would be to have malformed UTF16 input always throw exceptions rather than trying to dance around this to output (unusable) Numeric Character References for isolated surrogates, especially since the remaining edge conditions are particularly ugly ones. But comments in the code seem to suggest that we moved away from that for some reason, and I don't recall why/how that was justified.

If we do stay with fake-NCRs for isolated surrogates, I'm seriously considering changing them to be fake-entity-references, which will at least not be syntactically incorrect; this could be done by replacing the current output, eg {{&#55308;}}, with something more like {{&ERR_INVALID_UTF16_SURROGATE_55308;}}, using the MsgKey string so we at least are in synch with the internationalization layer for clarity.

> Review handling of isolated UTF16 surrogate characters in serializer
> --------------------------------------------------------------------
>
> Key: XALANJ-2730
> URL: https://issues.apache.org/jira/browse/XALANJ-2730
> Project: XalanJ2
> Issue Type: Bug
> Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
> Reporter: Joe Kesselman
> Assignee: Gary D. Gregory
> Priority: Major
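
For anyone following along, here is a rough sketch of the two behaviours weighed in the comment above: always throwing on an isolated surrogate, or emitting a fake entity reference built from the ERR_INVALID_UTF16_SURROGATE key. This is not the ToStream code. The class name, the Policy enum, and the escape method are all hypothetical, and the real serializer works on char buffers and writers rather than strings.

```java
/** Hypothetical sketch only; not the actual Xalan serializer code. */
final class IsolatedSurrogateSketch {

    /** Assumed policy switch: how to react to an unpaired surrogate. */
    enum Policy { THROW, FAKE_ENTITY_REFERENCE }

    /**
     * Copies the input, passing well-formed surrogate pairs through
     * unchanged and applying the chosen policy to isolated surrogates.
     */
    static String escape(CharSequence in, Policy policy) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char ch = in.charAt(i);
            if (Character.isHighSurrogate(ch)
                    && i + 1 < in.length()
                    && Character.isLowSurrogate(in.charAt(i + 1))) {
                // Well-formed pair: copy both code units and skip the low one.
                out.append(ch).append(in.charAt(++i));
            } else if (Character.isSurrogate(ch)) {
                // Isolated high or low surrogate: malformed UTF-16.
                if (policy == Policy.THROW) {
                    throw new IllegalArgumentException(
                        "ERR_INVALID_UTF16_SURROGATE: " + (int) ch
                        + " at index " + i);
                }
                // Fake entity reference carrying the decimal code unit.
                out.append("&ERR_INVALID_UTF16_SURROGATE_")
                   .append((int) ch)
                   .append(';');
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }
}
```

With Policy.FAKE_ENTITY_REFERENCE, an input containing the lone high surrogate U+D80C (decimal 55308) would come out with {{&ERR_INVALID_UTF16_SURROGATE_55308;}} in its place. With Policy.THROW the same input raises an exception instead. The sketch does not address the case where a pair is split across buffer boundaries, which is the XALANJ-2725 issue.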