[ https://issues.apache.org/jira/browse/LANG-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607737#action_12607737 ]
weaver edited comment on LANG-66 at 6/24/08 12:37 PM:
--------------------------------------------------------------
The correct implementation for this should be:
1. Escape all known Unicode values (already being done)
2. Remove or mask all values OUTSIDE the following allowed values:
Allowed Whitespace: 0x9 0xA 0xD 0x20
Range 1: 0x21 - 0xD7FF
Range 2: 0xE000 - 0xFFFD
Range 3: 0x10000 - 0x10FFFF
Anything not matching the above values that hasn't already been escaped should
be masked or removed. What I do is write the hex value in place of the actual
character.
For example, the evil 0x13 that gets copied out of MS Word all the friggin time
would look something like this:
[Unicode: 0x13]
I feel this is better than removing the character completely or replacing it
with a generic "?", because the offending data can be tracked down much more
quickly when debugging.
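The masking step could be sketched roughly as follows. This is only an
illustration of the idea, not code from Commons Lang: the class and method
names (XmlCharMasker, isAllowedXmlChar, maskInvalidXmlChars) are hypothetical,
and it assumes Java 5 or later for the code point APIs.

```java
// Hypothetical helper, not part of Commons Lang.
final class XmlCharMasker {

    private XmlCharMasker() {}

    // True if the code point is in the allowed set listed above
    // (whitespace 0x9, 0xA, 0xD, 0x20 plus the three ranges).
    static boolean isAllowedXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces every disallowed code point with a visible marker
    // such as [Unicode: 0x13] instead of silently dropping it.
    static String maskInvalidXmlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // An unpaired surrogate is returned as its own value
            // (0xD800 - 0xDFFF), which is outside the allowed set.
            int cp = input.codePointAt(i);
            if (isAllowedXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append("[Unicode: 0x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(']');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

With this sketch, maskInvalidXmlChars("a" + (char) 0x13 + "b") yields
a[Unicode: 0x13]b. It would run after the normal escaping step, so that only
characters that cannot legally appear in XML at all are masked.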
Reference: XML Specification, section 2.2 http://www.w3.org/TR/REC-xml/#charsets
was (Author: weaver):
The correct implementation for this should be:
1. Escape all known Unicode values (already being done)
2. Remove or mask all values OUTSIDE the following allowed values:
Allowed Whitespace: 0x9 0xA 0xD
Range 1: 0x21 - 0xD7FF
Range 2: 0xE000 - 0xFFFD
Range 3: 0x10000 - 0x10FFFF
Anything not matching the above values that hasn't already been escaped should
be masked or removed. What I do is write the hex value in place of the actual
character.
For example, the evil 0x13 that gets copied out of MS Word all the friggin time
would look something like this:
[Unicode: 0x13]
I feel this is better than removing the character completely or replacing it
with a generic "?", because the offending data can be tracked down much more
quickly when debugging.
Reference: XML Specification, section 2.2 http://www.w3.org/TR/REC-xml/#charsets
> [lang] StringEscaper.escapeXml() escapes characters > 0x7f
> ----------------------------------------------------------
>
> Key: LANG-66
> URL: https://issues.apache.org/jira/browse/LANG-66
> Project: Commons Lang
> Issue Type: Bug
> Affects Versions: 2.1
> Environment: Operating System: All
> Platform: All
> Reporter: Sandor Vroemisse
> Fix For: 3.0
>
>
> StringEscaper.escapeXml() escapes characters > 0x7f. That's both undesired and
> undocumented.