On 5 Sep 2005, at 00:33, Antonio Gallardo wrote:
[EMAIL PROTECTED] wrote:

Why not only left the original char as it was before your first change? It was working. Having a UTF-8 IMO is not good.

Author: pier
Date: Sun Sep 4 16:29:09 2005
New Revision: 278641

URL: http://svn.apache.org/viewcvs?rev=278641&view=rev
Log:
Fixing wrong encoding bug

Modified:
cocoon/branches/BRANCH_2_1_X/src/blocks/xsp/java/org/apache/cocoon/components/language/markup/xsp/XSPExpressionParser.java

@@ -211,7 +211,7 @@
 parser.setState(EXPRESSION_CHAR_STATE);
 break;
- case '�':
+ case '\u00B4':
 parser.append(ch);
 parser.setState(EXPRESSION_SHELL_STATE);
 break;
@@ -235,10 +235,10 @@
 protected static final State EXPRESSION_CHAR_STATE = new QuotedState('\'');

 /**
- * The parser has encountered '�' in <code>[EMAIL PROTECTED] EXPRESSION_STATE}</code>
- * to start a Python string constant.
+ * The parser has encountered '\u00B4' (Unicode Latin-1 Acute Accent) in
+ * <code>[EMAIL PROTECTED] EXPRESSION_STATE}</code> to start a Python string constant.
 */
- protected static final State EXPRESSION_SHELL_STATE = new QuotedState('�');
+ protected static final State EXPRESSION_SHELL_STATE = new QuotedState('\u00B4');

It's not a UTF-8 character, it's a Unicode character: \u doesn't mean "UTF" but rather "Unicode" (which is a character set, not an encoding).
Depending on your platform encoding (yours apparently ISO-8859-1, mine UTF-8, my wife's, she's Japanese, Shift-JIS), the single byte B4 in the original source code will be interpreted as a different character.
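
To make the point about the byte B4 concrete, here is a minimal sketch that decodes that one byte under different charsets. The class and method names (ByteB4Demo, firstCodePoint) are made up for illustration and are not part of Cocoon; Shift_JIS is looked up by name and skipped if the running JRE does not ship it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteB4Demo {
    /** Decodes the bytes with the given charset and returns the first code point as "U+XXXX". */
    public static String firstCodePoint(byte[] bytes, Charset charset) {
        String decoded = new String(bytes, charset);
        return String.format("U+%04X", decoded.codePointAt(0));
    }

    public static void main(String[] args) {
        // The single raw byte that was sitting in the source file.
        byte[] raw = { (byte) 0xB4 };

        System.out.println("ISO-8859-1: " + firstCodePoint(raw, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + firstCodePoint(raw, StandardCharsets.UTF_8));

        // Shift_JIS is not one of the charsets every JRE must provide, so check first.
        if (Charset.isSupported("Shift_JIS")) {
            System.out.println("Shift_JIS:  " + firstCodePoint(raw, Charset.forName("Shift_JIS")));
        }
    }
}
```

Under ISO-8859-1 the byte decodes to U+00B4; under UTF-8 a lone B4 is not a valid sequence, so the decoder substitutes the replacement character U+FFFD.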
Changing the raw byte B4 to the escape \u00B4 tells the Java compiler that, no matter what encoding your platform is set to, the resulting character will always (always) be Unicode U+00B4, the Acute Accent, part of the Latin-1 Supplement block (U+0080 to U+00FF).
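
As a rough illustration of the difference between the character and its encodings, the sketch below declares the character with the escape and then encodes it two ways. UnicodeEscapeDemo, ACUTE and hex are hypothetical names chosen for this example only.

```java
import java.nio.charset.StandardCharsets;

public class UnicodeEscapeDemo {
    // Written with an escape, so the source file itself stays pure ASCII.
    public static final char ACUTE = '\u00B4';

    /** Formats bytes as space-separated upper-case hex, e.g. "C2 B4". */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = String.valueOf(ACUTE);
        // The character is always code point 0xB4, whatever the platform encoding is.
        System.out.println("code point: " + Integer.toHexString(ACUTE));
        // Only when it is written out as bytes does an encoding come into play.
        System.out.println("ISO-8859-1: " + hex(s.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("UTF-8:      " + hex(s.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The same character comes out as the single byte B4 in ISO-8859-1 and as the two bytes C2 B4 in UTF-8, which is why the escape identifies a Unicode character rather than a UTF-8 sequence.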
Let's call it defensive programming. Actually, in source code we should be using only characters in the range 00-7F (Unicode Basic Latin, encoded as US-ASCII), as that's the most common subset amongst all the different encodings (although IBM's EBCDIC might still have problems with it in some cases).
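
One way the "only 00-7F in source code" rule could be checked is sketched below: scan a line of source text for the first character above 7F. AsciiSourceCheck and firstNonAscii are hypothetical helpers, not an existing Cocoon or JDK facility.

```java
public class AsciiSourceCheck {
    /**
     * Returns the index of the first character outside the 00-7F range,
     * or -1 if the text is pure US-ASCII.
     */
    public static int firstNonAscii(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 0x7F) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // A source line that spells the character as an escape: ASCII only.
        String escaped = "case '\\u00B4':";
        // The same line with the actual character embedded in it.
        String literal = "case '" + (char) 0xB4 + "':";

        System.out.println("escaped form: " + firstNonAscii(escaped));
        System.out.println("literal form: " + firstNonAscii(literal));
    }
}
```

The escaped form passes the check, while the literal form is flagged at the position of the accent.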
Pier
