> At 19:14 +0200 02-06-2004, Ignacio Renuncio wrote:
> >BTW, the offending characters are 0x2026 (three dots character) and 0x2013
> >(typographical dash), they seem to have been "auto-formatted" MS Word when
> >typing the texts.
>
> I wouldn't be surprised if the three-dots-character used in Word is not
> present in UTF-8, the same may apply for the dash.
>

They are present in UTF-16 ;-) (see http://www.unicode.org/charts/ ).

I use the conversion arrays below to get rid of all the different quotes MS uses.
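N.B. \u2026, the three-dots character, is translated into "... "

Kind regards,

Henk

MMatch / MMbase consultancy and implementation
T. +31-(0)6-29054903
E. [EMAIL PROTECTED]
I. http://www.mmatch.nl

<%!
// Replacement strings (HTML entities or plain ASCII),
// index-aligned with translatedChar() below.
public String[] rawString() {
    String[] rawString = {
        // accented capitals
        "&Aacute;", "&Acirc;", "&Agrave;", "&Auml;", "&Atilde;", "&Aring;",
        "&Eacute;", "&Ecirc;", "&Egrave;", "&Euml;",
        "&Icirc;", "&Iuml;", "&Igrave;", "&Iacute;",
        "&Ocirc;", "&Ouml;", "&Ograve;", "&Oacute;", "&Otilde;", "&Oslash;",
        "&Uuml;", "&Ugrave;", "&Uacute;", "&Ucirc;",
        // accented lower case
        "&aacute;", "&acirc;", "&agrave;", "&auml;", "&atilde;", "&aring;",
        "&eacute;", "&ecirc;", "&egrave;", "&euml;",
        "&icirc;", "&iuml;", "&igrave;", "&iacute;",
        "&ocirc;", "&ouml;", "&ograve;", "&oacute;", "&otilde;", "&oslash;",
        "&uuml;", "&ugrave;", "&uacute;", "&ucirc;",
        // other letters and symbols
        "&aelig;", "&ccedil;", "&szlig;", "&yuml;", "&copy;",
        "&pound;", "&reg;", "&quot;",
        "&eth;", "&ntilde;", "&divide;", "&yacute;",
        "&thorn;", "&times;", "&nbsp;",
        "&sect;", "&cent;", "&deg;",
        "&dagger;", "&trade;", "&euro;",
        "'",
        // every left-quote look-alike becomes &lsquo;
        "&lsquo;", "&lsquo;", "&lsquo;", "&lsquo;", "&lsquo;", "&lsquo;", "&lsquo;", "&lsquo;",
        "&lsquo;", "&lsquo;", "&lsquo;", "&lsquo;", "&lsquo;",
        // every right-quote look-alike becomes &rsquo;
        "&rsquo;", "&rsquo;", "&rsquo;", "&rsquo;", "&rsquo;", "&rsquo;", "&rsquo;", "&rsquo;",
        "&rsquo;", "&rsquo;", "&rsquo;",
        // every double-quote look-alike becomes &quot;
        "&quot;", "&quot;", "&quot;", "&quot;", "&quot;", "&quot;", "&quot;", "&quot;", "&quot;", "&quot;",
        "&quot;", "&quot;", "&quot;", "&quot;", "&quot;", "&quot;", "&quot;",
        // two-dot leader and ellipsis
        "..", "... ",
        // hyphens and dashes, then the soft hyphen
        "-", "-", "-", "-", "-", "-", "-", "-",
        "&shy;"
    };
    return rawString;
}
%><%!
// Unicode representation of rawString
public char[] translatedChar() {
    char[] translatedChar = {
        // accented capitals
        '\u00c1', '\u00c2', '\u00c0', '\u00c4', '\u00c3', '\u00c5',
        '\u00c9', '\u00ca', '\u00c8', '\u00cb',
        '\u00ce', '\u00cf', '\u00cc', '\u00cd',
        '\u00d4', '\u00d6', '\u00d2', '\u00d3', '\u00d5', '\u00d8',
        '\u00dc', '\u00d9', '\u00da', '\u00db',
        // accented lower case
        '\u00e1', '\u00e2', '\u00e0', '\u00e4', '\u00e3', '\u00e5',
        '\u00e9', '\u00ea', '\u00e8', '\u00eb',
        '\u00ee', '\u00ef', '\u00ec', '\u00ed',
        '\u00f4', '\u00f6', '\u00f2', '\u00f3', '\u00f5', '\u00f8',
        '\u00fc', '\u00f9', '\u00fa', '\u00fb',
        // other letters and symbols
        '\u00e6', '\u00e7', '\u00df', '\u00ff', '\u00a9',
        '\u00a3', '\u00ae', '\u0022',
        '\u00f0', '\u00f1', '\u00f7', '\u00fd',
        '\u00fe', '\u00d7', '\u00a0',
        '\u00a7', '\u00a2', '\u00b0',
        '\u2020', '\u2122', '\u20AC',
        '\u02C8',
        // left-quote look-alikes
        // (8216 is the decimal value of U+2018; as a hex escape it is a CJK ideograph)
        '\u0060', '\u02BB', '\u02BD', '\u02BF', '\u02CB', '\u02CE', '\u02F4', '\u0559',
        '\u055D', '\u2035', '\u2018', '\u201B', '\u8216',
        // right-quote look-alikes
        // (8217 is the decimal value of U+2019; as a hex escape it is a CJK ideograph)
        '\u00B4', '\u02B9', '\u02BC', '\u02CA', '\u02CF', '\u0374', '\u055A', '\u055B',
        '\u2019', '\u2032', '\u8217',
        // double-quote look-alikes
        // (8220 and 8221 are the decimal values of U+201C and U+201D; as hex escapes they are CJK ideographs)
        '"', '\u0022', '\u02BA', '\u02DD', '\u02F5', '\u02F6', '\u201C', '\u201D', '\u201E', '\u201F',
        '\u2033', '\u2036', '\u301D', '\u301E', '\u301F', '\u8220', '\u8221',
        // two-dot leader and ellipsis
        '\u2025', '\u2026',
        // hyphens and dashes, then the soft hyphen
        '\u002D', '\u2010', '\u2011', '\u2012', '\u2013', '\u2014', '\u2015', '\u2212',
        '\u00AD'
    };
    return translatedChar;
}
%>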
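
The message does not show how the two arrays are applied. Below is a minimal sketch of one way to use them, assuming they are index-aligned (translatedChar[i] is replaced by rawString[i]): build a lookup table once and rewrite the input character by character. The class name CharTranslator, the method translate and the shortened arrays are hypothetical and only illustrate the idea; they are not taken from the original code.

```java
import java.util.HashMap;
import java.util.Map;

final class CharTranslator {
    // Shortened stand-ins for the two arrays in the message (assumption:
    // entry i of one array belongs to entry i of the other).
    private static final char[] TRANSLATED_CHAR = {
        '\u2018', '\u2019', '\u2025', '\u2026', '\u2013', '\u2014'
    };
    private static final String[] RAW_STRING = {
        "&lsquo;", "&rsquo;", "..", "... ", "-", "-"
    };

    private static final Map<Character, String> LOOKUP = new HashMap<>();

    static {
        // Fail early if the arrays ever get out of step.
        if (TRANSLATED_CHAR.length != RAW_STRING.length) {
            throw new IllegalStateException("conversion arrays differ in length");
        }
        for (int i = 0; i < TRANSLATED_CHAR.length; i++) {
            LOOKUP.put(TRANSLATED_CHAR[i], RAW_STRING[i]);
        }
    }

    // Replaces every character that has a table entry; all others pass through.
    static String translate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = LOOKUP.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("Wait\u2026done"));
        System.out.println(translate("2004\u20132005"));
    }
}
```

Because the replacement for \u2026 ends in a space, translate("Wait\u2026done") gives "Wait... done". A map lookup keeps the cost per character constant, whereas scanning the full array for every character would also work but gets slower as the table grows.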
