> At 19:14 +0200 02-06-2004, Ignacio Renuncio wrote:
> >BTW, the offending characters are 0x2026 (three dots character) and
> >0x2013 (typographical dash), they seem to have been "auto-formatted"
> >by MS Word when typing the texts.
>
> I wouldn't be surprised if the three-dots-character used in Word is
> not present in UTF-8, the same may apply for the dash.
>
 
They are present in UTF-16 ;-) and in UTF-8 as well, since both can encode
every Unicode character (see http://www.unicode.org/charts/ ). I use the
conversion arrays below to get rid of all the different quotes MS uses.
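
A quick way to check this is to encode both characters as UTF-8 and print the bytes. This is only a minimal sketch: the class name Utf8Check and the method utf8Hex are made up for illustration and are not part of the JSP below.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original JSP.
class Utf8Check {
    // Returns the UTF-8 bytes of a string as space-separated upper-case hex.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u2026")); // ellipsis, prints: E2 80 A6
        System.out.println(utf8Hex("\u2013")); // en dash,  prints: E2 80 93
    }
}
```

Both characters come out as three-byte UTF-8 sequences, so a missing character in the encoding is unlikely to be the cause.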

N.B. \u2026 the three-dots-character is translated into "..."

Kind regards, Henk

MMatch / MMbase consultancy and implementation
T. +31-(0)6-29054903
E. [EMAIL PROTECTED]
I. http://www.mmatch.nl

<%! public String [] rawString()
{   String rawString [] = {
        "&Aacute;","&Acirc;","&Agrave;","&Auml;","&Atilde;","&Aring;",
        "&Eacute;","&Ecirc;","&Egrave;","&Euml;",
        "&Icirc;","&Iuml;","&Igrave;","&Iacute;",
        "&Ocirc;","&Ouml;","&Ograve;","&Oacute;","&Otilde;","&Oslash;",
        "&Uuml;","&Ugrave;","&Uacute;","&Ucirc;",
        "&aacute;","&acirc;","&agrave;","&auml;","&atilde;","&aring;",
        "&eacute;","&ecirc;","&egrave;","&euml;",
        "&icirc;","&iuml;","&igrave;","&iacute;",
        "&ocirc;","&ouml;","&ograve;","&oacute;","&otilde;","&oslash;",
        "&uuml;","&ugrave;","&uacute;","&ucirc;",
        "&aelig;","&ccedil;","&szlig;","&yuml;","&copy;",
        "&pound;","&reg;","&quot;",
        "&eth;","&ntilde;","&divide;","&yacute;",
        "&thorn;","&times;","&nbsp;",
        "&sect;","&cent;","&deg;",
        "&dagger;","&trade;","&euro;",
        "'",
 
"&lsquo;","&lsquo;","&lsquo;","&lsquo;","&lsquo;","&lsquo;","&lsquo;","&
lsquo;","&lsquo;","&lsquo;","&lsquo;","&lsquo;","&lsquo;",
 
"&rsquo;","&rsquo;","&rsquo;","&rsquo;","&rsquo;","&rsquo;","&rsquo;","&
rsquo;","&rsquo;","&rsquo;","&rsquo;",
 
"&quot;","&quot;","&quot;","&quot;","&quot;","&quot;","&quot;","&quot;",
"&quot;",
 
"&quot;","&quot;","&quot;","&quot;","&quot;","&quot;","&quot;","..","...
",
        "-","-","-","-","-","-","-","-","-","&shy;"};
    return rawString;
}
%><%! public char [] translatedChar()
{   // Unicode representation of rawString
    char translatedChar[] = {
        '\u00c1','\u00c2','\u00c0','\u00c4','\u00c3','\u00c5',
        '\u00c9','\u00ca','\u00c8','\u00cb',
        '\u00ce','\u00cf','\u00cc','\u00cd',
        '\u00d4','\u00d6','\u00d2','\u00d3','\u00d5','\u00d8',
        '\u00dc','\u00d9','\u00da','\u00db',        
        '\u00e1','\u00e2','\u00e0','\u00e4','\u00e3','\u00e5',
        '\u00e9','\u00ea','\u00e8','\u00eb',
        '\u00ee','\u00ef','\u00ec','\u00ed',
        '\u00f4','\u00f6','\u00f2','\u00f3','\u00f5','\u00f8',
        '\u00fc','\u00f9','\u00fa','\u00fb',
        '\u00e6','\u00e7','\u00df','\u00ff','\u00a9',
        '\u00a3','\u00ae','\u0022',
        '\u00f0','\u00f1','\u00f7','\u00fd',
        '\u00fe','\u00d7','\u00a0',
        '\u00a7','\u00a2','\u00b0',
        '\u2020','\u2122','\u20AC',
        '\u02C8',
 
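        // Left single quote variants (U+2018 is listed twice; the duplicate
        // keeps the index alignment with rawString()).
        '\u0060','\u02BB','\u02BD','\u02BF','\u02CB','\u02CE','\u02F4','\u0559',
        '\u055D','\u2035','\u2018','\u201B','\u2018',
        // Right single quote variants (U+2019 is listed twice, as above).
        '\u00B4','\u02B9','\u02BC','\u02CA','\u02CF','\u0374','\u055A','\u055B',
        '\u2019','\u2032','\u2019',
        // Double quote variants (U+201C and U+201D are listed twice, as above).
        '"','\u0022','\u02BA','\u02DD','\u02F5','\u02F6','\u201C','\u201D',
        '\u201E','\u201F',
        '\u2033','\u2036','\u301D','\u301E','\u301F','\u201C','\u201D',
        // Two-dot leader and ellipsis.
        '\u2025','\u2026',
        // Hyphen and dash variants, then the soft hyphen.
        '\u002D','\u2010','\u2011','\u2012','\u2013','\u2014','\u2015','\u2212',
        '\u00AD'};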
    return translatedChar;
}
%>
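
The two methods above only return the arrays; the loop that does the replacing is not shown. A minimal sketch of how such parallel arrays might be applied, assuming entry i of the character array maps to entry i of the string array. The class EntityCleaner, its clean method, and the shortened arrays are hypothetical, not the code actually used.

```java
// Hypothetical sketch: applies two parallel arrays (a short excerpt of the
// ones above) to a string. Names are illustrative only.
class EntityCleaner {
    // Characters to look for; CHARS[i] is replaced by REPLACEMENTS[i].
    static final char[] CHARS = {
        '\u00e9', '\u2018', '\u2019', '\u201C', '\u201D', '\u2026', '\u2013'
    };
    static final String[] REPLACEMENTS = {
        "&eacute;", "&lsquo;", "&rsquo;", "&quot;", "&quot;", "...", "-"
    };

    static String clean(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = null;
            // Linear scan; acceptable for a table of roughly 120 entries.
            for (int j = 0; j < CHARS.length; j++) {
                if (CHARS[j] == c) {
                    replacement = REPLACEMENTS[j];
                    break;
                }
            }
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c); // characters without a mapping pass through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: &quot;Wait...&quot; - caf&eacute;
        System.out.println(clean("\u201CWait\u2026\u201D \u2013 caf\u00e9"));
    }
}
```

Because the lookup is purely by index, the two arrays must have the same length and the same order; one extra or missing entry shifts every mapping after it.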



