> Lastly, I am convinced that this is a problem with MySQL. I have some time again, let me try again. It's a tortuous tale, but bear with me.
Background in brief: Shift-JIS is a real pain in the neck to parse. When you look at a byte, it's often impossible to tell whether you're looking at the first byte or the second byte of the character. Shift-JIS has this almost-neat feature where they shoe-horned US-ASCII, with the two exceptions I've mentioned before, into the lowest 128 codes. _If_ _you_ _parse_ _from_ _the_ _beginning_, you can deterministically pick off each one-byte character and each two byte character. Unfortunately, if line-noise causes you to miss a byte, or if you start anywhere but the beginning of a string, you can run into a lot of situations where you can't tell whether you're in the middle of a character or not. The byte value of 0x5c, which is that wonderful backslash which MySQL (and C, PHP, Java, et. al.) uses as an escape character, is one case in point. It can be either a single byte of backslash, or it can be the end byte of one of several valid Japanese characters. *** moderately important stuff *** I believe, if you set up MySQL to operate in shift-JIS, it parses the characters correctly, in addition to handling sort and collation "correctly". But that's kind of beside the point here. If my memory here is correct, you can set your whole database (well, your instance of MySQL) to assume the shift-JIS character set, and MySQL will properly ignore any 0x5c that is not a backslash. *** end moderately important stuff *** (unimportant stuff) (The fact that the backslash shows up as the yen symbol in your browser when looking at a shift-JIS page is a red herring. I shouldn't have brought it up. Ignore it. Another piece of information you didn't need, and should ignore, is that shift-JIS double-wide backslash is 0x815f. It is not used as an escape character, so just forget I told you its value. Don't bother looking for it in your text. It has nothing to do with what's going on. Incidentally, its end byte is the same value as US-ASCII underscore, which you also don't need to know.) (end unimportant stuff) But, you are trying to use multiple languages in a single database. It so happens that, if you play by the rules (which are a little difficult to keep straight the first time) you can get away with this. Why? Because you are not sorting or collating, just getting the data out by keys that have nothing to do with shift-JIS. In either the current MySQL 4.0 or the next version, I don't remember which, I am told you will be able to set the language for each table. This will be marvelous. A lot of peace of mind, and great help, in fact. But not necessary for your current project, since you aren't collating or sorting on Japanese. ****** extremely important stuff ****** The issue you face is how to avoid the 0x5c being treated as an escape character. But it doesn't matter whether the text is Japanese or not. ****** end extremely important stuff ****** Let me explain that. You are taking text from an ordinary source and pasting it into MySQL (through the command line, shall we say). You should never have any escapes in that text. If you see a \, it is merely a backslash. \t, if it were in your text, would not mean the tab character. \n would not mean a newline. A quote would not be the beginning of a string, and a \" would be just that, a backslash followed by a double quote. See http://www.mysql.com/doc/en/String_syntax.html Note carefully that this page tells you exactly what characters will require escaping. (Sorry I can't provide the code. It shouldn't be that hard to write, or maybe someone else on the list who uses .asp will volunteer. As a hints, I'll give you a bit of untested C below.) If MySQL allows you to turn off character escaping, you would want to simply turn it off. As I understand, it doesn't. So you need to catch NULs, backslashes, and single and double quotes, and stick a(nother) backslash in front of them before you pass them to the database. (This should feel a lot like what you do when picking up text from a form on the web to go into an html page. But it is done at a different time, and the characters to escape are different.) ****** crux-of-the-matter ****** This is crude, it feels like a real kludge, but it works. Since your database is not set up to parse shift-JIS, it is absolutely going to think that every 0x5c it sees is a backslash, even when preceded by 0x83 or some other lead byte of a valid shift-JIS character. It doesn't know for shift-JIS. Because you haven't set it up to care, it doesn't care at all, and just basically ignores the whole multi-byte character business. So, you just treat the 0x5c end-bytes like backslashes. Escape them like they were real backslashes, and the escaping backslash gets eaten on its way into the database. ****** end crux-of-the-matter ****** 0x835c, for instance: Your escaping routine doesn't need to pay any attention to the 0x83 (or whatever the lead byte might have been). All it needs to do is see the 0x5c and paste a 0x5c in front of it. MySQL sees the filtered 0x835c5c and strips the first escape character. What goes into the database is 0x835c. I make no claims about the following code, it is merely illustrative: void escapeForMySQL( char source[], char buffer[], int count ) { int ch, i; j = 0; for ( i = 0; i < count; i = i + 1 ) { ch = source[ i ]; if ( ( ch == '\\' ) /* C also uses backslash */ or ( ch == '\'' ) /* (C++ syntax) */ or ( ch == '"' ) or ( ch == '\0' ) ) { buffer[ j ] = '\\'; j = j + 1; } buffer[ j ] = source[ i ]; j = j + 1; } } It's not my usual style, but I think it should be easy to translate into VBScript and test. If I messed that up, someone correct me? If you know regular expressions, the RE should be straightforward. MySQL provides the escaping function as part of the C API for MySQL, by the way. You might want to look at the source for that if you are not comfortable with the above, or if you just like to read source. Anyway, you should do something like the above in between getting the text from your content tool's form and putting it into MySQL. I'd give you a link to some of my rants about characters, but you really shouldn't need to be distracted by that right now. Now that I've run you through the wringer with MySQL's escaping, I have some bad news for you. The above will fix the problem with getting your data into MySQL. I checked your screenshot and it definitely will fix the problem with getting your data into MySQL. But there is something else going on between getting the data out of MySQL and getting it into your pages. 0x836c somehow morphed to 0x8e4c. Or was that the other direction? That won't be a topic for the MySQL list, however. -- Joel Rees <[EMAIL PROTECTED]> --------------------------------------------------------------------- Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php