From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: Inserting blob in database with gbk charset may cause error or data corruption
Description: When mysql's sql statement lexer (sql/sql_lex.cpp)) analyzes a sql string, it uses the my_ismbchar macro on the default charset structure to identify and skip multibyte characters. The gbk charset's implementation of ismbchar (in strings/ctype-gbk.c) checks if the first byte is between 0x81 and 0xfe, and the second byte is between 0x40 to 0x7e or 0x80 to 0xfe, if they are then the two bytes are recognized as a gbk character and are skipped in the lexer. The problem is the sql esacpe character '\' (0x5C) falls in the second range. When escaping a piece of binary data, it is possible the '\' inserted will form a valid gbk character with a previous byte and get skipped in lexer, which leads to incorrect behavior. How-To-Repeat: 1. Error: -- Add "default-character-set = gbk" in C:/my.cnf, restart mysql database. -- Create a new table with a blob column: DROP DATABASE IF EXISTS test; CREATE DATABASE test; USE test; DROP TABLE IF EXISTS t_blob_test; CREATE TABLE t_blob_test ( n_id int(11) NOT NULL default '0', b_blob blob, PRIMARY KEY (n_id) ); -- Attempt to insert 0xEE 27 45 to the blob column (I used MySQL Connector/J, but anything that properly escapes the sql statement should do, handcrafted sql is "USE test;INSERT INTO t_blob_test values (1, '...');", where ... is the escaped binary data), and an error "You have an error in your SQL syntax near ... at line 1" is generated What happened is: -- According to section 6.1.1 of the mysql manual, 0x27 must be escaped by '\'', i.e. 0x5C 27, the resulting sequence is 0xEE 5C 27 45 -- Once the sequence reaches the server, the sql lexer uses gbk's ismbchar to determine that 0xEE 5C is a gbk character and skips them. The next byte the lexer encounters is the unescaped single quote 0x27, it then decides this is the end of the column data. The byte after 0x27 is 0x45, i.e. ASCII char 'E', which the lexer regards as an syntax error. 2. Data Corruption -- Same setup as 1 -- Attempt to insert 0xEE 22 to the blob column, the data will be inserted as 0xEE 5C 22 What happened is: -- Notice 0xEE 22 is not a valid gbk character, so escape processing will be used, it is escaped to 0xEE 5C 22 -- At the server side, 0xEE 5C is recognized as a gbk character and gets skipped, 0x22 is inserted normally, which produces 0xEE 5C 22 in the column Fix: Avoid using binary data insert/update in gbk database is the only one I can think of. A way to disable multibyte skipping behavior when processing binary data may be a solution. Synopsis: Inserting blob in database with gbk charset may cause error or data corruption Originator: RJX MySQL support: none Severity: serious Priority: medium Category: mysqld Class: sw-bug Release: mysql-3.23.56, mysql-4.0.12 Exectutable: mysqld-nt (run as a service) Environment1: Custom built Duron 800, 128MB Memory, 40GB HD Environment2: Toshiba Tecra 8100, PIII 550, 256MB Memory, 40GB HD System: Win2000 Professional SP3 (both Chinese & Engish version) __________________________________________________ Do you Yahoo!? Yahoo! Tax Center - File online, calculators, forms, and more http://tax.yahoo.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]