From: [EMAIL PROTECTED]
To:   [EMAIL PROTECTED]
Subject: Inserting blob in database with gbk charset
may cause error or data corruption

Description:
        When mysql's sql statement lexer (sql/sql_lex.cpp))
analyzes a sql string, it uses the my_ismbchar macro
on the default charset structure to identify and skip
multibyte characters. The gbk charset's implementation
of ismbchar (in strings/ctype-gbk.c) checks if the
first byte is between 0x81 and 0xfe, and the second
byte is between 0x40 to 0x7e or 0x80 to 0xfe, if they
are then the two bytes are recognized as a gbk
character and are skipped in the lexer. The problem is
the sql esacpe character '\' (0x5C) falls in the
second range. When escaping a piece of binary data, it
is possible the '\' inserted will form a valid gbk
character with a previous byte and get skipped in
lexer, which leads to incorrect behavior.

How-To-Repeat:
1. Error:
-- Add "default-character-set = gbk" in C:/my.cnf,
restart mysql database.
-- Create a new table with a blob column:
DROP DATABASE IF EXISTS test;
CREATE DATABASE test;
USE test;
DROP TABLE IF EXISTS t_blob_test;
CREATE TABLE t_blob_test (
  n_id int(11) NOT NULL default '0',
  b_blob blob,
  PRIMARY KEY  (n_id)
);
-- Attempt to insert 0xEE 27 45 to the blob column (I
used MySQL Connector/J, but anything that properly
escapes the sql statement should do, handcrafted sql
is "USE test;INSERT INTO t_blob_test values (1,
'...');", where ... is the escaped binary data), and
an error "You have an error in your SQL syntax near
... at line 1" is generated

What happened is:
-- According to section 6.1.1 of the mysql manual,
0x27 must be escaped by '\'', i.e. 0x5C 27, the
resulting sequence is 0xEE 5C 27 45
-- Once the sequence reaches the server, the sql lexer
uses gbk's ismbchar to determine that 0xEE 5C is a gbk
character and skips them. The next byte the lexer
encounters is the unescaped single quote 0x27, it then
decides this is the end of the column data. The byte
after 0x27 is 0x45, i.e. ASCII char 'E', which the
lexer regards as an syntax error.

2. Data Corruption
-- Same setup as 1
-- Attempt to insert 0xEE 22 to the blob column, the
data will be inserted as 0xEE 5C 22 

What happened is:
-- Notice 0xEE 22 is not a valid gbk character, so
escape processing will be used, it is escaped to 0xEE
5C 22
-- At the server side, 0xEE 5C is recognized as a gbk
character and gets skipped, 0x22 is inserted normally,
which produces 0xEE 5C 22 in the column

Fix:
        Avoid using binary data insert/update in gbk database
is the only one I can think of. A way to disable
multibyte skipping behavior when processing binary
data may be a solution.

Synopsis:       Inserting blob in database with gbk charset
may cause error or data corruption      
Originator:     RJX  
MySQL support:  none
Severity:       serious
Priority:       medium
Category:       mysqld
Class:  sw-bug
Release:        mysql-3.23.56, mysql-4.0.12

Exectutable:   mysqld-nt (run as a service)
Environment1:  Custom built Duron 800, 128MB Memory,
40GB HD
Environment2:  Toshiba Tecra 8100, PIII 550, 256MB
Memory, 40GB HD
System:        Win2000 Professional SP3 (both Chinese
& Engish version)

__________________________________________________
Do you Yahoo!?
Yahoo! Tax Center - File online, calculators, forms, and more
http://tax.yahoo.com

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to