Bugs item #2798331, was opened at 2009-05-29 12:16
Message generated for change (Comment added) made by sjoerd
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2798331&group_id=56967

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: SQL/ODBC
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Pawel Lewicki (plewy)
Assigned to: Sjoerd Mullender (sjoerd)
Summary: ODBC - broken 'utf-8' data

Initial Comment:
The same problem as reported in #2014118
SQL queries executed using ODBC driver are not properly transferred. Inserted 
data has broken utf-8 characters.
Broken data is recognized by "LIKE" queries if using the same driver (ODBC vs 
native Python bindings).
Windows 32-bit Feb2009-SP2 release.
ODBC driver version is 1.28.04.01

 >>> import MonetSQLdb
 >>> testStr = 'us\xc5\x82ugi'
 >>> conPython= MonetSQLdb.connect(host = 'localhost', user = 'monetdb', 
 >>> password = 'monetdb', lang = 'sql', dbname='demo')
 >>> cuPython = con.cursor()
 >>> cuPython.execute("CREATE TABLE testtable (testfield varchar(30))")
0
 >>> cuPython.execute("INSERT INTO testtable VALUES ('%s')"%testStr)
1
 >>> cuPython.execute("SELECT * FROM testtable WHERE testfield LIKE 
 >>> '%s';"%testStr)
1
 >>> print cuPython.fetchall()
[('us\xc5\x82ugi',)]
 >>>
 >>>
 >>> import pyodbc
 >>> conODBC = pyodbc.connect('DSN=demo')
 >>> cuODBC = conODBC.cursor()
 >>> cuODBC.execute("SELECT * FROM testtable WHERE testfield LIKE 
 >>> '%s';"%testStr)
<pyodbc.Cursor object at 0x00D322F8>
 >>> print cuODBC.fetchall()
[]
 >>> cuODBC.execute("INSERT INTO testtable VALUES ('%s')"%testStr)
1
>>> cuODBC.execute("COMMIT;") #only Python driver is in autocommit mode
0
>>> cuODBC.execute("SELECT * FROM testtable WHERE testfield LIKE '%s';"%testStr)
<pyodbc.Cursor object at 0x00D31100>
>>> print cuODBC.fetchall()
[('us\xc4\xb9\xe2\x80\x9augi', )]

>>> cuPython.execute("SELECT * FROM testtable")
2
>>> print cuPython.fetchall()
[('us\xc5\x82ugi',), ('us\xc4\xb9\xe2\x80\x9augi',)]
>>> cuODBC.execute("SELECT * FROM testtable")
<pyodbc.Cursor object at 0x00D31100>
>>> print cuODBC.fetchall()
[('us\xc5\x82ugi', ), ('us\xc4\xb9\xe2\x80\x9augi', )]
 
 >>> cuPython.execute("DROP TABLE testtable")
0

----------------------------------------------------------------------

>Comment By: Sjoerd Mullender (sjoerd)
Date: 2009-06-05 15:56

Message:
The log shows that the ODBC driver is called using the SQLExecDirectW
interface which is the wide character interface for SQLExecDirect.  The
parameter to SQLExecDirectW is a UTF-16 string which is translated by the
ODBC driver to UTF-8 (there is a well-defined translation--there is no
problem here).

I tried the query again, and also a variant:
>>> cuODBC.execute("SELECT * FROM testtable WHERE testfield LIKE '%s'" %
testStr)
<pyodbc.Cursor object at 0x00C05C98>
>>> cuODBC.fetchall()
[]
>>> cuODBC.execute(u"SELECT * FROM testtable WHERE testfield LIKE '%s'" %
testStr.decode('utf-8'))
<pyodbc.Cursor object at 0x00C05C98>
>>> cuODBC.fetchall()
[('us\xc5\x82ugi', )]

The first uses a UTF-8 encoded string which is translated to wide
characters by pyodbc or the ODBC driver manager and then passed to the ODBC
driver.  The second does the conversion to a Python unicode string
explicitly which is then passed to the ODBC driver as wide characters.  The
first query fails, the second works.  The difference is in the conversion
to wide characters.  The first is done by pyodbc or the ODBC driver manager
(I don't know which) and is incorrect, the second is done by explicit code
and is correct.

(When I say "incorrect" I mean, the string that is converted to UTF-16 is
not interpreted as UTF-8 (which was intended) but as some other encoding.)

From the evidence I've seen so far, I cannot put any blame on the ODBC
driver.  The problem, as I see it, is in the conversion to UTF-16 by pyodbc
(or the ODBC driver manager).


----------------------------------------------------------------------

Comment By: Pawel Lewicki (plewy)
Date: 2009-06-05 14:07

Message:
I updated client and server to May 2009 - no change.
odbc.log:

DllMain 1
DllMain 2
DllMain 2
DllMain 2
SQLAllocHandle Env 0
DllMain 2
new env 1172f28
SQLSetEnvAttr 1172f28 200 3
SQLAllocHandle Dbc 1172f28
new dbc 1174860
SQLGetInfoW 1174860 77
addDbcError 1174860 01004 String data, right-truncated 0
SQLSetConnectAttrW 1174860 115
addDbcError 1174860 HY092 Invalid attribute/option identifier 0
SQLDriverConnectW 1174860 "DSN=monet_demo;" 0
addDbcError 1174860 01004 String data, right-truncated 0
"DSN=monet_demo;" 0
SQLConnect: DSN=monet_demo UID=monetdb PWD=monetdb host=localhost
port=50000 database=demo
addDbcError 1174860 01004 String data, right-truncated 0
SQLGetDiagRecW Dbc 1174860 1 0
SQLGetDiagRecW: [MonetDB][ODBC Driver 1.0]String data, right-truncated
SQLGetDiagRecW Dbc 1174860 1 56
SQLGetDiagRecW: [MonetDB][ODBC Driver 1.0]String data, right-truncated
SQLGetDiagFieldW Dbc 1174860 1 8 256
SQLGetDiagFieldW: ISO 9075
SQLGetDiagFieldW Dbc 1174860 1 9 256
SQLGetDiagFieldW: ISO 9075
SQLGetDiagFieldW Dbc 1174860 1 10 256
SQLGetDiagFieldW: MonetDB ODBC/Mapi
SQLGetDiagFieldW Dbc 1174860 1 11 256
SQLGetDiagFieldW: monet_demo
SQLGetDiagRecW Dbc 1174860 2 0
SQLGetFunctions 1174860 999
SQLGetInfoW 1174860 23
SQLGetInfoW 1174860 24
SQLGetInfoW 1174860 1750
addDbcError 1174860 HY096 Invalid information type 0
SQLSetConnectAttrW 1174860 102
SQLSetConnectAttr set autocommit off
SQLGetInfoW 1174860 10002
addDbcError 1174860 01004 String data, right-truncated 0
SQLAllocHandle Stmt 1174860
new stmt 1d440a0
SQLGetStmtAttrW 1d440a0 10010
SQLGetStmtAttrW 1d440a0 10011
SQLGetStmtAttrW 1d440a0 10012
SQLGetStmtAttrW 1d440a0 10013
SQLGetTypeInfoW 1d440a0 93
ODBCInitResult: querytype 1
SQLFetch 1d440a0
SQLGetData 1d440a0 3 4
SQLFreeStmt 1d440a0 0
SQLAllocHandle Stmt 1174860
new stmt 1174ba8
SQLGetStmtAttrW 1174ba8 10010
SQLGetStmtAttrW 1174ba8 10011
SQLGetStmtAttrW 1174ba8 10012
SQLGetStmtAttrW 1174ba8 10013
SQLFreeStmt 1174ba8 0
SQLExecDirectW 1174ba8
SQLExecDirect: "SELECT * FROM testtable WHERE testfield LIKE
'usługi';"
ODBCInitResult: querytype 1
SQLNumResultCols 1174ba8
SQLRowCount 1174ba8
SQLDescribeColW 1174ba8 1
addStmtError 1174ba8 01004 String data, right-truncated 0
SQLDescribeColW 1174ba8 1
addStmtError 1174ba8 01004 String data, right-truncated 0
SQLDescribeColW 1174ba8 1
addStmtError 1174ba8 01004 String data, right-truncated 0
SQLFetch 1174ba8
SQLFreeStmt 1174ba8 0
SQLExecDirectW 1174ba8
SQLExecDirect: "INSERT INTO testtable VALUES ('usługi')"
ODBCInitResult: querytype 2
SQLNumResultCols 1174ba8
SQLRowCount 1174ba8
SQLFreeStmt 1174ba8 0
SQLExecDirectW 1174ba8
SQLExecDirect: "COMMIT;"
ODBCInitResult: querytype 4
SQLNumResultCols 1174ba8
SQLRowCount 1174ba8
SQLFreeStmt 1174ba8 0
SQLExecDirectW 1174ba8
SQLExecDirect: "SELECT * FROM testtable WHERE testfield LIKE
'usługi';"
ODBCInitResult: querytype 1
SQLNumResultCols 1174ba8
SQLRowCount 1174ba8
SQLDescribeColW 1174ba8 1
addStmtError 1174ba8 01004 String data, right-truncated 0
SQLDescribeColW 1174ba8 1
addStmtError 1174ba8 01004 String data, right-truncated 0
SQLDescribeColW 1174ba8 1
addStmtError 1174ba8 01004 String data, right-truncated 0
SQLFetch 1174ba8
SQLGetData 1174ba8 1 1
SQLFetch 1174ba8
SQLFreeStmt 1174ba8 0
SQLExecDirectW 1174ba8
SQLExecDirect: "SELECT * FROM testtable"
ODBCInitResult: querytype 1
SQLNumResultCols 1174ba8
SQLRowCount 1174ba8
SQLDescribeColW 1174ba8 1
addStmtError 1174ba8 01004 String data, right-truncated 0
SQLDescribeColW 1174ba8 1
addStmtError 1174ba8 01004 String data, right-truncated 0
SQLDescribeColW 1174ba8 1
addStmtError 1174ba8 01004 String data, right-truncated 0
SQLFetch 1174ba8
SQLGetData 1174ba8 1 1
SQLFetch 1174ba8
SQLGetData 1174ba8 1 1
SQLFetch 1174ba8
SQLFreeStmt 1174ba8 0
SQLFreeHandle Stmt 1174ba8
SQLEndTran Dbc 1174860 1
new stmt 1174ba8
SQLExecDirect: "rollback"
ODBCInitResult: querytype 4
SQLFreeStmt 1d440a0 1
SQLGetInfoW 1174860 1
SQLDisconnect 1174860
SQLFreeHandle Dbc 1174860
SQLFreeHandle Env 1172f28
DllMain 0


----------------------------------------------------------------------

Comment By: Sjoerd Mullender (sjoerd)
Date: 2009-06-03 18:01

Message:
I am not convinced that the problem is in the ODBC driver.  In other words,
I need more evidence.

Can you repeat the pyodbc interaction after setting the environment
variable ODBCDEBUG to the name of a file where the ODBC driver can dump
logging infromation (e.g.
set ODBCDEBUG=C:\odbc.log
).  Can you then attach the resulting file to this bug report?

What I did to try out the bug is to create a table like you did, except I
inserted just one value consisting of an a-ring (U+00C5, or '\xc4\xa5' in
UTF-8).  When I try to select with a like clause with the a-ring from a
command window, SQLExecDirect is called with a string containing
\xc3\x83\xc2\xa5 when I write out the UTF-8 sequence (similar to what you
did), and \xe2\x80\xa0 if I use a literal a-ring.  This shows that pyodbc
(or the Windows ODBC driver manager) translates the input string from
(presumably) cp1252 to UTF-8, even though the command window uses cp437.

My exact queries where:
cuODBC.execute('select * from testtable where testfield like
\'\xc3\xa5\'')
cuODBC.execute('select * from testtable where testfield like \'å\'')
and the log shows these queries (and there is absolutely no conversion in
the ODBC driver before this gets logged, i.e. these are the exact strings
the driver gets from  pyodbc):
SQLExecDirect: "select * from testtable where testfield like
'<C3><83><C2><A5>'"
SQLExecDirect: "select * from testtable where testfield like
'<E2><80><A0>'"
Note that in cp437 (which the console uses, a-ring has the value '\x86',
and that this value is interpreted as the dagger (U+2020) in cp1252.  The
UTF-8 encoding of U+2020 is the 3 byte string shown above.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2798331&group_id=56967

------------------------------------------------------------------------------
OpenSolaris 2009.06 is a cutting edge operating system for enterprises 
looking to deploy the next generation of Solaris that includes the latest 
innovations from Sun and the OpenSource community. Download a copy and 
enjoy capabilities such as Networking, Storage and Virtualization. 
Go to: http://p.sf.net/sfu/opensolaris-get
_______________________________________________
Monetdb-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-bugs

Reply via email to