Bugs item #2970087, was opened at 2010-03-13 21:40 Message generated for change (Comment added) made by faridz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2970087&group_id=56967
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: SQL/ODBC Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Farid Z (faridz) Assigned to: Sjoerd Mullender (sjoerd) Summary: ANSI Client clobbered data ::SQLTables Initial Comment: I am using an ANSI ODBC client (not Unicode) with latest version of MonetDB ODBC driver 1.36.01.01 2/24/2010 and I am getting clobbed resultset comparing demo database loaded with VOC dataset to itself by using Zidsoft CompareData app. Does the ODBC driver not support ANSI clients? ---------------------------------------------------------------------- >Comment By: Farid Z (faridz) Date: 2010-04-16 12:52 Message: >What does "the actual character length for a variable-length data type" mean It means the length of the column as defined by the DDL. Example, varchar(30) -> 30 I think we need to differentiate between describing a resultset columns (resultset description as returned by SQLColAttribute and other calls) and actual data values that are only known when the resultset is actually retrieved (stored in location pointed to by StrLen_or_Ind used in SQLBindCol). When I am getting resultset description from the driver no data is retrieved (I don't want to retrieve any data yet, the table could have 4 million rows or no rows, but client only wants to know the resultset column descriptions). This is usually done by preparing the select statement "select * from <table>" Remember I don't want the driver to retrieve the resultset I only want to know what is resultset description is. So the database can have 1000 tables and all the tables are empty, but the client should still be able to determine the resultset description for each table by preparing "select * from <table>" for example (or actually executing the select). So for example, create table test (col1 varchar(30)) I should be able to determine that col1 resultset column length is 30 and octet length is 30 (ANSI) even though the table is empty by calling SQLColAttribute and SQL_DESC_LENGTH, SQL_DESC_OCTET_LENGTH Whether the table has data or not is not-relevant to the resultset description. >I suppose I could change the code so that the narrow interface is CP-1252 and if you want to use >characters that don't occur in that code page you're forced to use the wide interface No. The client can be using Windows codepage 1252 or some other codepage (1256 Arabic, for example), depending on the client locale, etc. You can determine the code page the client is using (ANSI) by calling Windows API GetACP(VOID) Since the server uses UTF8 then when transferring data to an ANSI client (narrow calls) the driver needs to convert from UTF8 to the codepage returned by GetACP() call. The conversion of course could fail if the client codepage has no equivalent to the UTF8 character and in this case the driver would report a conversion error for only that piece of data. Similarly for wide clients, the driver would convert from UTF8 to UTF16 (UCS16, actually) and report any conversion errors (less likely). ---------------------------------------------------------------------- Comment By: Sjoerd Mullender (sjoerd) Date: 2010-04-16 12:17 Message: What does "the actual character length for a variable-length data type" mean? I thought it meant the length of the (longest) actual value in the column. Since this is from SQLColAttribute, there are actual values in the result set to talk about. From your comment I understand you have a different interpretation (namely the maximum allowed length for the column). Is there anywhere a clearer definition of these attributes? The definition of SQL_DESC_OCTET_LENGTH is perhaps a little clearer, although I could interpret that either way as well: "For variable-length character or binary types, this is the maximum length in bytes". Which maximum? The maximum according to the column definition or the maximum in the current result set? In addition to SQL_DESC_LENGTH and SQL_DESC_OCTET_LENGTH, there is also SQL_DESC_DISPLAY_SIZE which has a similar issue. Does that value depend on the actual content of the column (as returned by the query) or does that depend on the definition of the column? I can see that the width of the columns returned by SQLTables could depend on the limitations of the server for the type of information (schema name length, etc.) Unfortunately, that doesn't help currently since, as mentioned, the information is not passed on. As to ANSI vs. Unicode. We only support UTF-8 in the narrow calls and UTF-16 in the wide calls. As I understand it, the term ANSI refers to a Windows code page which is close to ISO-8859-1, but in any case *not* compatible with UTF-8. I suppose I could change the code so that the narrow interface is CP-1252 and if you want to use characters that don't occur in that code page you're forced to use the wide interface. I could also differentiate between the calls with or without A at the end (e.g. SQLTables vs. SQLTablesA) where the calls with A use the "ANSI" code page and the calls without use UTF-8. But I don't really like this idea. (My weekend is starting now, so I probably won't respond before Monday.) ---------------------------------------------------------------------- Comment By: Farid Z (faridz) Date: 2010-04-16 10:16 Message: >The columns SQLTables returns are all varchar, without mention how large >they should be The returned info would represent how the column is stored at the dbms. That's why the ODBC spec does not specify exact character counts. For DB2 this a table name can be maximum of 128 characters so their drivers returns 128 for SQL_DESC_LENGTH and 128 for SQL_DESC_OCTET_LENGTH for ANS or 128*2 for UTF16. You can ignore the catalog column of the resultset since MonetDB does not support catalogs and if the driver correctly reports that catalog are not supported by the DBMS. >Another problem is that SQL_DESC_LENGTH counts *characters* and >SQL_DESC_OCTET_LENGTH counts *bytes*. These are not interchangeable. >MonetDB uses the UTF-8 encoding Example column declared as varchar(30) the driver would return SQL_DESC_LENGTH -> 30 SQL_DESC_OCTET_LENGTH -> 30 for ANSI version of the driver, 30*2 for Unicode version of the driver. For Windows the driver must translate from dbms characterset (UTF8) to client characterset: ANSI (single byte characterset) or Windows UTF16 (implemented by ODBC spec as wchar_t two bytes). >your suggestion of using 128 characters (or is that bytes?) doesn't help That was an example, not a suggestion. For MontDB the driver needs to return the actual info for the underlying catalog "views" where the information for SQLTables is stored. MonetDB must be storing the table names somewhere in its catalogs. >There *is*, however, a problem in MonetDB. The server returns a single column length value, and for varchar >columns that value is the actual length of the longest value in the column (0 if all values are NULL or if >there are no rows) That would be a problem. If I create a column as varchar(30) then I should get (as explained above) whether or not there is ANY data in the table: SQL_DESC_LENGTH -> 30 SQL_DESC_OCTET_LENGTH -> 30 for ANSI version of the driver, 30*2 for Unicode version of the driver SQL_DESC_LENGTH and SQL_DESC_OCTET_LENGTH are used by the ODBC client to create and allocate buffers for the resultset data and without correct information for these attributes, the ODBC client can not correctly retrieve data from the database ---------------------------------------------------------------------- Comment By: Sjoerd Mullender (sjoerd) Date: 2010-04-16 09:46 Message: I read that too, but I'm not sure I can come to the same conclusions. I'm struggling to understand what the specification means. The columns SQLTables returns are all varchar, without mention how large they should be. The first column is the catalog name. The values here can be NULL if the data source does not support catalogs. Since MonetDB indeed doesn't support catalogs, the value returned is indeed NULL. The description in SQLTables doesn't say what size varchar column to return, but presumably one that is large enough to hold the data. In this case, varchar(1) is plenty long enough (for this particular column). For SQLColAttribute, the description of SQL_DESC_LENGTH says that, for variable-length data (which varchar is), the size returned is the "actual character length" (we can ignore the fixed-length case). What is the actual character length of a NULL value? I can argue 0. The problem for SQL_DESC_OCTET_LENGTH is that, for variable-length character types, it returns the *maximum* length in bytes. Again, what is the maximum length of a column that is always going to be NULL? The query that SQLTables does under the hood uses cast(null as varchar(1)) for the catalog column. There *is*, however, a problem in MonetDB. The server returns a single column length value, and for varchar columns that value is the actual length of the longest value in the column (0 if all values are NULL or if there are no rows). (The value returned is the declared length for fixed-length columns, i.e. 50 for char(50).) In other words, the client does not know the maximum allowed length of the column if the longest value is shorter. So if a column is declared varchar(50) but only contains NULLs, the value returned by the server is 0. This means that determining SQL_DESC_LENGTH is relatively easy, but determining SQL_DESC_OCTET_LENGTH is not currently possible for varchar columns. Another problem is that SQL_DESC_LENGTH counts *characters* and SQL_DESC_OCTET_LENGTH counts *bytes*. These are not interchangeable. MonetDB uses the UTF-8 encoding, which is a variable-length encoding. A character can take from 1 to 6 bytes. Should we then multiply the value by 6? If a column is declared as e.g. varchar(10), that means that it can store up to 10 characters in MonetDB. The problem we're having here doesn't really have anything to do with SQLTables as such. The implementation does a SQLExecDirect call under the hood, and SQLColAttribute doesn't know (or care) how the result set was created. The problem has to do with getting metainformation about the result set. In other words, your suggestion of using 128 characters (or is that bytes?) doesn't help. That information is not passed on to SQLColAttribute, and anyway, the same problems would exist for hand-created SQL queries. I do agree that all metainformation-returning queries should return compatible information. ---------------------------------------------------------------------- Comment By: Farid Z (faridz) Date: 2010-04-16 08:43 Message: The driver needs to return valid values (not 0 and certainly usually greater than 1) for SQL_DESC_LENGTH and SQL_DESC_OCTET_LENGTH for the resultset columns as per ODBC descriptions for these. http://msdn.microsoft.com/en-us/library/ms713558(VS.85).aspx Quote: SQL_DESC_LENGTH: A numeric value that is either the maximum or actual character length of a character string or binary data type. It is the maximum character length for a fixed-length data type, or the actual character length for a variable-length data type. Its value always excludes the null-termination byte that ends the character string SQL_DESC_OCTET_LENGTH : The length, in bytes, of a character string or binary data type. For fixed-length character or binary types, this is the actual length in bytes. For variable-length character or binary types, this is the maximum length in bytes. This value does not include the null terminator. So for SQLTables resultset table names these would be for example, 128, 128 assuming that the maximum length for a table name in MonetDB is 128 character. Similar info should be returned by the driver for SQLStatistics, SQLPrimaryKeys and SQLForeignKesy, SQLGetTypeInfo and all other ODBC catalog functions for these attributes. ---------------------------------------------------------------------- Comment By: Sjoerd Mullender (sjoerd) Date: 2010-04-16 04:07 Message: I have been able to also get the value 0 from the call to SQLColAttribute on column 1 after a call to SQLTables. The reason for the value 0 is that (in my case, anyway) all values in the column are NULL. If I look at column 2 where I do get a non-NULL value, the value for SQL_DESC_OCTET_LENGTH and SQL_DESC_LENGTH is indeed greater than 0. The ODBC speciification just says that the types of the columns are "varchar" without any size indication. So if the columns are completely filled with NULLs, the server just returns a length indication of 0. I suppose I could arbitrarily change the length to 1 when it would otherwise have been 0. ---------------------------------------------------------------------- Comment By: Farid Z (faridz) Date: 2010-04-15 16:42 Message: I am running the 32-bit version of my app on Windows XP SP3 32-bit and using Windows characters set 1252 (US-EN). I am using ODBC types for column_size and buffer_length which in 32-bit version of my app are SQUINTEGER and SQLINTEGER. SQLULEN column_size; SQLLEN buffer_length; My app is tested successfully with 30-40 different DBMS ODBC drivers without encountering this issue http://www.zidsoft.com/dbmsnotes.html The driver is returning zero for SQL_DESC_OCTET_LENGTH and SQL_DESC_LENGTH for ::SQLTables character resultsetset columns instead of actual values. Same thing for ::SQLPrimaryKeys, ::SQLStatistics and ::SQLForeignKeys resultsets character columns ---------------------------------------------------------------------- Comment By: Sjoerd Mullender (sjoerd) Date: 2010-04-15 15:55 Message: If by ANSI you mean one of the ISO-8859 versions (e.g. ISO-8859-1 aka "latin 1") then that is definitely a problem. Those are *not* compatible with UTF-8. What are the types of pRow->buffer_length and pRow->column_size? Is your application 32 bit or 64 bit? The MonetDB ODBC driver may not be completely compliant with the 64-bit definition of the ODBC interface. I'm hoping that can be fixed in the next feature release (not in a bug fix release since it's an API change). ---------------------------------------------------------------------- Comment By: Farid Z (faridz) Date: 2010-03-15 13:28 Message: My app is ANSI so I am using narrow calls. So that's not the cause of this issue. I see the cause of the problem: the driver is returning incorrect information for ::SQLTables character resultset columns: rc = SQLColAttribute(hstmt, nCol, SQL_DESC_OCTET_LENGTH, NULL, 0, NULL, &pRow->buffer_length); rc = SQLColAttribute(hstmt, nCol, SQL_DESC_LENGTH, NULL, 0, NULL, (SQLLEN*) &pRow->column_size); The driver is returning zero for SQL_DESC_OCTET_LENGTH and SQL_DESC_LENGTH for ::SQLTables character resultsetset columns instead of actual values. Same thing for ::SQLPrimaryKeys, ::SQLStatistics and ::SQLForeignKeys resultsets character columns ---------------------------------------------------------------------- Comment By: Sjoerd Mullender (sjoerd) Date: 2010-03-15 13:03 Message: The MonetDB ODBC driver only support Unicode and UTF-8. That is to say, the wide character calls are expected to use UCS-16, and the narrow character calls are expected to use UTF-8. The driver does not support any other character sets (well, ASCII since it is a proper subset of UTF-8). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2970087&group_id=56967 ------------------------------------------------------------------------------ Download Intel® Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev _______________________________________________ Monetdb-bugs mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/monetdb-bugs
