Bugs item #2970087, was opened at 2010-03-13 21:40
Message generated for change (Comment added) made by faridz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2970087&group_id=56967

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: SQL/ODBC
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Farid Z (faridz)
Assigned to: Sjoerd Mullender (sjoerd)
Summary: ANSI Client clobbered data ::SQLTables

Initial Comment:
I am using an ANSI ODBC client (not Unicode) with the latest version of the
MonetDB ODBC driver (1.36.01.01, 2/24/2010) and I am getting a clobbered
resultset when comparing the demo database loaded with the VOC dataset to
itself using the Zidsoft CompareData app.

Does the ODBC driver not support ANSI clients?

----------------------------------------------------------------------

>Comment By: Farid Z (faridz)
Date: 2010-04-16 12:52

Message:
>What does "the actual character length for a variable-length data type"
>mean
It means the length of the column as defined by the DDL. For example,
varchar(30) -> 30
I think we need to differentiate between describing the resultset columns
(the resultset description as returned by SQLColAttribute and other calls)
and the actual data values, which are only known when the resultset is
actually retrieved (their lengths are stored in the location pointed to by
StrLen_or_Ind in SQLBindCol).

When I am getting resultset description from the driver no data is
retrieved (I don't want to retrieve any data yet, the table could have 4
million rows or no rows, but client only wants to know the resultset column
descriptions).  This is usually done by preparing the select statement
"select * from <table>"

Remember I don't want the driver to retrieve the resultset; I only want to
know what the resultset description is. So the database can have 1000 tables
and all the tables are empty, but the client should still be able to
determine the resultset description for each table by preparing  "select *
from <table>"  for example (or actually executing the select). So for
example,
create table test (col1 varchar(30))

I should be able to determine that col1 resultset column length is 30 and
octet length is 30 (ANSI) even though the table is empty by calling
SQLColAttribute and SQL_DESC_LENGTH, SQL_DESC_OCTET_LENGTH

Whether the table has data or not is not relevant to the resultset
description.

>I suppose I could change the code so that the narrow interface is CP-1252
>and if you want to use characters that don't occur in that code page
>you're forced to use the wide interface
No. The client can be using Windows codepage 1252 or some other codepage
(1256 Arabic, for example), depending on the client locale, etc. You can
determine the code page the client is using (ANSI) by calling Windows API
GetACP(VOID)

Since the server uses UTF8 then when transferring data to an ANSI client
(narrow calls) the driver needs to convert from UTF8 to the codepage
returned by GetACP() call. The conversion of course could fail if the
client codepage has no equivalent to the UTF8 character and in this case
the driver would report a conversion error for only that piece of data.

Similarly for wide clients, the driver would convert from UTF8 to UTF16
(UCS16, actually) and report any conversion errors (less likely).
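
The conversion step described above can be sketched in portable C. This is
only an illustration, not the driver's code: the function name
utf8_to_latin1 is hypothetical, and ISO-8859-1 stands in for the client code
page (on Windows a driver would presumably go through the code page reported
by GetACP() instead). The point is that an unrepresentable character yields
an error for that one value rather than silently corrupted bytes.

```c
#include <stddef.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to ISO-8859-1,
 * used here as a portable stand-in for the client's ANSI code page.
 * Returns the number of bytes written (excluding the terminator), or -1 if
 * the input is malformed, a character has no Latin-1 equivalent, or the
 * output buffer is too small. */
long utf8_to_latin1(const char *src, char *dst, size_t dstsize)
{
    const unsigned char *s = (const unsigned char *) src;
    size_t n = 0;

    if (dstsize == 0)
        return -1;                      /* no room even for the terminator */

    while (*s) {
        unsigned long cp;

        if (s[0] < 0x80) {              /* 1-byte sequence (ASCII) */
            cp = s[0];
            s += 1;
        } else if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
            /* 2-byte sequence: U+0080 .. U+07FF */
            cp = ((unsigned long) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
            if (cp < 0x80)
                return -1;              /* overlong encoding */
            s += 2;
        } else {
            /* Well-formed 3- and 4-byte sequences always encode code points
             * above U+00FF; anything else is malformed. Either way the
             * value cannot be converted. */
            return -1;
        }

        if (cp > 0xFF)
            return -1;                  /* no equivalent in Latin-1 */
        if (n + 1 >= dstsize)
            return -1;                  /* need room for byte + terminator */
        dst[n++] = (char) cp;
    }
    dst[n] = '\0';
    return (long) n;
}
```

For example, "caf\xC3\xA9" converts to four bytes ending in 0xE9, while an
Arabic letter such as "\xD8\xA7" is reported as a conversion error under
this Latin-1 assumption (a client on code page 1256 would expect it to
succeed).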

----------------------------------------------------------------------

Comment By: Sjoerd Mullender (sjoerd)
Date: 2010-04-16 12:17

Message:
What does "the actual character length for a variable-length data type"
mean?  I thought it meant the length of the (longest) actual value in the
column.  Since this is from SQLColAttribute, there are actual values in the
result set to talk about.  From your comment I understand you have a
different interpretation (namely the maximum allowed length for the
column).  Is there anywhere a clearer definition of these attributes?

The definition of SQL_DESC_OCTET_LENGTH is perhaps a little clearer,
although I could interpret that either way as well: "For variable-length
character or binary types, this is the maximum length in bytes".  Which
maximum?  The maximum according to the column definition or the maximum in
the current result set?

In addition to SQL_DESC_LENGTH and SQL_DESC_OCTET_LENGTH, there is also
SQL_DESC_DISPLAY_SIZE which has a similar issue.  Does that value depend on
the actual content of the column (as returned by the query) or does that
depend on the definition of the column?

I can see that the width of the columns returned by SQLTables could depend
on the limitations of the server for the type of information (schema name
length, etc.)  Unfortunately, that doesn't help currently since, as
mentioned, the information is not passed on.

As to ANSI vs. Unicode.  We only support UTF-8 in the narrow calls and
UTF-16 in the wide calls.  As I understand it, the term ANSI refers to a
Windows code page which is close to ISO-8859-1, but in any case *not*
compatible with UTF-8.

I suppose I could change the code so that the narrow interface is CP-1252
and if you want to use characters that don't occur in that code page you're
forced to use the wide interface.  I could also differentiate between the
calls with or without A at the end (e.g. SQLTables vs. SQLTablesA) where
the calls with A use the "ANSI" code page and the calls without use UTF-8. 
But I don't really like this idea.

(My weekend is starting now, so I probably won't respond before Monday.)

----------------------------------------------------------------------

Comment By: Farid Z (faridz)
Date: 2010-04-16 10:16

Message:
>The columns SQLTables returns are all varchar, without mention how large
>they should be
The returned info would represent how the column is stored at the dbms.
That's why the ODBC spec does not specify exact character counts. For DB2,
a table name can be a maximum of 128 characters, so its driver returns
128 for SQL_DESC_LENGTH and 128 for SQL_DESC_OCTET_LENGTH for ANSI or 128*2
for UTF16.

You can ignore the catalog column of the resultset since MonetDB does not
support catalogs, provided the driver correctly reports that catalogs are not
supported by the DBMS.

>Another problem is that SQL_DESC_LENGTH counts *characters* and
>SQL_DESC_OCTET_LENGTH counts *bytes*. These are not interchangeable. 
>MonetDB uses the UTF-8 encoding

Example: for a column declared as varchar(30) the driver would return
SQL_DESC_LENGTH -> 30
SQL_DESC_OCTET_LENGTH -> 30 for ANSI version of the driver, 30*2 for
Unicode version of the driver. For Windows the driver must translate from
dbms characterset (UTF8) to client characterset: ANSI (single byte
characterset) or Windows UTF16 (implemented by ODBC spec as wchar_t two
bytes).

>your suggestion of using 128 characters (or is that bytes?) doesn't help
That was an example, not a suggestion. For MonetDB the driver needs to
return the actual info for the underlying catalog "views" where the
information for SQLTables is stored. MonetDB must be storing the table
names somewhere in its catalogs.

>There *is*, however, a problem in MonetDB. The server returns a single
>column length value, and for varchar columns that value is the actual
>length of the longest value in the column (0 if all values are NULL or if
>there are no rows)
That would be a problem. If I create a column as varchar(30) then I should
get (as explained above) whether or not there is ANY data in the table:
SQL_DESC_LENGTH -> 30
SQL_DESC_OCTET_LENGTH -> 30 for ANSI version of the driver, 30*2 for
Unicode version of the driver

SQL_DESC_LENGTH and SQL_DESC_OCTET_LENGTH are used by the ODBC client to
create and allocate buffers for the resultset data, and without correct
information for these attributes the ODBC client cannot correctly
retrieve data from the database.
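
A small simulation of why a reported length of 0 loses data. This is a
sketch under stated assumptions, not code from CompareData or the driver:
the column_buffer type and its functions are hypothetical, and
column_buffer_store only imitates a driver copying a value into a bound
buffer and truncating it to fit (a real driver would signal this as "string
data, right truncated", SQLSTATE 01004).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical client-side buffer sized from the reported octet length. */
typedef struct {
    char  *data;      /* holds capacity bytes */
    size_t capacity;  /* octet length plus one byte for the terminator */
} column_buffer;

/* Allocate a buffer for a column whose SQL_DESC_OCTET_LENGTH was reported
 * as octet_length. Returns 0 on success, -1 if allocation fails. */
int column_buffer_init(column_buffer *b, size_t octet_length)
{
    b->capacity = octet_length + 1;
    b->data = malloc(b->capacity);
    if (b->data == NULL)
        return -1;
    b->data[0] = '\0';
    return 0;
}

/* Imitate a driver copying a value into the bound buffer.
 * Returns 1 if the value had to be truncated, 0 if it fit. */
int column_buffer_store(column_buffer *b, const char *value)
{
    size_t len = strlen(value);
    size_t room = b->capacity - 1;          /* bytes available for data */
    size_t n = len < room ? len : room;

    memcpy(b->data, value, n);
    b->data[n] = '\0';
    return len > n;
}

void column_buffer_free(column_buffer *b)
{
    free(b->data);
    b->data = NULL;
    b->capacity = 0;
}
```

With a reported octet length of 30, a table name such as "voyages" is stored
intact; with a reported octet length of 0, the buffer has room only for the
terminator and the client ends up with an empty string.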



----------------------------------------------------------------------

Comment By: Sjoerd Mullender (sjoerd)
Date: 2010-04-16 09:46

Message:
I read that too, but I'm not sure I can come to the same conclusions.  I'm
struggling to understand what the specification means.

The columns SQLTables returns are all varchar, without mention how large
they should be.  The first column is the catalog name.  The values here can
be NULL if the data source does not support catalogs.  Since MonetDB indeed
doesn't support catalogs, the value returned is indeed NULL.

The description in SQLTables doesn't say what size varchar column to
return, but presumably one that is large enough to hold the data.  In this
case, varchar(1) is plenty long enough (for this particular column).

For SQLColAttribute, the description of SQL_DESC_LENGTH says that, for
variable-length data (which varchar is), the size returned is the "actual
character length" (we can ignore the fixed-length case).  What is the
actual character length of a NULL value?  I can argue 0.

The problem for SQL_DESC_OCTET_LENGTH is that, for variable-length
character types, it returns the *maximum* length in bytes.  Again, what is
the maximum length of a column that is always going to be NULL?  The query
that SQLTables does under the hood uses cast(null as varchar(1)) for the
catalog column.

There *is*, however, a problem in MonetDB.  The server returns a single
column length value, and for varchar columns that value is the actual
length of the longest value in the column (0 if all values are NULL or if
there are no rows).  (The value returned is the declared length for
fixed-length columns, i.e. 50 for char(50).)  In other words, the client
does not know the maximum allowed length of the column if the longest value
is shorter.  So if a column is declared varchar(50) but only contains
NULLs, the value returned by the server is 0.  This means that determining
SQL_DESC_LENGTH is relatively easy, but determining SQL_DESC_OCTET_LENGTH
is not currently possible for varchar columns.
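
To make the gap concrete, here is a minimal sketch of the value the server
is described as reporting: the longest non-NULL value, or 0 when there is
none. The function name and the sample rows are hypothetical, and strlen
counts bytes, which matches the character count only for ASCII data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the single length value described above: the length
 * of the longest non-NULL value in the column, or 0 if every value is NULL
 * or there are no rows. The declared maximum, e.g. 50 for varchar(50), does
 * not enter into it at all. */
size_t longest_value_length(const char *const *values, size_t nrows)
{
    size_t longest = 0;

    for (size_t i = 0; i < nrows; i++) {
        if (values[i] != NULL) {
            size_t len = strlen(values[i]);  /* bytes; equals chars for ASCII */
            if (len > longest)
                longest = len;
        }
    }
    return longest;
}
```

A varchar(50) column holding only NULLs yields 0, and one holding "sys" and
"voyages" yields 7; in neither case can the client recover the declared
maximum of 50 from this value.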

Another problem is that SQL_DESC_LENGTH counts *characters* and
SQL_DESC_OCTET_LENGTH counts *bytes*.  These are not interchangeable. 
MonetDB uses the UTF-8 encoding, which is a variable-length encoding.  A
character can take from 1 to 6 bytes.  Should we then multiply the value by
6?  If a column is declared as e.g. varchar(10), that means that it can
store up to 10 characters in MonetDB.
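
One way to frame the multiplication question is as a worst-case bound:
declared characters times the maximum bytes per character in the client's
encoding. The sketch below is illustrative only; the enum and function names
are hypothetical. It assumes 4 bytes per character for UTF-8 (the limit in
RFC 3629; the original UTF-8 design allowed up to 6) and 2 bytes per
character for UTF-16, which follows the convention used elsewhere in this
thread and counts UTF-16 code units rather than code points.

```c
#include <stddef.h>

/* Hypothetical set of client encodings discussed in this thread. */
typedef enum {
    ENC_SINGLE_BYTE,  /* an ANSI code page such as 1252 */
    ENC_UTF16,        /* wide calls */
    ENC_UTF8          /* narrow calls as the driver currently treats them */
} client_encoding;

/* Maximum bytes one character can occupy, under the assumptions above. */
size_t max_bytes_per_char(client_encoding enc)
{
    switch (enc) {
    case ENC_SINGLE_BYTE:
        return 1;
    case ENC_UTF16:
        return 2;     /* one code unit; supplementary characters take two */
    case ENC_UTF8:
        return 4;     /* RFC 3629 limit */
    }
    return 0;         /* unknown encoding */
}

/* Worst-case octet length for a column declared as varchar(declared_chars). */
size_t worst_case_octet_length(size_t declared_chars, client_encoding enc)
{
    return declared_chars * max_bytes_per_char(enc);
}
```

Under these assumptions varchar(10) needs at most 40 bytes for a UTF-8
client, and varchar(30) needs 30 bytes for a single-byte client or 60 for a
UTF-16 client.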

The problem we're having here doesn't really have anything to do with
SQLTables as such.  The implementation does a SQLExecDirect call under the
hood, and SQLColAttribute doesn't know (or care) how the result set was
created.  The problem has to do with getting metainformation about the
result set.  In other words, your suggestion of using 128 characters (or is
that bytes?) doesn't help.  That information is not passed on to
SQLColAttribute, and anyway, the same problems would exist for hand-created
SQL queries.

I do agree that all metainformation-returning queries should return
compatible information.

----------------------------------------------------------------------

Comment By: Farid Z (faridz)
Date: 2010-04-16 08:43

Message:
The driver needs to return valid values (not 0, and usually greater
than 1) for SQL_DESC_LENGTH and SQL_DESC_OCTET_LENGTH for the
resultset columns as per the ODBC descriptions of these attributes:
http://msdn.microsoft.com/en-us/library/ms713558(VS.85).aspx

Quote:
 SQL_DESC_LENGTH: A numeric value that is either the maximum or actual
character length of a character string or binary data type. It is the
maximum character length for a fixed-length data type, or the actual
character length for a variable-length data type. Its value always excludes
the null-termination byte that ends the character string

SQL_DESC_OCTET_LENGTH : 
The length, in bytes, of a character string or binary data type. For
fixed-length character or binary types, this is the actual length in bytes.
For variable-length character or binary types, this is the maximum length
in bytes. This value does not include the null terminator.

So for SQLTables resultset table names these would be, for example, 128 and
128, assuming that the maximum length for a table name in MonetDB is 128
characters. Similar info should be returned by the driver for SQLStatistics,
SQLPrimaryKeys, SQLForeignKeys, SQLGetTypeInfo and all other ODBC
catalog functions for these attributes.

----------------------------------------------------------------------

Comment By: Sjoerd Mullender (sjoerd)
Date: 2010-04-16 04:07

Message:
I have been able to also get the value 0 from the call to SQLColAttribute
on column 1 after a call to SQLTables.
The reason for the value 0 is that (in my case, anyway) all values in the
column are NULL.
If I look at column 2 where I do get a non-NULL value, the value for
SQL_DESC_OCTET_LENGTH and SQL_DESC_LENGTH is indeed greater than 0.
The ODBC specification just says that the types of the columns are
"varchar" without any size indication.  So if the columns are completely
filled with NULLs, the server just returns a length indication of 0.
I suppose I could arbitrarily change the length to 1 when it would
otherwise have been 0.

----------------------------------------------------------------------

Comment By: Farid Z (faridz)
Date: 2010-04-15 16:42

Message:
I am running the 32-bit version of my app on Windows XP SP3 32-bit and
using Windows character set 1252 (US-EN).

I am using ODBC types for column_size and buffer_length, which in the 32-bit
version of my app are SQLUINTEGER and SQLINTEGER.

SQLULEN column_size;
SQLLEN          buffer_length;

My app is tested successfully with 30-40 different DBMS ODBC drivers
without encountering this issue http://www.zidsoft.com/dbmsnotes.html

The driver is returning zero for SQL_DESC_OCTET_LENGTH and SQL_DESC_LENGTH
for ::SQLTables character resultset columns instead of actual values.
Same thing for ::SQLPrimaryKeys, ::SQLStatistics and ::SQLForeignKeys
resultset character columns.



----------------------------------------------------------------------

Comment By: Sjoerd Mullender (sjoerd)
Date: 2010-04-15 15:55

Message:
If by ANSI you mean one of the ISO-8859 versions (e.g. ISO-8859-1 aka
"latin 1") then that is definitely a problem.  Those are *not* compatible
with UTF-8.

What are the types of pRow->buffer_length and pRow->column_size?

Is your application 32 bit or 64 bit?

The MonetDB ODBC driver may not be completely compliant with the 64-bit
definition of the ODBC interface.  I'm hoping that can be fixed in the next
feature release (not in a bug fix release since it's an API change).

----------------------------------------------------------------------

Comment By: Farid Z (faridz)
Date: 2010-03-15 13:28

Message:
My app is ANSI so I am using narrow calls. So that's not the cause of this
issue.  I see the cause of the problem: the driver is returning incorrect
information for ::SQLTables character resultset columns:

        rc = SQLColAttribute(hstmt,
                             nCol,
                             SQL_DESC_OCTET_LENGTH,
                             NULL,
                             0,
                             NULL,
                             &pRow->buffer_length);

        rc = SQLColAttribute(hstmt,
                             nCol,
                             SQL_DESC_LENGTH,
                             NULL,
                             0,
                             NULL,
                             (SQLLEN*) &pRow->column_size);

The driver is returning zero for SQL_DESC_OCTET_LENGTH and SQL_DESC_LENGTH
for ::SQLTables character resultset columns instead of actual values.
Same thing for ::SQLPrimaryKeys, ::SQLStatistics and ::SQLForeignKeys
resultset character columns.



----------------------------------------------------------------------

Comment By: Sjoerd Mullender (sjoerd)
Date: 2010-03-15 13:03

Message:
The MonetDB ODBC driver only supports Unicode and UTF-8.  That is to say,
the wide character calls are expected to use UTF-16, and the narrow
character calls are expected to use UTF-8.  The driver does not support any
other character sets (well, ASCII since it is a proper subset of UTF-8).

----------------------------------------------------------------------
