Alexander/Gabor,

I've reduced this reply to dbi-dev only for now.

Alexander Foken wrote:
I'm impressed how fast things are developing right now - WOW!

Martin Evans wrote:
On the point of Alexander's Unicode patch, I seem to remember applying it over a year ago to my copy of DBD::ODBC, but it broke the build of DBD::ODBC on UNIX - perhaps my recollection is wrong.

I would expect my patch to break compiling on every platform except Win2K/WinXP, because I only tested it on those platforms.

I think my changes are quite Windows-specific, especially the calls to the Unicode converter routines WideCharToMultiByte() and MultiByteToWideChar(). wchar.h, wcslen() and wcscpy() are in the Single UNIX Specification, Version 2, dated 1997. WCHAR should be equivalent to wchar_t of the Single UNIX Specification; a #define or typedef should be sufficient, like Microsoft does in wchar.h. See also <http://www.alexander-foken.de/README.unicode-patch.html#known_problems>. With a little bit more knowledge of the inner workings of Perl's Unicode support, it should be possible to replace the two converter routines called by my patch with Perl built-in routines. When that has happened, the code should work on platforms implementing the Single UNIX Specification Version 2, assuming the ODBC system wants Unicode in UTF-16 like Windows does. If the ODBC system expects UTF-8, the two converter functions should be called only on Win32.

Hope that helps,
Alexander

I've had a lot of problems attempting to make this work for UNIX, not least of which is that wchar_t on UNIX is typically 4 bytes while the ODBC API only really does UCS2 (2 bytes) - this makes wcslen etc. rather useless. Then there are the additional issues of the lack of Unicode ODBC drivers for UNIX and the ODBC driver manager on UNIX (IBM have a UCS-2 handling ODBC driver for UNIX - but see http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/ad/c0011522.htm). At this point in time I don't believe anyone is using SQL_Wxxx characters on UNIX via ODBC, but I'm prepared to be proved wrong. The problem is also related to there being no firm definition of what Unicode in ODBC on UNIX is. If it is taken as UCS2 (which would seem to be the only sensible thing for ODBC), then it is simply a matter of converting between UCS2 (in ODBC) and UTF-8 in Perl - any pointers from anyone here on how to do that would be appreciated.
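To make the UCS2-to-UTF-8 direction concrete, here is a rough sketch in plain C. It is not code from the patch or from DBD::ODBC; the function name ucs2_to_utf8 and its signature are made up for illustration. It assumes the input really is UCS2 (one 16-bit unit per character, host byte order) and so rejects surrogate code units rather than trying to decode UTF-16 pairs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of DBD::ODBC or the patch).
 * Converts n UCS-2 code units to a NUL-terminated UTF-8 string.
 * Returns the number of UTF-8 bytes written, excluding the NUL,
 * or -1 if the output buffer is too small or a surrogate code
 * unit (0xD800-0xDFFF, not valid in UCS-2) is encountered. */
long ucs2_to_utf8(const uint16_t *in, size_t n,
                  unsigned char *out, size_t outsize)
{
    size_t o = 0;
    size_t i;

    if (outsize == 0)
        return -1;                      /* no room even for the NUL */

    for (i = 0; i < n; i++) {
        uint16_t c = in[i];
        size_t need = (c < 0x80) ? 1 : (c < 0x800) ? 2 : 3;

        if (c >= 0xD800 && c <= 0xDFFF)
            return -1;                  /* surrogate: not UCS-2 */
        if (o + need + 1 > outsize)
            return -1;                  /* keep one byte for the NUL */

        if (need == 1) {
            out[o++] = (unsigned char)c;
        } else if (need == 2) {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[o] = '\0';
    return (long)o;
}
```

The reverse direction would decode 1-, 2- and 3-byte UTF-8 sequences back into 16-bit units in the same way; inside the driver, the resulting bytes would still have to be flagged as UTF-8 on the Perl scalar, which this sketch does not cover.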

As for UTF-8, I could never see how this could be done with the ODBC API (on any platform), as the API uses counts of characters in places but expects buffers sized in bytes. For example, if it comes back saying a column is 20 characters in size, how can you tell how many bytes of space you need for it? Then there are loads of places where it says that if something is a Unicode string then the buffer size must be a multiple of 2, etc.
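The sizing ambiguity can be shown with some simple arithmetic. The helpers below are hypothetical names, not ODBC or DBD::ODBC functions, and they ignore the terminator. They only illustrate that a character count fixes the byte count for UCS2 but gives no more than a range for UTF-8.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only; none of these are
 * ODBC or DBD::ODBC functions. All sizes exclude any terminator. */

/* UCS-2: always exactly 2 bytes per character. */
size_t ucs2_bytes(size_t chars)
{
    return chars * 2;
}

/* UTF-8 best case: every character is ASCII (1 byte each). */
size_t utf8_min_bytes(size_t chars)
{
    return chars;
}

/* UTF-8 worst case when every character is in the Basic
 * Multilingual Plane (up to 3 bytes each). */
size_t utf8_max_bytes_bmp(size_t chars)
{
    return chars * 3;
}

/* UTF-8 worst case over all of Unicode (up to 4 bytes each). */
size_t utf8_max_bytes(size_t chars)
{
    return chars * 4;
}
```

For the 20-character column above, UCS2 needs exactly 40 bytes, whereas UTF-8 could need anything from 20 to 60 bytes for BMP-only data, or up to 80 bytes in general.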

As it would seem a number of people are using your patch for Windows currently, I've integrated it into DBD::ODBC with the following conditions:

1. all the SQL_Wxxx C code is conditional on compilation on Windows, i.e. inside #ifdef WIN32, with the exception of a few harmless places which convert a SQL type to a string etc.

2. all your new unicode tests are only run on Windows, skipped for other platforms.

3. There are a few aspects of the patch I am unsure about and ideally I'd like a comment on them:

a)
/* MS SQL returns bytes, Oracle returns characters ... */
fbh->ColLength*=sizeof(WCHAR);
fbh->ColDisplaySize = DBIc_LongReadLen(imp_sth)+1;

The comment seems to suggest a difference between the two, but I don't see a code difference. It looks as though the code agrees with the comment as far as SQL Server is concerned, but not Oracle.

b) In dbd_describe() there is a:
     fbh->ColLength += 1; /* add terminator */

in your patch and I'm unclear why that is required.

4. I've only currently tested it on Windows with SQL Server and may need to do some tidying up for UNIX.

5. I've completed the integration work for the code and tests but not yet for the other areas like Changes, README etc.

Ideally I'd like some comments on (3) first, but then I could commit this to Subversion next week and perhaps some of the people already using your patch could try it out. By the time we get to that stage I hope to be able to come up with Perl equivalents of the mbs2utf8 etc. functions.

Martin
