Re: next version of DBD::ODBC including available unicode patch?

Martin J. Evans Sat, 30 Jun 2007 08:32:46 -0700

Alexander/Gabor,

I've reduced this reply to dbi-dev only for now.


Alexander Foken wrote:

I'm impressed how fast things are developing right now - WOW!

Martin Evans wrote:
On the point of Alexander's unicode patch, I seem to remember applying it over a year ago to my copy of DBD::ODBC, but it broke building of DBD::ODBC on UNIX - perhaps my recollection is wrong.
I would expect my patch to break compiling on every platform except Win2K/WinXP, because I only tested it on those platforms.
I think my changes are quite Windows-specific, especially the calls to the Unicode converter routines WideCharToMultiByte() and MultiByteToWideChar(). wchar.h, wcslen() and wcscpy() are in the Single UNIX Specification, Version 2, dated 1997. WCHAR should be equivalent to wchar_t of the Single UNIX Specification; a #define or typedef should be sufficient, like Microsoft does in wchar.h. See also <http://www.alexander-foken.de/README.unicode-patch.html#known_problems>. With a little bit more knowledge of the inner workings of Perl's Unicode support, it should be possible to replace the two converter routines called by my patch with Perl built-in routines. When that has happened, the code should work on platforms implementing the Single UNIX Specification Version 2, assuming the ODBC system wants Unicode in UTF-16 like Windows does. If the ODBC system expects UTF-8, the two converter functions should be called only on Win32.
Hope that helps,
Alexander

I've got a lot of problems attempting to make this work for UNIX, not least of which is that wchar_t on UNIX is typically 4 bytes and the ODBC API only really does UCS-2 (2 bytes) - this makes using wcslen etc. rather useless. Then there are the additional issues of the lack of unicode ODBC drivers for UNIX and the ODBC driver manager on UNIX (IBM have a UCS-2 handling ODBC driver for UNIX - but see http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/ad/c0011522.htm). At this point in time I don't believe anyone is using SQL_Wxxx characters on UNIX via ODBC, but I'm prepared to be proved wrong. The problem is also related to there being no definite definition of what unicode in ODBC on UNIX is. If it is taken as UCS-2 (as would seem to be the only sensible thing for ODBC) then it is simply a matter of converting between UCS-2 (in ODBC) and UTF-8 in Perl - any pointers from anyone here on how to do that would be appreciated.
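
For the UCS-2 to UTF-8 direction, a minimal sketch in plain C might look like the following. The names ucs2_len() and ucs2_to_utf8() are made up for illustration and are not part of DBD::ODBC, the patch or Perl. It assumes strict UCS-2 (no surrogate pairs) held in 16-bit units, and it counts units itself rather than using wcslen(), since wchar_t may be 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count 16-bit units up to the 0 terminator.
 * Used instead of wcslen(), which walks wchar_t (often 4 bytes on UNIX). */
static size_t ucs2_len(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: encode n UCS-2 units as UTF-8 into out.
 * out must hold at least 3*n + 1 bytes.
 * Returns the number of bytes written, excluding the NUL terminator.
 * Assumption: strict UCS-2, so surrogate pairs are not combined. */
static size_t ucs2_to_utf8(const uint16_t *in, size_t n, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t c = in[i];
        if (c < 0x80) {                 /* 1 byte: 0xxxxxxx */
            out[o++] = (unsigned char)c;
        } else if (c < 0x800) {         /* 2 bytes: 110xxxxx 10xxxxxx */
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {                        /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

A caller would pass ucs2_len(buf) as n and a destination of 3*n + 1 bytes. Inside an XS module the result would still need to be flagged as UTF-8 on the Perl scalar, which this sketch does not cover.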

As for UTF-8, I could never see how this could be done with the ODBC API (on any platform), as the API uses counts of characters in places but expects buffers sized in bytes, e.g. if it comes back saying a column is 20 characters in size, how can you tell how many bytes of space you need for it? Then there are loads of places where it says that if something is a unicode string then the buffer size must be a multiple of 2 etc.
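
To put numbers on the sizing problem, here is a small sketch with made-up helper names. It assumes every character is in the range UCS-2 can represent, so it needs 2 bytes in UCS-2 and at most 3 bytes in UTF-8. It only computes a worst case; it does not resolve the ambiguity in what the API reports.

```c
#include <stddef.h>

/* Hypothetical helpers: worst-case buffer size in bytes for a column
 * reported as `chars` characters wide, including room for a terminator.
 * Assumption: all characters fit in UCS-2 (the Basic Multilingual Plane). */

static size_t ucs2_buffer_bytes(size_t chars)
{
    return (chars + 1) * 2;     /* fixed 2 bytes each, always a multiple of 2 */
}

static size_t utf8_buffer_bytes(size_t chars)
{
    return chars * 3 + 1;       /* up to 3 bytes each, plus a 1-byte NUL */
}
```

For the 20-character column above this gives 42 bytes for UCS-2 and up to 61 bytes for UTF-8, and the UTF-8 figure is only an upper bound: the real length depends on the data.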

As it would seem a number of people are currently using your patch on Windows, I've integrated it into DBD::ODBC with the following conditions:

1. All the SQL_Wxxx C code is conditional on compilation on Windows, i.e. in #ifdef WIN32, with the exception of a few harmless places which convert a SQL type to a string etc.

2. All your new unicode tests are only run on Windows, skipped for other platforms.

3. There are a few aspects of the patch I am unsure about and ideally I'd like a comment on them:


a)
/* MS SQL returns bytes, Oracle returns characters ... */
fbh->ColLength*=sizeof(WCHAR);
fbh->ColDisplaySize = DBIc_LongReadLen(imp_sth)+1;

The comment seems to suggest a difference between the two but I don't see a code difference. It looks as though the code agrees with the comment as far as SQL Server goes but not Oracle.


b) In dbd_describe() there is a:
     fbh->ColLength += 1; /* add terminator */

in your patch and I'm unclear why that is required.

4. I've only currently tested it on Windows with SQL Server and may need to do some tidying up for UNIX.

5. I've completed the integration work for the code and tests but not the other areas like Changes, README etc. as yet.
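
To make (1) and the length questions in (3) concrete, here is a sketch of the general pattern. It is not the code from the patch or from DBD::ODBC: HYPOTHETICAL_WITH_UNICODE and column_buffer_bytes() are invented names, and it assumes the driver reports the column width in characters, which is exactly the point in doubt in 3(a).

```c
#include <stddef.h>

#ifdef WIN32
#include <windows.h>            /* provides WCHAR */
#define HYPOTHETICAL_WITH_UNICODE 1
#else
#define HYPOTHETICAL_WITH_UNICODE 0
#endif

/* Hypothetical helper, not DBD::ODBC code: bytes to allocate for a column
 * the driver describes as `col_chars` characters wide.
 * Assumption: the driver reports characters, not bytes. */
static size_t column_buffer_bytes(size_t col_chars, int is_wide)
{
#if HYPOTHETICAL_WITH_UNICODE
    if (is_wide)                /* characters -> bytes, plus a wide terminator */
        return (col_chars + 1) * sizeof(WCHAR);
#else
    (void)is_wide;              /* wide columns are only handled on Windows */
#endif
    return col_chars + 1;       /* one byte per character, plus terminator */
}
```

If a driver reports bytes instead of characters, multiplying by sizeof(WCHAR) over-allocates, which is harmless for the buffer itself but not necessarily for any length later derived from it.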

Ideally I'd like some comments on (3) first but then I could commit this to subversion next week and perhaps some of the people already using your patch could try it out. By the time we get to that stage I hope to be able to come up with Perl equivalents of the mbs2utf8 etc functions.
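
For the direction from Perl towards ODBC, a plain C sketch of decoding UTF-8 into 16-bit units could look like this. utf8_to_ucs2() is a made-up name, not one of the patch's functions and not a Perl API. It assumes the target is strict UCS-2, so it rejects 4-byte UTF-8 sequences, and it does not reject overlong encodings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode NUL-terminated UTF-8 into 16-bit units.
 * cap is the capacity of out in units, including the terminator.
 * Returns the units written (excluding the terminator), or (size_t)-1 on
 * malformed input, a character outside UCS-2, or insufficient room.
 * Limitation: overlong encodings are not detected. */
static size_t utf8_to_ucs2(const unsigned char *in, uint16_t *out, size_t cap)
{
    size_t o = 0;
    while (*in) {
        uint32_t c;
        int extra;              /* continuation bytes still expected */
        if (*in < 0x80)                { c = *in;        extra = 0; }
        else if ((*in & 0xE0) == 0xC0) { c = *in & 0x1F; extra = 1; }
        else if ((*in & 0xF0) == 0xE0) { c = *in & 0x0F; extra = 2; }
        else return (size_t)-1; /* stray continuation or 4-byte sequence */
        in++;
        while (extra-- > 0) {
            if ((*in & 0xC0) != 0x80)
                return (size_t)-1;      /* missing continuation byte */
            c = (c << 6) | (*in++ & 0x3F);
        }
        if (o + 1 >= cap)
            return (size_t)-1;  /* no room for this unit plus terminator */
        out[o++] = (uint16_t)c;
    }
    if (cap == 0)
        return (size_t)-1;
    out[o] = 0;
    return o;
}
```

Since each UTF-8 character yields at most one unit here, a destination of (byte length + 1) units is always large enough.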


Martin
