Alexander/Gabor,
I've reduced this reply to dbi-dev only for now.
Alexander Foken wrote:
I'm impressed how fast things are developing right now - WOW!
Martin Evans wrote:
On the point of Alexander's unicode patch I seem to remember
applying it over a year ago to my copy of DBD::ODBC but it broke
building of DBD::ODBC on UNIX - perhaps my recollection is wrong.
I would expect my patch to break compiling on every platform except
Win2K/WinXP, because I only tested it on those platforms.
I think my changes are quite Windows specific, especially the calls to
the Unicode converter routines WideCharToMultiByte() and
MultiByteToWideChar(). wchar.h, wcslen() and wcscpy() are in the
Single UNIX Specification, Version 2, dated 1997. WCHAR should be
equivalent to wchar_t of the Single UNIX Specification, a #define or
typedef should be sufficient, like Microsoft does in wchar.h. See also
<http://www.alexander-foken.de/README.unicode-patch.html#known_problems>.
With a little bit more knowledge of the inner workings of Perl's
Unicode support, it should be possible to replace the two converter
routines called by my patch with Perl built-in routines. When that has
happened, the code should work on platforms implementing the Single
UNIX Specification Version 2, assuming the ODBC system wants Unicode
in UTF-16 like Windows does. If the ODBC system expects UTF-8, the
two converter functions should be called only on Win32.
Hope that helps,
Alexander
I've got a lot of problems attempting to make this work for UNIX, not
least of which is that wchar_t on UNIX is typically 4 bytes while the
ODBC API only really does UCS2 (2 bytes) - this makes using wcslen etc.
rather useless. Then there are the additional issues of the lack of
unicode ODBC drivers for UNIX and the ODBC driver manager on UNIX (IBM
have a UCS-2 handling ODBC driver for UNIX - but see
http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/ad/c0011522.htm).
At this point in time I don't believe anyone is using SQL_Wxxx
characters on UNIX via ODBC but I'm prepared to be proved wrong. The
problem is also related to there being no clear definition of what
unicode in ODBC on UNIX is. If it is (as would seem to be the only
sensible thing for ODBC) taken as UCS2 then it is simply a matter of
converting between UCS2 (in ODBC) and UTF-8 in Perl - any pointers from
anyone here on how to do that would be appreciated.
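A minimal sketch of the UCS2 to UTF-8 direction in C, assuming the ODBC wide character is a plain 16-bit unit. The names ucs2_char, ucs2_len and ucs2_to_utf8 are hypothetical and not part of DBD::ODBC or the ODBC API; ucs2_len stands in for wcslen(), which would step in wchar_t units. UTF-16 surrogate pairs are not handled, since this assumes strict UCS2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ODBC wide character type: assumed to be
   16 bits wide, independent of the platform's wchar_t. */
typedef unsigned short ucs2_char;

/* Length in 16-bit units of a NUL-terminated UCS2 string. */
static size_t ucs2_len(const ucs2_char *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Convert n UCS2 units to NUL-terminated UTF-8.
   Returns the number of bytes written (excluding the NUL), or
   (size_t)-1 if the output buffer is too small. */
static size_t ucs2_to_utf8(const ucs2_char *in, size_t n,
                           unsigned char *out, size_t outsize)
{
    size_t i, o = 0;

    assert(in != NULL && out != NULL);

    for (i = 0; i < n; i++) {
        unsigned int c = in[i] & 0xFFFFu;
        size_t need = (c < 0x80) ? 1 : (c < 0x800) ? 2 : 3;

        /* Always leave room for the terminating NUL. */
        if (o + need + 1 > outsize)
            return (size_t)-1;

        if (need == 1) {
            out[o++] = (unsigned char)c;
        } else if (need == 2) {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }

    if (o + 1 > outsize)
        return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

The resulting bytes would still have to be handed to Perl and flagged as UTF-8 on the Perl side; that part is not shown here.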
As for UTF-8, I could never see how this could be done with the ODBC
API (on any platform), as the API uses counts of characters in places but
expects buffers sized in bytes, e.g. if it comes back saying a column is
20 characters in size, how can you tell how many bytes of space you need
for it? Then there are loads of places where it says that if something is
a unicode string then the buffer size must be a multiple of 2 etc.
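To make the characters versus bytes mismatch concrete, here is a small sketch in C. It is only a worst-case bound under the assumption that the column holds nothing outside the 16-bit (UCS2) range, where UTF-8 needs at most 3 bytes per character; the function names are hypothetical and nothing in the ODBC API promises this.

```c
#include <stddef.h>

/* Bytes needed to hold nchars UCS2 characters plus a 2-byte terminator.
   The result is always a multiple of 2. */
static size_t ucs2_buffer_bytes(size_t nchars)
{
    return (nchars + 1) * 2;
}

/* Worst-case bytes needed to hold nchars characters as UTF-8 plus a
   1-byte terminator, ASSUMING every character fits in 16 bits
   (at most 3 UTF-8 bytes each). */
static size_t utf8_worst_case_bytes(size_t nchars)
{
    return nchars * 3 + 1;
}
```

For a column reported as 20 characters this gives 42 bytes for UCS2, but anywhere from 21 to 61 bytes for UTF-8 depending on the data, which is why a character count alone does not say how much space to allocate.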
As it would seem a number of people are using your patch for Windows
currently, I've integrated it into DBD::ODBC with the following conditions:
1. all the SQL_Wxxx C code is conditional on compilation on Windows i.e.
in #ifdef WIN32
with the exception of a few harmless places which convert a SQL type to
a string etc.
2. all your new unicode tests are only run on Windows, skipped for other
platforms.
3. There are a few aspects of the patch I am unsure about and ideally
I'd like a comment on them:
a)
/* MS SQL returns bytes, Oracle returns characters ... */
fbh->ColLength*=sizeof(WCHAR);
fbh->ColDisplaySize = DBIc_LongReadLen(imp_sth)+1;
The comment seems to suggest a difference between the two but I don't
see a corresponding difference in the code.
It looks as though the code agrees with the comment as far as SQL Server
goes but not Oracle.
b) In dbd_describe() there is a:
fbh->ColLength += 1; /* add terminator */
in your patch and I'm unclear why that is required.
4. So far I've only tested it on Windows with SQL Server and may need
to do some tidying up for UNIX.
5. I've completed the integration work for the code and tests but not
the other areas like Changes, README etc as yet.
Ideally I'd like some comments on (3) first but then I could commit this
to subversion next week and perhaps some of the people already using
your patch could try it out. By the time we get to that stage I hope to
be able to come up with Perl equivalents of the mbs2utf8 etc functions.
Martin