Not a model of clarity (ANSI and Unicode are not encodings), but this page seems to be the best resource on this:

https://msdn.microsoft.com/en-us/library/ms709439(v=vs.85).aspx

It seems that there's a parallel "Unicode" API for ODBC drivers that support it. Moreover:

> Currently, the only Unicode encoding that ODBC supports is UCS-2, which
> uses a 16-bit integer (fixed length) to represent a character. Unicode
> allows applications to work in different languages.

So using Klingon is off the table, although the design of UTF-16 is such that sending UTF-16 to an application that expects UCS-2 will probably work reasonably well, as long as the application treats it as "just data".

This still doesn't explain why some drivers accept UCS-2/UTF-16 when called through the non-Unicode API.

On Thu, Feb 4, 2016 at 1:12 PM, Stefan Karpinski <[email protected]> wrote:

> The real issue is this:
>
> SQLCHAR is for encodings with 8-bit code units.
>
>
> Condescending lecture on encodings notwithstanding, UTF-16 is not such an
> encoding, yet UTF-16 is what the ODBC package is currently sending
> to SQLExecDirect for an argument of type SQLCHAR * – and somehow it seems
> to be working for many drivers, which still makes no sense to me. I can
> only conclude that some ODBC drivers are treating this as a void * argument
> and they expect pointers to data in whatever encoding they prefer, not
> specifically in encodings with 8-bit code units.
>
> Querying the database about what encoding it expects is a good idea, but
> how does one do that? The SQLGetInfo
> <https://msdn.microsoft.com/en-us/library/ms711681(v=vs.85).aspx>
> function seems like a good candidate but this page doesn't include
> "encoding" or "utf" anywhere.
>
> On Thu, Feb 4, 2016 at 7:53 AM, Milan Bouchet-Valat <[email protected]>
> wrote:
>
>> On Wednesday, February 3, 2016 at 11:44 -0800, Terry Seaward wrote:
>> > From R, it seems like the encoding is based on the connection (as
>> > opposed to being hard coded). See `enc <- attr(channel, "encoding")`
>> > below:
>> >
>> > ```
>> > [...]
>> >
>> > Digging down `odbcConnect` is just a wrapper for `odbcDriverConnect`
>> > which has the following parameter `DBMSencoding = ""`. This calls the
>> > `C` function `C_RODBCDriverConnect` (available here: RODBC_1.3-12.tar.gz),
>> > which has no reference to encodings. So `attr(channel,
>> > "encoding")` is simply `DBMSencoding`, i.e. `""`.
>> >
>> > It seems to come down to `iconv(..., to = "")` which, from the R
>> > source code, uses `win_iconv.c` attached. I can't seem to find how
>> > `""` is handled, i.e. is there some default value based on the
>> > system?
>> "" refers to the encoding of the current system locale. This is a
>> reasonable guess, but it will probably be wrong in many cases (else, R
>> wouldn't have provided this option at all).
>>
>>
>> Regards
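
For what it's worth, here is a small Python sketch of the two encoding points above. It does not touch ODBC at all; the helper names (`utf16_code_units`, `is_surrogate`) and the sample query strings are made up for illustration. It shows that text restricted to the Basic Multilingual Plane has exactly one 16-bit unit per character (so a UCS-2 consumer sees the same thing), that a character outside it becomes a surrogate pair, and that UTF-16 bytes read through an 8-bit, NUL-terminated interface would appear to end after the first ASCII character.

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units of `text` as a list of ints (hypothetical helper)."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def is_surrogate(unit):
    """True if a 16-bit unit lies in the surrogate range, which UCS-2 cannot interpret."""
    return 0xD800 <= unit <= 0xDFFF

# Illustrative query strings only; nothing here is sent to a database.
bmp_query = "SELECT '\u00e9'"         # U+00E9 is inside the BMP
astral_query = "SELECT '\U0001F600'"  # U+1F600 is outside the BMP

bmp_units = utf16_code_units(bmp_query)
astral_units = utf16_code_units(astral_query)

# BMP-only text: one code unit per character, so UTF-16 and UCS-2 coincide.
print(len(bmp_query), len(bmp_units))        # 10 10

# Non-BMP text: one character costs two code units (a surrogate pair).
print(len(astral_query), len(astral_units))  # 10 11
print([hex(u) for u in astral_units if is_surrogate(u)])  # ['0xd83d', '0xde00']

# Seen as 8-bit data, UTF-16 ASCII text has a NUL as its second byte, so a
# reader that stops at the first NUL byte would see only "S".
raw = "SELECT 1".encode("utf-16-le")
print(raw.index(b"\x00"))                    # 1
```

A consumer that only copies the 16-bit units along never notices the surrogate pair, which is the "just data" case. The last print is the reason UTF-16 behind a `SQLCHAR *` looks so odd: it can only work if the driver does not treat the buffer as a NUL-terminated string of 8-bit units.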
