On Mon, 24 Jun 2013 16:47:32 +0100 Mateusz Loskot <[email protected]> wrote:
ML> > hs> But there is a global issue doing so. I need to extend the core with
ML> > hs> support of
ML> > hs>
ML> > hs> #define SQL_WCHAR (-8)
ML> > hs> #define SQL_WVARCHAR (-9)
ML> > hs>
ML> > hs> with a new core data_type
ML> > hs>
ML> > hs> dt_wstring
ML> > hs>
ML> > hs> and exchanged to
ML> > hs>
ML> > hs> std::wstring.
ML> >
ML> > This would be already useful and AFAICS shouldn't be a big problem to
ML> > implement but I think it would be really useful to also allow exchange
ML> > between Unicode database fields and UTF-8-encoded std::string (no "w").
ML> > What do you think?
ML>
ML> UTF-8 is Unicode, but I guess it's a shortcut.

 Yes, sorry, I used "Unicode" in the Windows sense of the word, where it
typically means "wchar_t" (i.e. UTF-16).

ML> Do you mean a field of UTF-8 and select into(std::wstring)
ML> or field of UTF-16 and insert with use(std::string) ?

 Both. I.e. IMO ideally any text database fields, be they UTF-8-encoded,
UTF-16-encoded or whatever else[*], should be exchangeable with both
std::string and std::wstring.

ML> In fact, currently, we somewhat implicitly assume all std::string are UTF-8.

 I think it's a perfectly reasonable assumption to make nowadays. It just
needs to be explicit IMO.

ML> In limited scenario, we can assume only MSSQL/ODBC on Windows
ML> exchange UTF-16 and all the rest exchange UTF-8, then we know
ML> statically what conversion to apply.
ML>
ML> The problem is if we want to support conversions like:
ML> anything <->UTF-8
ML> anything <->UTF-16
ML>
ML> Then we have to take care of checking what lingo database/client/backend
ML> speak per session. Don't we?

 If we want to support arbitrary encodings, then we definitely need to do
this, but I don't think we need to do this right now. I'd like to determine
the kind of text based solely on the column type. E.g. for MS SQL we have
NCHAR/NVARCHAR/NTEXT, which are always UTF-16 (AFAIK), and CHAR/VARCHAR/TEXT,
which are always in some multibyte encoding.
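
 To make "based solely on the column type" concrete, here is a minimal
sketch of the idea, not actual SOCI code. The text_kind enum and the
classify_column() function are hypothetical names invented for this
illustration. The two wide type codes are the ones quoted above; the other
values are the usual ODBC header constants, spelled out so that the snippet
compiles without the ODBC headers.

```cpp
// ODBC SQL type codes. SQL_WCHAR and SQL_WVARCHAR are quoted earlier in this
// thread; the rest are the standard values from the ODBC headers.
const int kSqlChar         = 1;    // SQL_CHAR
const int kSqlVarchar      = 12;   // SQL_VARCHAR
const int kSqlLongVarchar  = -1;   // SQL_LONGVARCHAR
const int kSqlWChar        = -8;   // SQL_WCHAR
const int kSqlWVarchar     = -9;   // SQL_WVARCHAR
const int kSqlWLongVarchar = -10;  // SQL_WLONGVARCHAR

// Hypothetical classification, not part of SOCI.
enum text_kind
{
    text_none,       // not a text column
    text_multibyte,  // CHAR/VARCHAR/TEXT: assumed to be UTF-8 for now
    text_utf16       // NCHAR/NVARCHAR/NTEXT: UTF-16
};

// Decide how to treat a column by looking only at its SQL type code,
// without querying the session or database encoding.
text_kind classify_column(int sql_type)
{
    switch (sql_type)
    {
        case kSqlWChar:
        case kSqlWVarchar:
        case kSqlWLongVarchar:
            return text_utf16;

        case kSqlChar:
        case kSqlVarchar:
        case kSqlLongVarchar:
            return text_multibyte;

        default:
            return text_none;
    }
}
```

 A backend could call something like this once per column when describing
the result set and pick the conversion statically from the answer.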
 Again, I think this is actually the only one requiring special treatment,
the others all use UTF-8 or another multibyte encoding -- which we don't
support right now.

 Of course, it would be ideal to support all the different encodings too.
But, again, I don't think anybody needs this right now and doing this would
be much more complicated as it would require querying the database
charset/encoding in each session and doing the conversions (which would in
turn probably require linking with ICU).

 So my suggestion would be:

1. Handle only UTF-8 for multibyte encodings right now and, perhaps, throw
   an error if we can detect that the database uses anything else.
   Formalize this by documenting that std::string used by SOCI is supposed
   to always be in UTF-8.

2. Add UTF-16 support for MS SQL and ODBC backends by converting data
   to/from UTF-8.

3. Add support for exchanging data with std::wstring. Do it directly for
   MS SQL/ODBC or via UTF-8 for all the others.

 And the point I was trying to make in my original reply was that IMHO it's
the step (2) that is the most interesting, not so much the step (3) (even
if it would be useful to have it too, but I think any C++ programmer
already uses some library allowing him to easily convert between
UTF-8-encoded std::string and std::wstring anyhow).

 Regards,
VZ
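
 As an illustration of the conversion that step (2) relies on, here is a
rough sketch of turning UTF-8 into UTF-16. It is an assumption-laden
example, not SOCI code: utf8_to_utf16() is a hypothetical helper, and it
uses the C++11 std::u16string rather than std::wstring so that the result
is UTF-16 on every platform (wchar_t is 16 bits wide only on Windows).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of SOCI: decode UTF-8 and re-encode it as
// UTF-16, throwing std::runtime_error on malformed input.
std::u16string utf8_to_utf16(const std::string& in)
{
    // Smallest code point allowed for each sequence length, used to reject
    // overlong encodings.
    static const char32_t min_cp[] = { 0, 0x80, 0x800, 0x10000 };

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i - 1 < extra)
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point in UTF-8 input");

        if (cp < 0x10000)
        {
            // Basic Multilingual Plane: a single UTF-16 code unit.
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Supplementary planes: encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

 The reverse direction is symmetric. On Windows a backend could presumably
call MultiByteToWideChar() with CP_UTF8 instead of hand-rolling the loop.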
_______________________________________________ soci-users mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/soci-users
