On Mon, 24 Jun 2013 16:47:32 +0100 Mateusz Loskot <[email protected]> wrote:

ML> > hs> But there is a global issue with doing so. I need to extend the core
ML> > hs> with support for
ML> > hs>
ML> > hs> #define SQL_WCHAR             (-8)
ML> > hs> #define SQL_WVARCHAR         (-9)
ML> > hs>
ML> > hs> with a new core data_type
ML> > hs>
ML> > hs>       dt_wstring
ML> > hs>
ML> > hs> exchanged with
ML> > hs>
ML> > hs>      std::wstring.
ML> >
ML> >  This would already be useful and AFAICS shouldn't be a big problem to
ML> > implement, but I think it would be really useful to also allow exchange
ML> > between Unicode database fields and UTF-8-encoded std::string (no "w").
ML> > What do you think?
ML> 
ML> UTF-8 is Unicode, but I guess it's a shortcut.

 Yes, sorry, I used "Unicode" in the Windows sense of the word, where it
typically means "wchar_t" (i.e. UTF-16).

ML> Do you mean a field of UTF-8 and select into(std::wstring)
ML> or field of UTF-16 and insert with use(std::string) ?

 Both. I.e. IMO ideally any text database fields, be they UTF-8-encoded,
UTF-16-encoded or whatever else[*], should be exchangeable with both
std::string and std::wstring.

ML> In fact, currently, we somewhat implicitly assume all std::string values are UTF-8.

 I think it's a perfectly reasonable assumption to make nowadays. It just
needs to be explicit IMO.
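To make the UTF-8 assumption explicit in code as well as in the documentation, a check could be applied wherever a std::string is bound. This is only a sketch: is_valid_utf8() and ensure_utf8() are hypothetical helpers, not existing SOCI API, and std::runtime_error stands in for whatever error type SOCI would really throw (e.g. soci::soci_error).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: true if s is well-formed UTF-8, i.e. no stray or
// truncated sequences, no overlong forms, no UTF-16 surrogates and nothing
// above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
    // Smallest code point that legitimately needs 2, 3 or 4 bytes.
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (c < 0x80) { ++i; continue; }                  // plain ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;  // continuation byte or invalid lead byte

        if (i + len > s.size())
            return false;   // sequence cut off at end of string
        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;  // expected a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_for_len[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;   // overlong, out of range or surrogate
        i += len;
    }
    return true;
}

// Hypothetical check making "std::string is always UTF-8" explicit.
void ensure_utf8(const std::string& s)
{
    if (!is_valid_utf8(s))
        throw std::runtime_error("string is not valid UTF-8");
}
```

Whether to validate every bound string or only in debug builds is a separate trade-off; the point is that the contract becomes checkable rather than implicit.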

ML> In a limited scenario, we can assume only MSSQL/ODBC on Windows
ML> exchange UTF-16 and all the rest exchange UTF-8, then we know
ML> statically what conversion to apply.
ML> 
ML> The problem is if we want to support conversions like:
ML> anything <->UTF-8
ML> anything <->UTF-16
ML> 
ML> Then we have to take care of checking what lingo the database/client/backend
ML> speaks in each session. Don't we?

 If we want to support arbitrary encodings, then we definitely need to do
this, but I don't think we need to do this right now. I'd like to determine
the kind of text based solely on the column type. E.g. for MS SQL we have
NCHAR/NVARCHAR/NTEXT which are always UTF-16 (AFAIK) and CHAR/VARCHAR/TEXT
which always use some multibyte encoding. Again, I think MS SQL is actually
the only backend requiring special treatment; the others all use either UTF-8
or some other multibyte encoding (and the latter we don't support right now).
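Deciding by column type alone could look roughly like the sketch below. It assumes the ODBC type codes a backend gets from SQLDescribeCol(): SQL_WCHAR (-8) and SQL_WVARCHAR (-9) are the constants quoted earlier in the thread, SQL_WLONGVARCHAR (-10) is the ODBC code for NTEXT-like columns, and SQL_CHAR/SQL_VARCHAR/SQL_LONGVARCHAR are the narrow counterparts. The text_kind enum and classify_column() are hypothetical names, not SOCI API.

```cpp
// ODBC SQL type codes, repeated here so the sketch compiles without the
// ODBC headers (real code would take them from sql.h / sqlucode.h).
constexpr short odbc_char         = 1;    // SQL_CHAR
constexpr short odbc_varchar      = 12;   // SQL_VARCHAR
constexpr short odbc_longvarchar  = -1;   // SQL_LONGVARCHAR  (TEXT)
constexpr short odbc_wchar        = -8;   // SQL_WCHAR        (NCHAR)
constexpr short odbc_wvarchar     = -9;   // SQL_WVARCHAR     (NVARCHAR)
constexpr short odbc_wlongvarchar = -10;  // SQL_WLONGVARCHAR (NTEXT)

// Hypothetical classification of a column, decided only by its type.
enum class text_kind
{
    not_text,
    multibyte,  // CHAR/VARCHAR/TEXT: assumed to hold UTF-8
    utf16       // NCHAR/NVARCHAR/NTEXT: UTF-16 on MS SQL
};

text_kind classify_column(short sql_type)
{
    switch (sql_type)
    {
        case odbc_char:
        case odbc_varchar:
        case odbc_longvarchar:
            return text_kind::multibyte;
        case odbc_wchar:
        case odbc_wvarchar:
        case odbc_wlongvarchar:
            return text_kind::utf16;
        default:
            return text_kind::not_text;
    }
}
```

Because the decision depends only on the column type, no per-session charset query is needed.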

 Of course, it would be ideal to support all the different encodings too.
But, again, I don't think anybody needs this right now and doing this would
be much more complicated as it would require querying the database
charset/encoding in each session and doing the conversions (which would in
turn probably require linking with ICU).

 So my suggestion would be:

1. Handle only UTF-8 for multibyte encodings right now and, perhaps, throw
   an error if we can detect that the database uses anything else.
   Formalize this by documenting that std::string used by SOCI is supposed
   to always be in UTF-8.

2. Add UTF-16 support for MS SQL and ODBC backends by converting data
   to/from UTF-8.

3. Add support for exchanging data with std::wstring. Do it directly for
   MS SQL/ODBC or via UTF-8 for all the others.
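Step (2) boils down to a UTF-16 <-> UTF-8 conversion at the backend boundary. A minimal sketch, assuming the UTF-16 data is available as std::u16string (the ODBC backend would really deal with SQLWCHAR buffers) and that input to utf8_to_utf16() is already valid UTF-8; all function names are hypothetical:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Append code point cp (assumed <= U+10FFFF) to out as UTF-8.
void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80)
        out += static_cast<char>(cp);
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// UTF-16 (e.g. an NVARCHAR value just fetched) -> UTF-8 std::string.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)  // high surrogate: needs a pair
        {
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
            throw std::runtime_error("unpaired low surrogate");
        append_utf8(out, cp);
    }
    return out;
}

// UTF-8 std::string -> UTF-16 (e.g. before binding an NVARCHAR parameter).
// Assumes well-formed UTF-8; only a truncated final sequence is detected.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        const std::size_t len = c < 0x80 ? 1 : c < 0xE0 ? 2 : c < 0xF0 ? 3 : 4;
        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        // Keep the payload bits of the lead byte, then add 6 bits per
        // continuation byte.
        char32_t cp = len == 1 ? c : c & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000)
            out += static_cast<char16_t>(cp);
        else  // outside the BMP: encode as a surrogate pair
        {
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

Keeping UTF-8 std::string as the single representation inside SOCI means only the MS SQL/ODBC backend has to convert, and only for the N-type columns.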

 And the point I was trying to make in my original reply was that IMHO it's
step (2) that is the most interesting, not so much step (3). It would be
useful to have too, but I think any C++ programmer already uses some library
that makes it easy to convert between UTF-8-encoded std::string and
std::wstring anyhow.
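For the "via UTF-8" route of step (3), the conversion from a UTF-8 std::string to std::wstring might look like the sketch below. The subtlety is that wchar_t is 16 bits on Windows (so std::wstring holds UTF-16, with surrogate pairs) but typically 32 bits elsewhere (one wchar_t per code point). utf8_to_wstring() is a hypothetical name, it assumes well-formed UTF-8 input, and in practice ICU or whatever library the application already uses may provide the same thing.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// UTF-8 std::string -> std::wstring, for either width of wchar_t.
// Assumes well-formed UTF-8; only a truncated final sequence is detected.
std::wstring utf8_to_wstring(const std::string& in)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        const std::size_t len = c < 0x80 ? 1 : c < 0xE0 ? 2 : c < 0xF0 ? 3 : 4;
        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        // Payload bits of the lead byte, then 6 bits per continuation byte.
        char32_t cp = len == 1 ? c : c & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;

        if (sizeof(wchar_t) >= 4 || cp < 0x10000)
            out += static_cast<wchar_t>(cp);  // fits in one wchar_t
        else  // 16-bit wchar_t: encode as a UTF-16 surrogate pair
        {
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

The reverse direction is symmetric: decode surrogate pairs when wchar_t is 16 bits, then encode each code point as UTF-8.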

 Regards,
VZ

_______________________________________________
soci-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/soci-users
