On 24 June 2013 17:14, Vadim Zeitlin <[email protected]> wrote:
> On Mon, 24 Jun 2013 16:47:32 +0100 Mateusz Loskot <[email protected]> wrote:
>
> ML> Do you mean a field of UTF-8 and select into(std::wstring)
> ML> or field of UTF-16 and insert with use(std::string) ?
>
> Both. I.e. IMO ideally any text database fields, be they UTF-8-encoded,
> UTF-16-encoded or whatever else[*], should be exchangeable with both
> std::string and std::wstring.
Yes, that's the ideal.

> ML> In fact, currently, we somewhat implicitly assume all std::string are
> ML> UTF-8.
>
> I think it's a perfectly reasonable assumption to make nowadays. It just
> needs to be explicit IMO.

Right. Do you mean the documentation, or is the implementation lacking too?

> ML> In limited scenario, we can assume only MSSQL/ODBC on Windows
> ML> exchange UTF-16 and all the rest exchange UTF-8, then we know
> ML> statically what conversion to apply.
> ML>
> ML> The problem is if we want to support conversions like:
> ML> anything <-> UTF-8
> ML> anything <-> UTF-16
> ML>
> ML> Then we have to take care of checking what lingo database/client/backend
> ML> speak per session. Don't we?
>
> If we want to support arbitrary encodings, then we definitely need to do
> this, but I don't think we need to do this right now.

I agree straight away.

> I'd like to determine the kind of text based solely on the column type. E.g.
> for MS SQL we have
> NCHAR/NVARCHAR/NTEXT which are always UTF-16 (AFAIK) and CHAR/VARCHAR/TEXT
> which always use some multibyte encoding. Again, I think this is actually the
> only one requiring special treatment, the others all use UTF-8 or another
> multibyte encoding -- which we don't support right now.

If we can rely on column types, then it seems we're halfway there, indeed.
I'm not experienced with types in SQL Server, so I missed that.

Regarding multibyte encodings, we just take what we get and hand it over to
the client or the database, using std::string as an array of bytes.
But then, of course, we cannot offer reliable conversion between narrow and
wide characters for those multibyte-encoded strings.

> Of course, it would be ideal to support all the different encodings too.
> But, again, I don't think anybody needs this right now and doing this would
> be much more complicated as it would require querying the database
> charset/encoding in each session and doing the conversions (which would in
> turn probably require linking with ICU).
(Boost.Locale may be interesting too.)

> So my suggestion would be:
>
> 1. Handle only UTF-8 for multibyte encodings right now and, perhaps, throw
>    an error if we can detect that the database uses anything else.
>    Formalize this by documenting that std::string used by SOCI is supposed
>    to always be in UTF-8.

Yes, and this answers my question from the beginning.

> 2. Add UTF-16 support for MS SQL and ODBC backends by converting data
>    to/from UTF-8.

Yes.

> 3. Add support for exchanging data with std::wstring. Do it directly for
>    MS SQL/ODBC or via UTF-8 for all the others.

Yes.

> And the point I was trying to make in my original reply was that IMHO it's
> the step (2) that is the most interesting, not so much the step (3) (even
> if it would be useful to have it too, but I think any C++ programmer
> already uses some library allowing him to easily convert between
> UTF-8-encoded std::string and std::wstring anyhow).

You are right. Also, since we cannot throw a solid amount of manpower at
this, let's take the smallest steps possible.

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net

_______________________________________________
soci-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/soci-users
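To make step (2) a bit more concrete, here is a rough sketch of the kind of UTF-8 <-> UTF-16 conversion a backend would need. The function names utf8_to_utf16 and utf16_to_utf8 are hypothetical and not part of SOCI. I also assume std::u16string as the UTF-16 container, so that the sketch does not depend on the platform-specific width of wchar_t. It rejects malformed input by throwing, which is only one possible policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a SOCI API): decode UTF-8 and re-encode as UTF-16.
std::u16string utf8_to_utf16(const std::string& in)
{
    // Smallest code point that legitimately needs a sequence of the given length.
    static const std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        // Fold in the continuation bytes, 6 payload bits each.
        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong forms, surrogate code points and out-of-range values.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point in UTF-8 input");

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}

// Hypothetical helper (not a SOCI API): decode UTF-16 and re-encode as UTF-8.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
        {
            throw std::runtime_error("unpaired low surrogate");
        }

        if (cp < 0x80)
        {
            out.push_back(static_cast<char>(cp));
        }
        else if (cp < 0x800)
        {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else if (cp < 0x10000)
        {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else
        {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

With something along these lines, a backend could call utf8_to_utf16() on a std::string before binding it to an NVARCHAR parameter and utf16_to_utf8() on fetched NCHAR/NVARCHAR data, while std::string stays UTF-8 everywhere else. On Windows, where wchar_t is 16 bits wide, the same code units could be copied into a std::wstring for step (3). On platforms with a 32-bit wchar_t a separate UTF-32 path would be needed.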
