On 24 June 2013 17:14, Vadim Zeitlin <[email protected]> wrote:
> On Mon, 24 Jun 2013 16:47:32 +0100 Mateusz Loskot <[email protected]> wrote:
>
> ML> Do you mean a field of UTF-8 and select into(std::wstring)
> ML> or field of UTF-16 and insert with use(std::string) ?
>
>  Both. I.e. IMO ideally any text database fields, be they UTF-8-encoded,
> UTF-16-encoded or whatever else[*], should be exchangeable with both
> std::string and std::wstring.

Yes, that's the ideal.

> ML> In fact, currently, we somewhat implicitly assume all std::string are
> ML> UTF-8.
>
>  I think it's a perfectly reasonable assumption to make nowadays. It just
> needs to be explicit IMO.

Right. Do you mean only the documentation is lacking, or the implementation too?

> ML> In limited scenario, we can assume only MSSQL/ODBC on Windows
> ML> exchange UTF-16 and all the rest exchange UTF-8, then we know
> ML> statically what conversion to apply.
> ML>
> ML> The problem is if we want to support conversions like:
> ML> anything <->UTF-8
> ML> anything <->UTF-16
> ML>
> ML> Then we have to take care of checking what lingo database/client/backend
> ML> speak per session.  Don't we?
>
>  If we want to support arbitrary encodings, then we definitely need to do
> this, but I don't think we need to do this right now.

I agree straight away.

> I'd like to determine the kind of text based solely on the column type.
> E.g. for MS SQL we have NCHAR/NVARCHAR/NTEXT which are always UTF-16
> (AFAIK) and CHAR/VARCHAR/TEXT which are always in some multibyte encoding.
> Again, I think this is actually the only one requiring special treatment,
> the others all use UTF-8 or another multibyte encoding -- which we don't
> support right now.

If we can rely on column types, then it seems we're halfway there, indeed.
I'm not experienced with types in SQL Server, so I missed that.
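
A rough sketch of what dispatching on the column type could look like. The
names text_encoding and encoding_for_column_type are made up for illustration
and are not SOCI API; the mapping assumes the MS SQL convention described
above (N-prefixed types carry UTF-16, everything else is treated as UTF-8):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical names for illustration only; not part of SOCI.
enum class text_encoding { utf8, utf16 };

// Guess the encoding of a text column from its SQL type name.
// Assumption: NCHAR/NVARCHAR/NTEXT are UTF-16, all other text types UTF-8.
inline text_encoding encoding_for_column_type(std::string type_name)
{
    // Normalize to upper case so "nvarchar" and "NVARCHAR" compare equal.
    std::transform(type_name.begin(), type_name.end(), type_name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });

    if (type_name == "NCHAR" || type_name == "NVARCHAR" || type_name == "NTEXT")
        return text_encoding::utf16;

    return text_encoding::utf8;
}
```

A backend could call something like this once per column when describing the
result set and pick the conversion accordingly.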

Regarding multibyte encodings, we just take what we get and hand it over to
the client or database, using std::string as an array of bytes.
But then, of course, we cannot offer reliable conversion between narrow and
wide characters for those multibyte-encoded strings.

> Of course, it would be ideal to support all the different encodings too.
> But, again, I don't think anybody needs this right now and doing this would
> be much more complicated as it would require querying the database
> charset/encoding in each session and doing the conversions (which would in
> turn probably require linking with ICU).

(Boost.Locale may be interesting too.)


>  So my suggestion would be:
>
> 1. Handle only UTF-8 for multibyte encodings right now and, perhaps, throw
>    an error if we can detect that the database uses anything else.
>    Formalize this by documenting that std::string used by SOCI is supposed
>    to always be in UTF-8.

Yes and this answers my question from the beginning.

> 2. Add UTF-16 support for MS SQL and ODBC backends by converting data
>    to/from UTF-8.

Yes.
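
For what it's worth, the UTF-16 to UTF-8 direction of step (2) is small enough
to do by hand. A minimal sketch, assuming C++11 and std::u16string as the
carrier for UTF-16 data; utf16_to_utf8 is a hypothetical helper name, not
something that exists in SOCI:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not SOCI API.
// Converts UTF-16 to UTF-8 and throws on unpaired surrogates.
inline std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i; // consume the low surrogate as well
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
        {
            throw std::runtime_error("unpaired low surrogate");
        }

        // Emit the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Whether we hand-roll this or delegate to a library is a separate question.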

> 3. Add support for exchanging data with std::wstring. Do it directly for
>    MS SQL/ODBC or via UTF-8 for all the others.

Yes.
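
The "via UTF-8" path of step (3) could look roughly like the sketch below.
utf8_to_wstring is a hypothetical name, not SOCI API. It assumes wchar_t holds
UTF-16 code units when it is 2 bytes wide (as on Windows) and UTF-32 code
points otherwise, and it only checks structure: overlong forms and encoded
surrogates are not rejected.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not SOCI API.
// Decodes UTF-8 into std::wstring (UTF-16 or UTF-32 depending on wchar_t size).
inline std::wstring utf8_to_wstring(const std::string& in)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp >= 0x10000)
        {
            // 16-bit wchar_t: split into a surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        }
        else
        {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```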

>  And the point I was trying to make in my original reply was that IMHO it's
> the step (2) that is the most interesting, not so much the step (3) (even
> if it would be useful to have it too, but I think any C++ programmer
> already uses some library allowing him to easily convert between
> UTF-8-encoded std::string and std::wstring anyhow).

You are right. Also, since we can't throw a solid amount of manpower at this,
let's take the smallest steps possible.

Best regards,
--
Mateusz  Loskot, http://mateusz.loskot.net
