So this is the answer I was looking for:
"In Lazarus TStringField MUST hold UTF-8 encoded strings."

Not entirely true. You could also choose to bind the fields to some
Lazarus-components manually, not using the db-components.
IMHO most GUI database applications use controls like TDBGrid or TDBEdit,
so they should display correct values by default without extra coding (or at least provide "some standardized support" ...).


 (TEdit.Text :=
convertFunc(StringField.Text)) Or you can add a hook so that the .Text
property always does a conversion to UTF-8. The first option can be used if
you use a mediator or view. The second option I wouldn't use.
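
A minimal sketch of the first option (converting by hand when binding a control). AnsiFieldTextToUtf8 is a hypothetical stand-in for convertFunc; it assumes the field text arrives in the system ANSI code page and relies on AnsiToUtf8 from the System unit. The result for non-ASCII text depends on the FPC version and the active code page, so this is an illustration only:

```pascal
program ConvertOnBind;
{$mode objfpc}{$H+}

// Hypothetical helper standing in for "convertFunc" above.
// Assumption: S holds text in the system ANSI code page,
// as handed over by the database driver.
function AnsiFieldTextToUtf8(const S: AnsiString): AnsiString;
begin
  Result := AnsiToUtf8(S);
end;

var
  FieldText, ControlText: AnsiString;
begin
  // Stand-in for StringField.Text. In a form this would read:
  //   Edit1.Text := AnsiFieldTextToUtf8(StringField.Text);
  FieldText := 'text as returned by the driver';
  ControlText := AnsiFieldTextToUtf8(FieldText);
  WriteLn(ControlText);
end.
```

The conversion lives in one place, so a mediator or view can call it for every field it binds.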

Rofl. You mean that Microsoft SQL Server can't handle unicode completely?
Not completely: only UCS-2 (no UTF-8).

SQL Server provides non-UNICODE datatypes - char, varchar, text

ie: TStringField
Yes, but the ODBC driver returns data in the ANSI code page (there is no way to force it to return UTF-8).
This I can fix with a patch in TODBCConnection.LoadField like this
(so I convert to UTF-8 in the connector method when the driver is unable to return UTF-8):
   begin
Res:=SQLGetData(ODBCCursor.FSTMTHandle, FieldDef.Index+1, SQL_C_CHAR, buffer, FieldDef.Size, @StrLenOrInd);
+      if CharSet='ANSI' then //hack for Microsoft SQL Server
+        StrPLCopy(buffer, UTF8Encode(PChar(buffer)), FieldDef.Size);
end;
 and UNICODE (UCS-2) datatypes - nchar, nvarchar, ntext

ie: TWideStringField.
Yes, in this case the ODBC driver returns data in UCS-2, and this data is written into a "WideString buffer", which seems correct. But DBGrid displays "?" instead of characters with diacritical marks (IMHO because the widestringmanager on Windows converts WideString to an ANSI string, not a UTF-8 string).
This can be fixed by using the OnGetText event of the field: aText:=UTF8Encode(Sender.AsString);
That is not user friendly, because it requires "hacking in user code" for every TWideStringField in every TSQLQuery.
It can also be fixed in fields.inc:
function TWideStringField.GetAsString: string;
begin
+{$IFDEF WINDOWS}
+  Result := UTF8Encode(GetAsWideString);
+{$ELSE}
 Result := GetAsWideString;
+{$ENDIF}
end;
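
For comparison, the OnGetText workaround could be wrapped so that it is attached in one call per dataset instead of per field. This is only a sketch: TUtf8Display and its Attach method are hypothetical names, and it reads the value via AsWideString (rather than AsString as in my one-liner above) on the assumption that the field buffer really holds UCS-2 data:

```pascal
program WideFieldUtf8Display;
{$mode objfpc}{$H+}

uses
  db;

type
  // Hypothetical holder class; OnGetText needs a method, not a plain procedure.
  TUtf8Display = class
    procedure GetText(Sender: TField; var aText: string; DisplayText: Boolean);
    procedure Attach(DataSet: TDataSet);
  end;

// Return the field value as UTF-8 for display in LCL controls.
procedure TUtf8Display.GetText(Sender: TField; var aText: string;
  DisplayText: Boolean);
begin
  aText := UTF8Encode(Sender.AsWideString);
end;

// Hook every TWideStringField of the dataset to the handler above.
procedure TUtf8Display.Attach(DataSet: TDataSet);
var
  i: Integer;
begin
  for i := 0 to DataSet.Fields.Count - 1 do
    if DataSet.Fields[i] is TWideStringField then
      DataSet.Fields[i].OnGetText := @GetText;
end;

var
  Display: TUtf8Display;
begin
  Display := TUtf8Display.Create;
  try
    // With a real query, call Display.Attach(SQLQuery1) once its fields exist.
    WriteLn('handler ready');
  finally
    Display.Free;
  end;
end.
```

It still has to be called for every query, so it reduces the amount of user code but does not replace a fix in the library.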

So what is the expected encoding of data written into TWideStringField ... or is there a way to get correct results in DBGrid without the above-mentioned workarounds?

 SQL Server ODBC driver supports "AutoTranslate", see:
http://msdn.microsoft.com/en-us/library/ms130822.aspx
 "SQL Server char, varchar, or text data sent to a client SQL_C_CHAR
variable is converted from character to Unicode using the server ACP,
then converted from Unicode to character using the client ACP."

This is what you use when you set the encoding when you connect to the
client. The solution to all your problems. As explained three times, in
this message alone.

In fact it's simple: incoming data=outgoing data.

If you need UTF-8 encoding for the outgoing data (direct access to
Lazarus controls) you have to select UTF-8 at the input.
Yes, but as I wrote, no such possibility exists with Microsoft SQL Server (and, I think, also Access)
(it seems that Microsoft does not like UTF-8 and prefers UTF-16 (UCS-2))

And, luckily, you can instruct the Database-server which encoding to use
when it's communicating with the outer world. So your problem is solved.
When it is possible, then yes.

Now, if you also choose UTF-8 as the Database-server field encoding (the
encoding the data is stored in) there's no conversion necessary at all.
Yes, if the DB supports UTF-8.

-Laco.

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel