Re: [fpc-devel] TStringField, String and UnicodeString and UTF8String

LacaK Fri, 14 Jan 2011 01:55:31 -0800

So this is the answer I was looking for:
"In Lazarus TStringField MUST hold UTF-8 encoded strings."


Not entirely true. You could also choose to bind the fields to some
Lazarus-components manually, not using the db-components.

IMHO most GUI database applications use controls like TDBGrid or TDBEdit,
so they should display correct values by default without extra coding (or at least provide "some standardized support" ...)

 (Tedit.Text :=
convertFunc(StringField.Text)) Or you can add a hook so that the .text
property always does a conversion to UTF-8. The first option can be used if
you use a mediator or view. The second option I wouldn't use.

Rofl. You mean that Microsoft SQL Server can't handle Unicode completely?

Not completely; it handles only UCS-2 (no UTF-8).

SQL Server provides non-UNICODE datatypes - char, varchar, text
ie: TStringField

Yes, but the ODBC driver returns the data in the ANSI code page (there is no way to force it to return UTF-8).

I can fix this with a patch in TODBCConnection.LoadField like this
(so I convert to UTF-8 in the connector method when the driver is unable to return UTF-8):

   begin
     Res:=SQLGetData(ODBCCursor.FSTMTHandle, FieldDef.Index+1, SQL_C_CHAR, buffer, FieldDef.Size, @StrLenOrInd);
+      if CharSet='ANSI' then //hack for Microsoft SQL Server
+        StrPLCopy(buffer, UTF8Encode(PChar(buffer)), FieldDef.Size);
   end;

 and UNICODE (UCS-2) datatypes - nchar, nvarchar, ntext


ie: TWideStringField.

Yes, in this case the ODBC driver returns data in UCS-2. This data is written into a "WideString buffer", which seems correct, but in DBGrid "?" is displayed instead of characters with diacritical marks (IMHO because the widestringmanager on Windows converts WideString to an ANSI string, not a UTF-8 string).

This can be fixed by using the OnGetText event of the field:

  aText:=UTF8Encode(Sender.AsString);

which is not user friendly, because it requires "hacking in user code" for every TWideStringField in every TSQLQuery.
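A minimal sketch of what such an OnGetText handler might look like as a complete unit. The unit name WideFieldUtf8 and the class TWideFieldFix are hypothetical, introduced only for this illustration. The sketch also reads Sender.AsWideString instead of Sender.AsString, on the assumption that this avoids an extra round trip through the ANSI code page; that is a variation on the one-line fix, not something confirmed in this thread.

```pascal
unit WideFieldUtf8;  // hypothetical unit name, for illustration only

{$mode objfpc}{$H+}

interface

uses
  db;

type
  // Hypothetical helper class; in a real application the handler would
  // more likely be a method of a form or data module.
  TWideFieldFix = class
  public
    // Signature matches TFieldGetTextEvent from the db unit.
    procedure GetTextAsUtf8(Sender: TField; var aText: string;
      DisplayText: Boolean);
  end;

implementation

procedure TWideFieldFix.GetTextAsUtf8(Sender: TField; var aText: string;
  DisplayText: Boolean);
begin
  // Take the UTF-16 value of the field and hand UTF-8 bytes to the
  // data-aware control (assumption: the control expects UTF-8).
  aText := UTF8Encode(Sender.AsWideString);
end;

end.
```

The handler would then have to be assigned per field, for example SQLQuery1.FieldByName('name').OnGetText := @Fix.GetTextAsUtf8 (the query, field and variable names are made up here), and that per-field repetition is the inconvenience described above.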

It can also be fixed in fields.inc:
function TWideStringField.GetAsString: string;
begin
+{$IFDEF WINDOWS}
+  Result := UTF8Encode(GetAsWideString);
+{$ELSE}
 Result := GetAsWideString;
+{$ENDIF}
end;
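To make concrete what UTF8Encode does to a WideString in both workarounds, here is a small stand-alone console sketch. It is not part of the patch; the program name is made up, and the sample character U+010D is just an arbitrary letter with a diacritical mark.

```pascal
program Utf8EncodeDemo;  // hypothetical name, illustration only

{$mode objfpc}{$H+}

var
  W: WideString;
  U: UTF8String;
  I: Integer;
begin
  // U+010D (LATIN SMALL LETTER C WITH CARON), built from its code point
  // so the result does not depend on the source file encoding.
  W := WideChar($010D);

  // Convert the UTF-16 value to UTF-8 bytes.
  U := UTF8Encode(W);

  // Dump the encoded bytes in hex; U+010D takes two bytes in UTF-8,
  // so this prints: C4 8D
  for I := 1 to Length(U) do
    Write(HexStr(Ord(U[I]), 2), ' ');
  WriteLn;
end.
```

A control that expects UTF-8 can display these two bytes as one character, whereas a conversion to a single-byte ANSI code page would yield one byte (or "?" if the character is not in that code page).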

So what is the expected encoding of data written into a TWideStringField? Or is there a way to get correct results in DBGrid without the above-mentioned workarounds?

 SQL Server ODBC driver supports "AutoTranslate", see:
http://msdn.microsoft.com/en-us/library/ms130822.aspx
 "SQL Server char, varchar, or text data sent to a client SQL_C_CHAR
variable is converted from character to Unicode using the server ACP,
then converted from Unicode to character using the client ACP."


This is what you use when you set the encoding when you connect to the
client. The solution to all your problems. As explained three times, in
this message alone.

In fact it's simple: incoming data=outgoing data.

If you need UTF-8 encoding for the outgoing data (direct access to
Lazarus controls) you have to select UTF-8 at the input.

Yes, but as I wrote, no such possibility exists with Microsoft SQL Server (nor, I think, with Access).

(It seems that Microsoft does not like UTF-8 and prefers UTF-16 (UCS-2).)

And, luckily, you can instruct the Database-server which encoding to use
when it's communicating with the outer world. So your problem is solved.

When that is possible, then yes.

Now, if you also choose UTF-8 as the Database-server field encoding (the
encoding the data is stored in) there's no conversion necessary at all.

Yes, if the DB supports UTF-8.

-Laco.

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
