On Sun, 25 Nov 2007 15:17:54 +0100
"Felipe Monteiro de Carvalho" <[EMAIL PROTECTED]> wrote:
> On Nov 25, 2007 1:42 PM, Mattias Gaertner <[EMAIL PROTECTED]>
> wrote:
> > Thanks.
> > TStringlist is too slow and the conversion should not be part of
> > codetools. The widgetsets have the best access to the system libs,
> > so they have the best encoding converters. I can do that part.
>
> Isn't AnsiToUtf8 enough? This way we don't depend on any external
> libs.
AnsiToUtf8 can be used to convert from the system encoding to UTF-8.
And depending on external libs is no problem if the widgetset already
depends on them.
> A simple loop using ReadLn and WriteLn should be enough.
I guess not, but that's an implementation detail.
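For the simple case (system ANSI encoding in, UTF-8 out) such a loop
would look roughly like the sketch below (file names are placeholders;
it ignores BOMs and any source encoding other than the system one):

program AnsiFileToUtf8;
{$mode objfpc}{$H+}
uses
  SysUtils;
var
  SrcFile, DstFile: TextFile;
  Line: string;
begin
  // Read each line in the system ANSI encoding, write it back as UTF-8.
  AssignFile(SrcFile, 'input.txt');
  AssignFile(DstFile, 'output.txt');
  Reset(SrcFile);
  Rewrite(DstFile);
  try
    while not Eof(SrcFile) do
    begin
      ReadLn(SrcFile, Line);
      WriteLn(DstFile, AnsiToUtf8(Line)); // system encoding -> UTF-8
    end;
  finally
    CloseFile(SrcFile);
    CloseFile(DstFile);
  end;
end.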
> > The difficult part is to find out the encoding of a text file.
> > Your proposal of the //&encoding comment has the advantage of
> > being a comment. Because this is a Lazarus feature, I suggest
> > using IDE-directive-style comments: {%encoding xxx}.
>
> The best option would be to use the standard mechanism that any
> editor should recognize: the BOM.
The BOM only exists for the UTF encodings.
And there are plenty of UTF-8 files without a BOM.
See below.
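Detecting the BOM itself would be easy enough; roughly like this
sketch (untested, checks only the UTF-8 BOM EF BB BF). The hard part
remains the files that carry no BOM at all:

// Sketch: does the file start with the UTF-8 BOM (EF BB BF)?
function HasUtf8Bom(const FileName: string): Boolean;
var
  F: file;                    // untyped file, record size 1 byte
  Buf: array[0..2] of Byte;
  Count: LongInt;
begin
  Result := False;
  AssignFile(F, FileName);
  Reset(F, 1);
  try
    BlockRead(F, Buf, SizeOf(Buf), Count);
    Result := (Count = SizeOf(Buf)) and
      (Buf[0] = $EF) and (Buf[1] = $BB) and (Buf[2] = $BF);
  finally
    CloseFile(F);
  end;
end;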
> But we can't use that until the widestring manager is improved.
>
> Using a comment will cause trouble when opening the file in other
> editors that are prepared to handle the standard.
That's a normal problem for all text editors and for all text files
without encoding info.
If Lazarus starts adding a BOM to new UTF-8 files, then FPC should
not convert the string constants.
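Writing the BOM for new files would be the easy part; a sketch (the
helper name is made up, needs the Classes unit):

// Sketch: save an already UTF-8 encoded string with a leading BOM.
procedure SaveUtf8WithBom(const FileName, Utf8Text: string);
const
  Utf8Bom: array[0..2] of Byte = ($EF, $BB, $BF);
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmCreate);
  try
    FS.WriteBuffer(Utf8Bom, SizeOf(Utf8Bom));
    if Utf8Text <> '' then
      FS.WriteBuffer(Utf8Text[1], Length(Utf8Text));
  finally
    FS.Free;
  end;
end;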
BTW, what about the file functions under Windows?
Mattias