Re: [lazarus] Non UTF-8 Encodings

2007-11-29 Thread Razvan Adrian Bogdan
On Nov 27, 2007 11:42 PM, Jesus Reyes [EMAIL PROTECTED] wrote:
 The other day I built a tool that takes the txt tables and converts
 them to Pascal constant arrays; attached is the resulting inc file, if
 needed for something. If I had known something like this ucmaps/charset
 existed I probably wouldn't have done it (it never occurred to me to look
 into fpc, TBH).


Can't we just borrow from iconv, like the Synapse synachar unit does?

Razvan

_
 To unsubscribe: mail [EMAIL PROTECTED] with
unsubscribe as the Subject
   archives at http://www.lazarus.freepascal.org/mailarchives


Re: [lazarus] Non UTF-8 Encodings

2007-11-27 Thread Marco van de Voort
On Tue, Nov 27, 2007 at 09:42:54AM +0200, Razvan Adrian Bogdan wrote:
 What about the Synapse conversion units? There is a unit, synachar if
 I remember correctly, that carries a lot of charsets and codepages
 internally. It converts without iconv support for most of them and
 uses iconv if the encoding is not supported internally. We could write
 our own such unit using the idea from Synapse; I think they borrowed
 some conversion tables from iconv.

See charset unit? Or what else uses rtl/ucmaps ?



Re: [lazarus] Non UTF-8 Encodings

2007-11-27 Thread Razvan Adrian Bogdan
On Nov 27, 2007 11:04 AM, Marco van de Voort [EMAIL PROTECTED] wrote:
 See charset unit? Or what else uses rtl/ucmaps ?

I think I saw it before, but didn't see any use for it. I can't seem
to find any conversion tables, just a strange (ugly naming, case,
types, logic) skeleton of something that isn't implemented.



Re: [lazarus] Non UTF-8 Encodings

2007-11-27 Thread Mattias Gaertner
On Tue, 27 Nov 2007 12:40:35 +0200
Razvan Adrian Bogdan [EMAIL PROTECTED] wrote:

 On Nov 27, 2007 11:04 AM, Marco van de Voort [EMAIL PROTECTED] wrote:
  See charset unit? Or what else uses rtl/ucmaps ?
 
 I think I saw it before, but didn't see any use for it. I can't seem
 to find any conversion tables, just a strange (ugly naming, case,
 types, logic) skeleton of something that isn't implemented.

I agree, the names should be made more readable.
It is the right direction, because it is in the RTL and does not
use any slow ansistrings/widestrings.
So for a platform independent, pure Pascal solution this could be used
(using the tables of synachar).
But I don't know how complete these tables are and how much work it is
to maintain them.
Nevertheless it should be given a try.

Mattias



Re: [lazarus] Non UTF-8 Encodings

2007-11-27 Thread Jesus Reyes

--- Mattias Gaertner [EMAIL PROTECTED] wrote:

 On Tue, 27 Nov 2007 12:40:35 +0200
 Razvan Adrian Bogdan [EMAIL PROTECTED] wrote:
 
  On Nov 27, 2007 11:04 AM, Marco van de Voort [EMAIL PROTECTED] wrote:
   See charset unit? Or what else uses rtl/ucmaps ?
  
  I think I saw it before, but didn't see any use for it. I can't seem
  to find any conversion tables, just a strange (ugly naming, case,
  types, logic) skeleton of something that isn't implemented.
 
 I agree, the names should be made more readable.
 It is the right direction, because it is in the RTL and does not
 use any slow ansistrings/widestrings.
 So for a platform independent, pure Pascal solution this could be used
 (using the tables of synachar).
 But I don't know how complete these tables are and how much work it is
 to maintain them.
 Nevertheless it should be given a try.
 
 Mattias

The other day I built a tool that takes the txt tables and converts
them to Pascal constant arrays; attached is the resulting inc file, if
needed for something. If I had known something like this ucmaps/charset
existed I probably wouldn't have done it (it never occurred to me to look
into fpc, TBH).
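
For readers who want to see what such a table converter might look like, here is a minimal sketch in Python. It is not the tool from this message. It assumes the txt tables use the layout of the Unicode Consortium mapping files (hex byte, hex code point, then a # comment); the array name and the use of $FFFD for unmapped bytes are illustrative choices only.

```python
def parse_mapping(text):
    """Return a 256-entry list of code points; unmapped bytes become 0xFFFD."""
    table = [0xFFFD] * 256
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop the trailing comment
        fields = line.split()
        if len(fields) < 2:                    # blank, comment-only or undefined byte
            continue
        byte, codepoint = int(fields[0], 16), int(fields[1], 16)
        if 0 <= byte < 256:
            table[byte] = codepoint
    return table


def to_pascal_const(name, table, per_line=8):
    """Render the table as the text of a Pascal typed constant array."""
    rows = []
    for i in range(0, 256, per_line):
        rows.append("    " + ", ".join("$%04X" % cp for cp in table[i:i + per_line]))
    return "const\n  %s: array[0..255] of Word = (\n%s\n  );\n" % (name, ",\n".join(rows))


# Three sample lines in the assumed input format.
SAMPLE = (
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A\n"
    "0x80\t0x20AC\t# EURO SIGN\n"
    "0x81\t\t# UNDEFINED\n"
)

if __name__ == "__main__":
    # ArrayDemoToUnicode is a hypothetical name, not one from encodings.inc.
    print(to_pascal_const("ArrayDemoToUnicode", parse_mapping(SAMPLE)))
```

The output can be saved as an inc file and included from a Pascal unit. A Word element only covers the Basic Multilingual Plane, which is enough for the single-byte code pages discussed here.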

Jesus Reyes A.




encodings.inc.gz
Description: 689895700-encodings.inc.gz


Re: [lazarus] Non UTF-8 Encodings

2007-11-26 Thread Razvan Adrian Bogdan
What about the Synapse conversion units? There is a unit, synachar if
I remember correctly, that carries a lot of charsets and codepages
internally. It converts without iconv support for most of them and
uses iconv if the encoding is not supported internally. We could write
our own such unit using the idea from Synapse; I think they borrowed
some conversion tables from iconv.
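
The "internal tables first, external converter as fallback" idea can be sketched roughly as follows. This is Python rather than Pascal, it is not synachar's code, and Python's own codec machinery stands in for iconv. The table name "cp1252-demo" and its two entries are made up for the example; a real table would cover every byte.

```python
# Hypothetical built-in tables. Only bytes that differ from Latin-1 are
# listed; every other byte maps to the code point with the same value.
BUILTIN_TABLES = {
    "cp1252-demo": {0x80: 0x20AC, 0x85: 0x2026},
}


def to_utf8(data, encoding):
    """Convert bytes in the given encoding to UTF-8 bytes."""
    table = BUILTIN_TABLES.get(encoding.lower())
    if table is not None:
        # Internal path: a pure table lookup, no external library involved.
        text = "".join(chr(table.get(b, b)) for b in data)
        return text.encode("utf-8")
    # Fallback path: stands in for the call to iconv.
    return data.decode(encoding).encode("utf-8")


if __name__ == "__main__":
    print(to_utf8(b"\x80 5", "cp1252-demo"))   # internal table
    print(to_utf8(b"\xe9", "latin-1"))         # fallback converter
```

The design question raised in the thread is then how many tables to ship internally and who maintains them.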

Razvan



Re: [lazarus] Non UTF-8 Encodings

2007-11-25 Thread Vasily I. Volchenko
 - When a TCodeBuffer loads a file it must find out the encoding,
 convert it to UTF-8 and on saving convert it back.
OK, this patch may be a start. Of course, lconv should be either extended or
changed to something more suitable.
Presently I am checking only the first line.


mycp.patch.gz
Description: application/gzip


Re: [lazarus] Non UTF-8 Encodings

2007-11-25 Thread Mattias Gaertner
On Sun, 25 Nov 2007 15:03:18 +0300
Vasily I. Volchenko [EMAIL PROTECTED] wrote:

  - When a TCodeBuffer loads a file it must find out the encoding,
  convert it to UTF-8 and on saving convert it back.
 OK, this patch may be a start. Of course, lconv should be either
 extended or changed to something more suitable. Presently I am
 checking only the first line.

Thanks.
TStringlist is too slow and the conversion should not be part of
codetools. The widgetsets have the best access to the system libs, so
they have the best encoding converters. I can do that part.

The difficult part is to find out the encoding of a text file.
Your proposal of the //encoding comment has the advantage that it is
just a comment. Because this is a Lazarus feature I suggest using IDE
directive style comments: {%encoding xxx}.
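
A rough sketch of how such a directive could be looked up, in Python for brevity. The exact grammar of the directive (case-insensitive keyword, one encoding name, searched only in the first few lines) is an assumption of this example, not something the thread settles.

```python
import re

# Assumed syntax: {%encoding NAME}, keyword matched case-insensitively.
ENCODING_DIRECTIVE = re.compile(
    rb"\{%encoding\s+([A-Za-z0-9_.\-]+)\s*\}", re.IGNORECASE
)


def find_encoding_directive(data, max_lines=5):
    """Return the encoding named in the first max_lines lines, or None."""
    head = b"\n".join(data.splitlines()[:max_lines])
    match = ENCODING_DIRECTIVE.search(head)
    return match.group(1).decode("ascii").lower() if match else None


if __name__ == "__main__":
    print(find_encoding_directive(b"{%encoding CP1251}\nunit demo;\n"))
    print(find_encoding_directive(b"unit demo;\n"))
```

The search works on raw bytes, so it only finds the directive in ASCII-compatible encodings; a UTF-16 file would need a BOM or another hint first.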

What about all the existing pascal files (e.g. coming from
delphi)?


Mattias



Re: [lazarus] Non UTF-8 Encodings

2007-11-25 Thread Felipe Monteiro de Carvalho
On Nov 25, 2007 1:42 PM, Mattias Gaertner [EMAIL PROTECTED] wrote:
 Thanks.
 TStringlist is too slow and the conversion should not be part of
 codetools. The widgetsets have the best access to the system libs, so
 they have the best encoding converters. I can do that part.

Isn't AnsiToUtf8 enough? This way we don't depend on any external libs.

A simple loop using ReadLn and WriteLn should be enough.

 The difficult part is to find out the encoding of a text file.
 Your proposal of the //encoding comment has the advantage of comments.
 Because this is a lazarus feature I suggest to use IDE directives
 style comments: {%encoding xxx}.

The best option would be using the standard way which any editor
should recognize: BOM

But we can't use that until the widestring manager is improved.

Using a comment will cause trouble when opening the file in other editors
which are prepared to handle the standard.

-- 
Felipe Monteiro de Carvalho



Re: [lazarus] Non UTF-8 Encodings

2007-11-25 Thread Mattias Gaertner
On Sun, 25 Nov 2007 15:17:54 +0100
Felipe Monteiro de Carvalho [EMAIL PROTECTED] wrote:

 On Nov 25, 2007 1:42 PM, Mattias Gaertner [EMAIL PROTECTED]
 wrote:
  Thanks.
  TStringlist is too slow and the conversion should not be part of
  codetools. The widgetsets have the best access to the system libs,
  so they have the best encoding converters. I can do that part.
 
 Isn't AnsiToUtf8 enough? This way we don't depend on any external
 libs.

AnsiToUtf8 can be used to convert from the system encoding to UTF-8. And
there is no problem depending on external libs if the widgetset already
depends on them.
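
To illustrate why a converter that always assumes the system encoding is not enough for files that come from elsewhere, here is a small Python sketch. The function ansi_to_utf8 is a stand-in written for this example, not the RTL routine; the system encoding is passed in explicitly so the effect is reproducible.

```python
def ansi_to_utf8(data, system_encoding):
    """Stand-in for a converter that always assumes the system code page."""
    return data.decode(system_encoding).encode("utf-8")


# A file written on a machine whose system encoding is CP1251.
russian = "Привет".encode("cp1251")

# Opened on a CP1251 system: the text survives.
right = ansi_to_utf8(russian, "cp1251").decode("utf-8")

# Opened on a CP1252 system: every byte is valid, but the text is wrong.
wrong = ansi_to_utf8(russian, "cp1252").decode("utf-8")

if __name__ == "__main__":
    print(right)
    print(wrong)
```

No error is raised in the second case, which is why the file's own encoding has to be known before any conversion.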

 
 A simple loop using ReadLn and WriteLn should be enough.

I guess not, but that's an implementation detail.

 
  The difficult part is to find out the encoding of a text file.
  Your proposal of the //encoding comment has the advantage of
  comments. Because this is a lazarus feature I suggest to use IDE
  directives style comments: {%encoding xxx}.
 
 The best option would be using the standard way which any editor
 should recognize: BOM

A BOM exists only for the UTF encodings.
There are plenty of files in UTF-8 without a BOM.
See below.
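
A short Python sketch of BOM sniffing shows both limits: files in a legacy code page never carry a BOM, and a UTF-8 file without one is indistinguishable at this stage. The function name and return shape are choices of this example.

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return (encoding, bom_length), or (None, 0) if there is no BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


if __name__ == "__main__":
    print(sniff_bom(codecs.BOM_UTF8 + b"unit demo;"))
    print(sniff_bom("é".encode("utf-8")))     # UTF-8 without BOM
    print(sniff_bom("é".encode("latin-1")))   # legacy code page
```

A result of (None, 0) therefore says nothing about the encoding; it only means another source of information is needed.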
 
 But we can't use that until the widestring manager is improved.
 
 Using a comment will cause trouble opening the file on other editors
 which are prepared to handle the standard.

That's a normal problem for all text editors and text files without
encoding info.

If lazarus starts to add the BOM for new UTF-8 files, then FPC should
not convert the string constants.

BTW, what about file functions under windows?


Mattias
 



[lazarus] Non UTF-8 Encodings

2007-11-24 Thread Mattias Gaertner
Goal:
- The IDE should load/save text files in non UTF-8 encodings.
- The IDE is compiled with the UTF-8 LCL, so all internal functions
expect UTF-8.
- It is not sufficient to change synedit, because text is often copied
between synedit and other tools, and there are many other IDE tools
reading/editing text files. The central text loading/saving routines
are in the TCodeBuffer in the codetools. For performance and
consistency almost all functions use these text file caches. There are
a few exceptions.
- When a TCodeBuffer loads a file it must find out the encoding,
convert it to UTF-8 and on saving convert it back.
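
The load/convert/save cycle described in the last point can be sketched like this. It is a Python illustration only: TextBuffer is a hypothetical stand-in and not the codetools TCodeBuffer API, and the encoding is passed in by the caller instead of being detected.

```python
class TextBuffer:
    """Hypothetical text cache: UTF-8 in memory, original encoding on disk."""

    def __init__(self):
        self.source_utf8 = b""        # what the rest of the IDE would see
        self.disk_encoding = "utf-8"  # remembered for saving

    def load_bytes(self, data, encoding):
        """Convert the file content to UTF-8 and remember where it came from."""
        self.disk_encoding = encoding
        self.source_utf8 = data.decode(encoding).encode("utf-8")

    def save_bytes(self):
        """Convert the in-memory UTF-8 back to the original disk encoding."""
        return self.source_utf8.decode("utf-8").encode(self.disk_encoding)


if __name__ == "__main__":
    buf = TextBuffer()
    buf.load_bytes("Grüße".encode("latin-1"), "latin-1")
    print(buf.source_utf8)
    print(buf.save_bytes())
```

One consequence of this design is that saving can fail: if an edit introduces a character the disk encoding cannot represent, the conversion back raises an error (UnicodeEncodeError in this sketch) and the IDE would have to decide what to do.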

Problems:

* How to find out the encoding?
- BOM
- lpi file
- heuristic
- system char set
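
Treating the four sources above as a priority list, a detection routine might look like the following Python sketch. The order (BOM, then project setting, then heuristic, then system char set) is an assumption of this example, the heuristic is deliberately the simplest one (does the file decode as strict UTF-8?), and project_hint stands in for whatever an lpi file might store.

```python
import locale


def guess_encoding(data, project_hint=None, system_encoding=None):
    """Guess a text file's encoding using the four sources in order."""
    # 1. BOM (only the UTF-8 BOM is checked here, to keep the sketch short).
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    # 2. Per-project setting, e.g. read from the lpi file (hypothetical).
    if project_hint:
        return project_hint
    # 3. Heuristic: text that is valid UTF-8 is assumed to be UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 4. Fall back to the system char set.
    return system_encoding or locale.getpreferredencoding(False)


if __name__ == "__main__":
    print(guess_encoding(b"\xef\xbb\xbfunit a;"))
    print(guess_encoding(b"\xe9", project_hint="cp1252"))
    print(guess_encoding("é".encode("utf-8")))
    print(guess_encoding(b"\xe9", system_encoding="cp1250"))
```

Pure ASCII files pass step 3 and are reported as UTF-8, which is harmless because ASCII is a subset of it.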

* What about buggy text files? 


Vasily, your lconv.pas uses libc. AFAIK this unit only exists on a
few platforms. For example, it exists on neither amd64 nor sparc.


Mattias



Re: [lazarus] Non UTF-8 Encodings

2007-11-24 Thread Vasily I. Volchenko
My lconv.pas can use libc because this unit was added by someone else (in the
LCL); I only fixed it so that a switch combination works. It must work without
libc. Anyway, lconv.pas is a bad workaround; presently it fits my purposes, but
not those of others.
