[fpc-devel] Unicode RTL

2005-11-16 Thread Daniël Mantione
Hi, Please check this discussion: http://community.freepascal.org:1/bboards/message?message_id=172880forum_id=24092 Short summary: * Many places in the rtl use single character strings, i.e. ansistrings. * To make them Unicode proof they need to be changed into wide strings. * But this

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Marco van de Voort
Please check this discussion: http://community.freepascal.org:1/bboards/message?message_id=172880forum_id=24092 Short summary: * Many places in the rtl use single character strings, i.e. ansistrings. * To make them Unicode proof they need to be changed into wide strings. * But this

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Daniël Mantione
On Wed, 16 Nov 2005, Marco van de Voort wrote: Please check this discussion: http://community.freepascal.org:1/bboards/message?message_id=172880forum_id=24092 Short summary: * Many places in the rtl use single character strings, i.e. ansistrings. * To make them Unicode

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Micha Nelissen
Daniël Mantione wrote: To be short, Juras B. wants to add a Unicode Win32 target, so in the standard RTL things like TList etc. use ansistrings, while in the Unicode RTL they use widestrings. Why not use ansistrings with UTF-8? IMHO this is indeed a good solution, but one with consequences.

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Marco van de Voort
allowing reuse of the OS-independent code. However this doesn't work fully this way, since unicode internally also hits OS-independent code hard. IOW the separate target only solves the windows unit defaults. Indeed. What is proposed is that the system independent RTL uses an

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Tomas Hajny
Daniël Mantione wrote: Hi, Please check this discussion: http://community.freepascal.org:1/bboards/message?message_id=172880forum_id=24092 Short summary: * Many places in the rtl use single character strings, i.e. ansistrings. * To make them Unicode proof they need to be changed

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Daniël Mantione
On Wed, 16 Nov 2005, Tomas Hajny wrote: Big overhead (double maintenance efforts for all targets supporting this schism). :-( I'd say it's better to successively identify the weak points and address these case by case. I know, I'm all for abolishing Chinese (and perhaps Korean), the only

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Daniël Mantione
On Wed, 16 Nov 2005, Micha Nelissen wrote: Daniël Mantione wrote: To be short, Juras B. wants to add a Unicode Win32 target, so in the standard RTL things like TList etc. use ansistrings, while in the Unicode RTL they use widestrings. Why not use ansistrings with UTF-8? Because

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Daniël Mantione
On Wed, 16 Nov 2005, Vincent Snijders wrote: Daniël Mantione wrote: What should be done on Linux/FreeBSD/MacOS is still unknown to me, it is a wild west, but likely something similar, internally a widestring rtl is used that converts to the right encoding when communicating

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Marco van de Voort
On Wed, 16 Nov 2005, Micha Nelissen wrote: Daniël Mantione wrote: To be short, Juras B. wants to add a Unicode Win32 target, so in the standard RTL things like TList etc. use ansistrings, while in the Unicode RTL they use widestrings. Why not use ansistrings with UTF-8?

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Florian Klaempfl
Daniël Mantione wrote: On Wed, 16 Nov 2005, Tomas Hajny wrote: Big overhead (double maintenance efforts for all targets supporting this schism). :-( I'd say it's better to successively identify the weak points and address these case by case. Yes. I know, I'm all for abolishing

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Felipe Monteiro de Carvalho
On 11/16/05, Daniël Mantione wrote: I know, I'm all for abolishing Chinese (and perhaps Korean), the only language(s) that absolutely cannot be written with an 8-bit code. why not remove all languages that do not fit ASCII next??? Now there are several solutions: 1.

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Florian Klaempfl
Daniël Mantione wrote: On Wed, 16 Nov 2005, Micha Nelissen wrote: Daniël Mantione wrote: To be short, Juras B. wants to add a Unicode Win32 target, so in the standard RTL things like TList etc. use ansistrings, while in the Unicode RTL they use widestrings. Why not use ansistrings with

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Florian Klaempfl
Felipe Monteiro de Carvalho wrote: On 11/16/05, Daniël Mantione wrote: I know, I'm all for abolishing Chinese (and perhaps Korean), the only language(s) that absolutely cannot be written with an 8-bit code. why not remove all languages that do not fit ASCII

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Florian Klaempfl
Daniël Mantione wrote: On Wed, 16 Nov 2005, Vincent Snijders wrote: Daniël Mantione wrote: What should be done on Linux/FreeBSD/MacOS is still unknown to me, it is a wild west, but likely something similar, internally a widestring rtl is used that converts to the right encoding when

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Daniël Mantione
On Wed, 16 Nov 2005, Florian Klaempfl wrote: Daniël Mantione wrote: On Wed, 16 Nov 2005, Micha Nelissen wrote: Daniël Mantione wrote: To be short, Juras B. wants to add a Unicode Win32 target, so in the standard RTL things like TList etc. use ansistrings, while in the

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Marco van de Voort
Daniël Mantione wrote: Because then you will have to modify routines like pos, insert, delete. Since that is not possible, you would get a pos_utf8, insert_utf8, etc. No, why? When working with utf-8 strings, you don't use character positions. That's pretty much only when using

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Marco van de Voort
Daniël Mantione wrote: pos('ë','Daniël'); ... has a different implementation for utf-8 and 8-bit code pages. Why? With utf-8 a string is searched, with 8-bit cp one char. No char or sequence of chars other than ë can generate the byte sequence representing ë. const s = 'Daniël';

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Tomas Hajny
Daniël Mantione wrote: On Wed, 16 Nov 2005, Tomas Hajny wrote: Big overhead (double maintenance efforts for all targets supporting this schism). :-( I'd say it's better to successively identify the weak points and address these case by case. I know, I'm all for abolishing Chinese (and

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Tomas Hajny
Florian Klaempfl wrote: Felipe Monteiro de Carvalho wrote: On 11/16/05, Daniël Mantione wrote: I know, I'm all for abolishing Chinese (and perhaps Korean), the only language(s) that absolutely cannot be written with an 8-bit code. why not remove all languages that

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Tomas Hajny
Marco van de Voort wrote: Daniël Mantione wrote: pos('ë','Daniël'); ... has a different implementation for utf-8 and 8-bit code pages. Why? With utf-8 a string is searched, with 8-bit cp one char. No char or sequence of chars other than ë can generate the byte sequence

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Marco van de Voort
... has a different implementation for utf-8 and 8-bit code pages. Why? With utf-8 a string is searched, with 8-bit cp one char. No char or sequence of chars other than ë can generate the byte sequence representing ë. const s = 'Daniël'; var accent : utf8char;

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Christian Iversen
On Wednesday 16 November 2005 13:57, Florian Klaempfl wrote: Daniël Mantione wrote: On Wed, 16 Nov 2005, Vincent Snijders wrote: Daniël Mantione wrote: What should be done on Linux/FreeBSD/MacOS is still unknown to me, it is a wild west, but likely something similar, internally a

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Daniël Mantione
On Wed, 16 Nov 2005, Tomas Hajny wrote: You're right that strings are used everywhere, but I don't think that this really means that you need to add special support for widestrings everywhere. In many places you can pass a DBCS/MBCS string to it today (e.g. encoded using UTF-8) and it

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Daniël Mantione
On Wed, 16 Nov 2005, Micha Nelissen wrote: On Wed, 16 Nov 2005 13:38:43 +0100 (CET) Daniël Mantione wrote: On Wed, 16 Nov 2005, Micha Nelissen wrote: Why not use ansistrings with UTF-8? Because then you will have to modify routines like pos, insert,

RE: [fpc-devel] Unicode RTL

2005-11-16 Thread peter green
pos('ë','Daniël'); ... has a different implementation for utf-8 and 8-bit code pages. One little design feature of utf-8 is that it was carefully designed to be friendly to byte-orientated code. No special precautions are needed for substring matching in utf-8!
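The byte-friendliness claimed here can be checked with a small sketch. It is written in Python rather than against the FPC RTL, purely to make the byte-level behaviour visible; the helper names `utf8_byte_pos` and `byte_to_char_offset` are made up for this illustration and are not RTL routines. It assumes both strings are valid UTF-8 once encoded.

```python
def utf8_byte_pos(needle: str, haystack: str) -> int:
    """Return the 0-based byte offset of needle inside haystack, or -1.

    Both strings are encoded to UTF-8 first, and the search itself is a
    plain byte-wise substring search with no knowledge of UTF-8.
    """
    return haystack.encode("utf-8").find(needle.encode("utf-8"))


def byte_to_char_offset(haystack: str, byte_offset: int) -> int:
    """Convert a byte offset (on a character boundary) to a character offset."""
    prefix = haystack.encode("utf-8")[:byte_offset]
    return len(prefix.decode("utf-8"))


if __name__ == "__main__":
    # 'ë' is the two bytes C3 AB in UTF-8. Lead bytes and continuation
    # bytes use disjoint value ranges, so a byte-wise match for a whole
    # character cannot begin in the middle of another character.
    print(utf8_byte_pos("ë", "Daniël"))        # byte offset
    print(utf8_byte_pos("ë", "éë"))            # byte offset after a 2-byte char
    print(byte_to_char_offset("éë", 2))        # same position counted in characters
```

The search needs no UTF-8 awareness, but the offset it returns counts bytes. Routines that take or return character positions (the pos, insert and delete concern raised earlier in the thread) still need a conversion step such as the second helper.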

RE: [fpc-devel] Unicode RTL

2005-11-16 Thread Daniël Mantione
On Wed, 16 Nov 2005, peter green wrote: pos('ë','Daniël'); ... has a different implementation for utf-8 and 8-bit code pages. One little design feature of utf-8 is that it was carefully designed to be friendly to byte-orientated code. No special precautions are needed for substring

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Florian Klaempfl
Daniël Mantione wrote: On Wed, 16 Nov 2005, peter green wrote: pos('ë','Daniël'); ... has a different implementation for utf-8 and 8-bit code pages. One little design feature of utf-8 is that it was carefully designed to be friendly to byte-orientated code. No special precautions are

RE: [fpc-devel] Unicode RTL

2005-11-16 Thread peter green
-Original Message- From: Daniël Mantione Sent: 16 November 2005 21:58 To: FPC developers' list Subject: RE: [fpc-devel] Unicode RTL On Wed, 16 Nov 2005, peter green wrote: pos('ë','Daniël'); ... has a different

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Mattias Gaertner
On Wed, 16 Nov 2005 17:25:29 +0100 (CET) Daniël Mantione wrote: On Wed, 16 Nov 2005, Tomas Hajny wrote: You're right that strings are used everywhere, but I don't think that this really means that you need to add special support for widestrings everywhere. In many

Re: [fpc-devel] Unicode RTL

2005-11-16 Thread Daniël Mantione
On Wed, 16 Nov 2005, Florian Klaempfl wrote: Daniël Mantione wrote: On Wed, 16 Nov 2005, peter green wrote: pos('ë','Daniël'); ... has a different implementation for utf-8 and 8-bit code pages. One little design feature of utf-8 is that it was carefully designed to be

RE: [fpc-devel] Unicode RTL

2005-11-16 Thread peter green
*sigh* Yes, what he says is correct. Now to do something with strings. I.e. reverse them, or any other operation that needs to split the string into pieces. reversing a string properly requires a very deep understanding of unicode and huge lookup tables (reversing the code point order will
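Why plain code point reversal goes wrong can be shown with a short sketch, here in Python rather than Pascal. It assumes a deliberately simplified rule (a combining mark stays attached to the character before it) and uses the standard `unicodedata` module; the function names are made up for this illustration. Real grapheme segmentation needs far more data than this, which is the "huge lookup tables" point above.

```python
import unicodedata


def reverse_code_points(text: str) -> str:
    """Naive reversal: flips the order of the individual code points."""
    return text[::-1]


def reverse_keeping_marks(text: str) -> str:
    """Reverse text while keeping each combining mark after its base character.

    Simplified: only characters with a non-zero canonical combining class
    are glued to the preceding character. Full grapheme cluster rules
    cover many more cases than this.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # attach the mark to its base character
        else:
            clusters.append(ch)     # start a new cluster
    return "".join(reversed(clusters))


if __name__ == "__main__":
    # 'Daniël' written as 'e' followed by U+0308 COMBINING DIAERESIS.
    decomposed = "Danie\u0308l"
    # Naive reversal moves the diaeresis onto the 'l'.
    print(reverse_code_points(decomposed) == "l\u0308einaD")
    # The mark-aware version keeps it on the 'e'.
    print(reverse_keeping_marks(decomposed) == "le\u0308inaD")
```

Reversing the raw UTF-8 bytes would be worse still, since it would also scramble the bytes inside each multi-byte character.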