Hi,
Please check this discussion:
http://community.freepascal.org:1/bboards/message?message_id=172880forum_id=24092
Short summary:
* Many places in the rtl use single character strings, i.e. ansistrings.
* To make them Unicode proof they need to be changed into wide strings.
* But this
On Wed, 16 Nov 2005, Marco van de Voort wrote:
Please check this discussion:
http://community.freepascal.org:1/bboards/message?message_id=172880forum_id=24092
Short summary:
* Many places in the rtl use single character strings, i.e. ansistrings.
* To make them Unicode
Daniël Mantione wrote:
To be short, Juras B. wants to add a Unicode Win32 target, so in the
standard RTL things like Tlist etc. use ansistrings, while in the Unicode
RTL they use widestrings.
Why not use ansistrings with UTF-8 ?
IMHO this is indeed a good solution, but one with consequences.
allowing to reuse the OS-independent code.
However, this doesn't work fully this way, since Unicode internally also hits
OS-independent code hard. IOW the separate target only solves the Windows
unit defaults.
Indeed. What is proposed is that the system independent RTL uses an
Daniël Mantione wrote:
Hi,
Please check this discussion:
http://community.freepascal.org:1/bboards/message?message_id=172880forum_id=24092
Short summary:
* Many places in the rtl use single character strings, i.e. ansistrings.
* To make them Unicode proof they need to be changed
On Wed, 16 Nov 2005, Tomas Hajny wrote:
Big overhead (double maintenance efforts for all targets supporting this
schism). :-( I'd say it's better to successively identify the weak points
and address these case by case.
I know, I'm all for abolishing Chinese (and perhaps Korean), the only
On Wed, 16 Nov 2005, Micha Nelissen wrote:
Daniël Mantione wrote:
To be short, Juras B. wants to add a Unicode Win32 target, so in the
standard RTL things like Tlist etc. use ansistrings, while in the Unicode
RTL they use widestrings.
Why not use ansistrings with UTF-8 ?
Because
On Wed, 16 Nov 2005, Vincent Snijders wrote:
Daniël Mantione wrote:
What should be done on Linux/FreeBSD/MacOS is still unknown to me, it is
a wild west, but likely something similar, internally a widestring rtl is
used that converts to the right encoding when communicating
On Wed, 16 Nov 2005, Micha Nelissen wrote:
Daniël Mantione wrote:
To be short, Juras B. wants to add a Unicode Win32 target, so in the
standard RTL things like Tlist etc. use ansistrings, while in the Unicode
RTL they use widestrings.
Why not use ansistrings with UTF-8 ?
Daniël Mantione wrote:
On Wed, 16 Nov 2005, Tomas Hajny wrote:
Big overhead (double maintenance efforts for all targets supporting this
schism). :-( I'd say it's better to successively identify the weak points
and address these case by case.
Yes.
I know, I'm all for abolishing
On 11/16/05, Daniël Mantione [EMAIL PROTECTED] wrote:
I know, I'm all for abolishing Chinese (and perhaps Korean), the only
language(s) that absolutely cannot be written with an 8-bit code.
why not remove all languages that do not fit ASCII next???
Now there are several solutions:
1.
Daniël Mantione wrote:
On Wed, 16 Nov 2005, Micha Nelissen wrote:
Daniël Mantione wrote:
To be short, Juras B. wants to add a Unicode Win32 target, so in the
standard RTL things like Tlist etc. use ansistrings, while in the Unicode
RTL they use widestrings.
Why not use ansistrings with
Felipe Monteiro de Carvalho wrote:
On 11/16/05, Daniël Mantione [EMAIL PROTECTED] wrote:
I know, I'm all for abolishing Chinese (and perhaps Korean), the only
language(s) that absolutely cannot be written with an 8-bit code.
why not remove all languages that do not fit ASCII
Daniël Mantione wrote:
On Wed, 16 Nov 2005, Vincent Snijders wrote:
Daniël Mantione wrote:
What should be done on Linux/FreeBSD/MacOS is still unknown to me, it is
a wild west, but likely something similar, internally a widestring rtl is
used that converts to the right encoding when
On Wed, 16 Nov 2005, Florian Klaempfl wrote:
Daniël Mantione wrote:
On Wed, 16 Nov 2005, Micha Nelissen wrote:
Daniël Mantione wrote:
To be short, Juras B. wants to add a Unicode Win32 target, so in the
standard RTL things like Tlist etc. use ansistrings, while in the
Daniël Mantione wrote:
Because then you will have to modify routines like pos, insert, delete.
Since that is not possible, you would get a pos_utf8, insert_utf8, etc.
No, why? When working with utf-8 strings, you don't use character positions.
That's pretty much only when using
Daniël Mantione wrote:
pos('ë','Daniël');
... has a different implementation for utf-8 and 8-bit code pages.
Why? With utf-8 a string is searched, with 8-bit cp one char. No other
char/sequence of char other than ë can generate the byte sequence
representing ë
const s : 'Daniël';
Daniël Mantione wrote:
On Wed, 16 Nov 2005, Tomas Hajny wrote:
Big overhead (double maintenance efforts for all targets supporting this
schism). :-( I'd say it's better to successively identify the weak
points
and address these case by case.
I know, I'm all for abolishing Chinese (and
Florian Klaempfl wrote:
Felipe Monteiro de Carvalho wrote:
On 11/16/05, Daniël Mantione [EMAIL PROTECTED] wrote:
I know, I'm all for abolishing Chinese (and perhaps Korean), the only
language(s) that absolutely cannot be written with an 8-bit
code.
why not remove all languages that
Marco van de Voort wrote:
Daniël Mantione wrote:
pos('ë','Daniël');
... has a different implementation for utf-8 and 8-bit code pages.
Why? With utf-8 a string is searched, with 8-bit cp one char. No other
char/sequence of char other than ë can generate the byte sequence
representing ë
const s : 'Daniël';
var accent : utf8char;
var accent : utf8char;
On Wednesday 16 November 2005 13:57, Florian Klaempfl wrote:
Daniël Mantione wrote:
On Wed, 16 Nov 2005, Vincent Snijders wrote:
Daniël Mantione wrote:
What should be done on Linux/FreeBSD/MacOS is still unknown to me, it is
a wild west, but likely something similar, internally a
On Wed, 16 Nov 2005, Tomas Hajny wrote:
You're right that strings are used everywhere, but I don't think that this
really means that you need to add special support for widestrings
everywhere. In many places you can pass a DBCS/MBCS string to it today
(e.g. encoded using UTF-8) and it
On Wed, 16 Nov 2005, Micha Nelissen wrote:
On Wed, 16 Nov 2005 13:38:43 +0100 (CET)
Daniël Mantione [EMAIL PROTECTED] wrote:
On Wed, 16 Nov 2005, Micha Nelissen wrote:
Why not use ansistrings with UTF-8 ?
Because then you will have to modify routines like pos, insert,
pos('ë','Daniël');
... has a different implementation for utf-8 and 8-bit code pages.
one little design feature of utf-8 is that it was carefully designed to be
friendly to byte-orientated code. No special precautions are needed for
substring matching in utf-8!
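To illustrate the point about byte-oriented matching, here is a minimal sketch in Python (used only for brevity; `byte_pos` is a hypothetical helper, not an RTL routine). It mimics a 1-based `pos` by searching the raw bytes. The same byte-wise search finds 'ë' in both the UTF-8 and the Latin-1 encoding, because in UTF-8 lead bytes, continuation bytes and ASCII bytes occupy disjoint ranges, so a match cannot start in the middle of another character. What does change is the position of anything after a multi-byte character.

```python
def byte_pos(needle: str, haystack: str, encoding: str) -> int:
    """Return the 1-based byte offset of needle in haystack, or 0 if absent.

    Hypothetical helper: it encodes both strings and does a plain
    byte-wise search, with no knowledge of character boundaries.
    """
    n = needle.encode(encoding)
    h = haystack.encode(encoding)
    return h.find(n) + 1  # bytes.find gives -1 when absent, so this yields 0


if __name__ == "__main__":
    name = "Dani\u00ebl"  # precomposed e-diaeresis, U+00EB
    for enc in ("utf-8", "latin-1"):
        # 'ë' is found at the same offset in both encodings;
        # the trailing 'l' is one byte further along in UTF-8.
        print(enc, byte_pos("\u00eb", name, enc), byte_pos("l", name, enc))
```

So the search itself needs no UTF-8 variant; only code that interprets the returned offset as a character index has to care.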
On Wed, 16 Nov 2005, peter green wrote:
pos('ë','Daniël');
... has a different implementation for utf-8 and 8-bit code pages.
one little design feature of utf-8 is that it was carefully designed to be
friendly to byte-orientated code. No special precautions are needed for
substring
Daniël Mantione wrote:
On Wed, 16 Nov 2005, peter green wrote:
pos('ë','Daniël');
... has a different implementation for utf-8 and 8-bit code pages.
one little design feature of utf-8 is that it was carefully designed to be
friendly to byte-orientated code. No special precautions are
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Daniël
Mantione
Sent: 16 November 2005 21:58
To: FPC developers' list
Subject: RE: [fpc-devel] Unicode RTL
On Wed, 16 Nov 2005, peter green wrote:
pos('ë','Daniël');
... has a different
On Wed, 16 Nov 2005 17:25:29 +0100 (CET)
Daniël Mantione [EMAIL PROTECTED] wrote:
On Wed, 16 Nov 2005, Tomas Hajny wrote:
You're right that strings are used everywhere, but I don't think that
this really means that you need to add special support for widestrings
everywhere. In many
On Wed, 16 Nov 2005, Florian Klaempfl wrote:
Daniël Mantione wrote:
On Wed, 16 Nov 2005, peter green wrote:
pos('ë','Daniël');
... has a different implementation for utf-8 and 8-bit code pages.
one little design feature of utf-8 is that it was carefully designed to be
*sigh* Yes, what he says is correct. Now to do something with
strings. I.e. reverse them, or any other operation that needs to split
the string into pieces.
reversing a string properly requires a very deep understanding of Unicode
and huge lookup tables (reversing the code point order will
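
As a rough illustration of why splitting or reversing is harder than searching (my own sketch in Python, not code from this thread): reversing the raw UTF-8 bytes yields an invalid sequence, and reversing code points moves a combining mark onto the wrong base letter. The helper names are hypothetical, and the decomposed spelling with U+0308 is an assumption chosen to make the effect visible.

```python
def reverse_bytes_is_valid_utf8(text: str) -> bool:
    """Reverse the UTF-8 bytes of text and report whether the result still decodes."""
    try:
        text.encode("utf-8")[::-1].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reverse_code_points(text: str) -> str:
    """Reverse by code point, ignoring combining sequences."""
    return text[::-1]


if __name__ == "__main__":
    # Pure ASCII survives a byte-wise reversal; a two-byte character does not.
    print(reverse_bytes_is_valid_utf8("Daniel"))
    print(reverse_bytes_is_valid_utf8("Dani\u00ebl"))

    # Decomposed form: 'e' followed by COMBINING DIAERESIS (U+0308).
    decomposed = "Danie\u0308l"
    # After reversal the combining mark follows 'l' instead of 'e'.
    print([hex(ord(c)) for c in reverse_code_points(decomposed)])
```

A correct reversal would have to keep each base letter together with its combining marks, which is where the knowledge of Unicode character properties comes in.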