Daniël Mantione wrote:
The issue might be the UCS-2 encoding of your source; perhaps try to
feed the compiler UTF-8. I didn't even know the compiler accepts UCS-2,
and it may not work correctly.
The compiler definitely does not accept UCS-2 encoded sources.
Michael Schnell wrote:
A decent system should be able to do the necessary conversions
automatically:
This is a simplified view which ignores the waste of resources of this
approach, which is not visible in the academic example below. The conversion
between UTF-8 and UTF-16 is a very expensive operation and the
The compiler definitely does not accept UCS-2 encoded sources.
I checked several times: my source file looks like this when I open it
with UltraEdit and tell it to show the file in hex:
FF FE 75 00 6E 00 69 00 74 00 20 00 55 00 6E 00 ..u.n.i.t. .U.n.
Now I created a Delphi program and read the file
On Thu, 23 Oct 2008, Michael Schnell wrote:
The compiler definitely does not accept UCS-2 encoded sources.
I checked several times: my source file looks like this when I open it with
UltraEdit and tell it to show the file in hex:
FF FE 75 00 6E 00 69 00 74 00 20 00 55 00 6E 00 ..u.n.i.t. .U.n.
As has been said before: the compiler itself simply does not support
UCS-2. Regardless of any BOM, compiler setting or Lazarus setting, it
will not understand it.
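The bytes in the hex dump quoted above can be checked mechanically. The following is only a minimal sketch, written in Python rather than Pascal purely for illustration; the helper name sniff_bom is hypothetical and not part of FPC, Lazarus or UltraEdit. It looks at the leading byte order mark (BOM) only and then re-encodes the text as UTF-8, which is what the compiler is said to expect.

```python
import codecs

# Bytes copied from the hex dump quoted in the thread.
dump = bytes.fromhex("FF FE 75 00 6E 00 69 00 74 00 20 00 55 00 6E 00")


def sniff_bom(data: bytes) -> str:
    """Guess an encoding from a leading BOM only (hypothetical helper).

    Limitation: a UTF-32 little-endian BOM also starts with FF FE,
    so this sketch would misreport such a file as UTF-16.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return "unknown"


encoding = sniff_bom(dump)

# Skip the two BOM bytes, decode the 16-bit units, re-encode as UTF-8.
text = dump[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
utf8_bytes = text.encode("utf-8")

print(encoding, repr(text), len(dump), len(utf8_bytes))
```

For pure ASCII source text the UTF-16 form is roughly twice as long as the UTF-8 form, so comparing the file size on disk with the character count is a quick way to tell which encoding is really stored, independent of what an editor displays.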
See my other post in this thread: Windows XP seems to play some tricks
on us here so that UltraEdit sees the UCS-2 coded file
The conversion
between UTF-8 and UTF-16 is a very expensive operation and the compiler has to
insert it all over the place, and people would complain about the performance
of their programs.
Of course I do agree.
If you want to care about performance you need to know what to do:
Either use WideString all over
Michael Schnell wrote:
The conversion
between UTF-8 and UTF-16 is a very expensive operation and the compiler has to
insert it all over the place, and people would complain about the performance
of their programs.
Of course I do agree.
If you want to care about performance you need to know what to do:
Michael Schnell wrote:
The conversion
between UTF-8 and UTF-16 is a very expensive operation and the compiler has to
insert it all over the place, and people would complain about the performance
of their programs.
Of course I do agree.
If you want to care about performance you need to know what to do:
utf-16 application shouldn't do this
either: it doesn't handle surrogates properly
Right you are. For me WideString is UCS-2 and not UTF-16, as I regard it
as a sequence of WideChar so that the Unicode user code can be done
using WideChar and WideString. WideChar only has 16 bits. So this
In our previous episode, Florian Klaempfl said:
But if you use UTF8String you need to be aware that you can't do simple
and totally normal things like s := copy(s, 3); to get the first three
characters of a string. Really finding the first three characters of a
string is an interesting and
UltraEdit might fool you here. It edits either ANSI or UCS-2. If you
have a UTF-8 encoded file, it will show the contents in hex as being UCS-2.
That might be. But would it even virtually insert a BOM? Why
should it do this when using the hex editor?
-Michael
More importantly, most such routines will be implicitly tied to a
certain language or language group already.
Which kinds of UCS-2 based functions do you think are tied to a
language (group)?
-Michael
On 23 Oct 2008, at 13:41, Michael Schnell wrote:
utf-16 application shouldn't do this
either: it doesn't handle surrogates properly
Right you are. For me WideString is UCS2 and not UTF16, as I regard
it as a sequence of WideChar so that the Unicode user code can be
done using WideChar and
Michael Schnell wrote:
More importantly, most such routines will be implicitly tied to a
certain language or language group already.
Which kinds of UCS-2 based functions do you think are tied to a
language (group)?
Bidi stuff? You are aware of the fact that unicode strings can
On Thursday 23 October 2008 13.31:30 Florian Klaempfl wrote:
This is also a simplified view.
- firstly, which real world (!) task really requires to execute an
operation like this, mostly it's something like copy(s,pos(...),...);
- secondly, a properly coded utf-16 application shouldn't do
Michael Schnell wrote:
UltraEdit might fool you here. It edits either ANSI or UCS-2. If you
have a UTF-8 encoded file, it will show the contents in hex as being UCS-2.
That might be. But would it even virtually insert a BOM? Why
should it do this when using the hex editor?
Since it
Bidi stuff? You are aware of the fact that unicode strings can contain
e.g. bidi markers?
Sorry, never heard of bidi :(
-Michael
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
Michael Schnell wrote:
Bidi stuff? You are aware of the fact that unicode strings can contain
e.g. bidi markers?
Sorry, never heard of bidi :(
http://www.unicode.org/reports/tr9/
If you want widestring, then maybe mseide is a better option for you.
Again, I do know this, and in fact I don't have a project that needs
Unicode. But the reason I started this thread is to help make
Lazarus / FPC even more useful.
-Michael
Since it converts the UTF-8 file internally to UCS-2 on read before
editing.
Seems really silly to me.
But the file length really indicates that it's UTF-8 coded, and when
looking at the file with WinCommander's hex viewer it's UTF-8. So I
suppose that you are right and the nasty trick is
On Thursday 23 October 2008 13.58:04 Michael Schnell wrote:
Bidi stuff? You are aware of the fact that unicode strings can contain
e.g. bidi markers?
Sorry, never heard of bidi :(
Bidirectional text. Much more important than the hypothetical codepoints above
the BMP. MSEgui does not
I doubt that you will never need to support decomposed characters
(such as ä being encoded as basically a¨). It's not that uncommon.
This is the nasty old stuff Unicode should be useful to get rid of
-Michael
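The point about decomposed characters can be made concrete. This is a minimal sketch in Python (chosen only for illustration, not something proposed in the thread), using the standard unicodedata module: the same visible character ä exists as one code point and as a base letter followed by a combining mark, and both forms fit into 16-bit units.

```python
import unicodedata

composed = "\u00e4"      # ä as a single code point (precomposed)
decomposed = "a\u0308"   # a followed by COMBINING DIAERESIS

# The two strings render the same but differ as code point sequences.
same_raw = composed == decomposed

# Normalization maps one form onto the other.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

# Number of 16-bit units each form needs in UTF-16.
units_composed = len(composed.encode("utf-16-le")) // 2
units_decomposed = len(decomposed.encode("utf-16-le")) // 2

print(same_raw, nfc == composed, nfd == decomposed)
print(units_composed, units_decomposed)
```

So even under the assumption that a string is plain UCS-2 with no surrogates, one 16-bit element is not always one visible character; comparisons and character counts generally need a normalization step first.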
Michael Schnell wrote:
Since it converts the UTF8 file internally to UCS2 on read before
editing.
Seems really silly to me.
No, it's not. This way you only have to support two editors internally: one
with byte chars and one with word chars (ignoring surrogates and other stuff).
But the file
http://www.unicode.org/reports/tr9/
Thanks. I see. (In fact I even did do embedded software for a display
that can show Hebrew text. But this was with ANSI code.)
-Michael
Hello Michael,
Thursday, October 23, 2008, 1:46:48 PM, you wrote:
More importantly, most such routines will be implicitly tied to a
certain language or language group already.
MS Which kind of UCS2 based function do you think are tied to a
MS language(group) ?
UpperCase, LowerCase,
On Thu, 23 Oct 2008, JoshyFun wrote:
Hello Michael,
Thursday, October 23, 2008, 1:46:48 PM, you wrote:
More importantly, most such routines will be implicitly tied to a
certain language or language group already.
MS Which kind of UCS2 based function do you think are tied to a
MS
Hello Daniël,
Thursday, October 23, 2008, 5:34:59 PM, you wrote:
DM Don't exaggerate, this is true with plain ASCII as well. Non-English
DM software has existed for over 5 decades and nothing has stopped us from
DM writing code that performs the functions you name.
I'm not exaggerating,
On Thu, 23 Oct 2008, JoshyFun wrote:
Hello Daniël,
Thursday, October 23, 2008, 5:34:59 PM, you wrote:
DM Don't exaggerate, this is true with plain ASCII as well. Non-English
DM software has existed for over 5 decades and nothing has stopped us from
DM writing code that performs the
On Thu, 23 Oct 2008 08:53:27 +0200 (CEST)
Peter Vreman [EMAIL PROTECTED] wrote:
On Wed, 22 Oct 2008 10:32:36 +0200 (CEST)
Peter Vreman [EMAIL PROTECTED] wrote:
As of version 2.3.1, the compiler by itself indicates all the
various features it supports with FPC_HAS_FEATURE_XXX defines.
On Thu, 23 Oct 2008, Mattias Gaertner wrote:
On Thu, 23 Oct 2008 08:53:27 +0200 (CEST)
Peter Vreman [EMAIL PROTECTED] wrote:
On Wed, 22 Oct 2008 10:32:36 +0200 (CEST)
Peter Vreman [EMAIL PROTECTED] wrote:
As of version 2.3.1, the compiler by itself indicates all the
various
Michael Van Canneyt wrote:
And did you fix the 'TObject not found' with a short-term solution ? :-)
Maybe svn up -r11887 (in fpc/trunk)
Vincent
On Thu, 23 Oct 2008, Vincent Snijders wrote:
Michael Van Canneyt wrote:
And did you fix the 'TObject not found' with a short-term solution ? :-)
Maybe svn up -r11887 (in fpc/trunk)
home: svn log -r 11887 .
DM Example: In Dutch uppercase characters generally do not get
tremas: Daniël becomes DANIEL. Should an uppercase routine worry?
No, this is a spelling convention, the correct uppercase of ë is
Ë, we should not confuse spelling with uppercasing.
No. This is not a spelling convention. It is
Michael Van Canneyt wrote:
On Thu, 23 Oct 2008, Vincent Snijders wrote:
Michael Van Canneyt wrote:
And did you fix the 'TObject not found' with a short-term solution ? :-)
Maybe svn up -r11887 (in fpc/trunk)
home: svn log -r 11887 .
Hello listmember,
Thursday, October 23, 2008, 11:58:51 PM, you wrote:
l Yes, it is imperative that we know the language the word is in, so that
l UpperCase(sólo, langSpanish) -- SÓLO
l UpperCase(solo, langSpanish) -- SOLO
l Otherwise, we may end up altering the meaning of the text.
l
I agree with Daniël on this one. Simplify. ë -- Ë always
If you need something which takes into consideration the language then
build another routine with more parameters.
--
Felipe Monteiro de Carvalho
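The split suggested here, one simple routine plus a second one with a language parameter, could look roughly like the sketch below. It is written in Python only for illustration; the names upper_simple and upper_lang and the language codes are hypothetical, and the Turkish dotted/dotless i rule is an example added here, not one raised in the thread.

```python
def upper_simple(s: str) -> str:
    """Language-independent uppercasing: e with diaeresis always maps to E with diaeresis."""
    return s.upper()


# Hypothetical set of language codes that need a special rule for "i".
_DOTTED_I_LANGS = {"tr", "az"}


def upper_lang(s: str, lang: str) -> str:
    """Uppercasing with a language parameter (hypothetical sketch)."""
    if lang in _DOTTED_I_LANGS:
        # In these languages i uppercases to dotted capital I (U+0130)
        # and dotless i (U+0131) uppercases to plain I.
        s = s.replace("i", "\u0130").replace("\u0131", "I")
    return s.upper()


print(upper_simple("daniël"))
print(upper_simple("sólo"))
print(upper_lang("istanbul", "tr"))
print(upper_lang("istanbul", "en"))
```

The simple routine stays cheap and predictable, and callers that care about a specific language opt in explicitly; whether that is sufficient for all cases is exactly what the rest of the thread disputes.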
On 2008-10-24 02:46, Felipe Monteiro de Carvalho wrote:
I agree with Daniël on this one. Simplify. ë -- Ë always
If you need something which takes into consideration the language then
build another routine with more parameters.
It's not that simple.
How would you uppercase this piece of