Re: [fpc-devel] Forwarded message about FPC status

2013-01-07 Thread Michael Schnell

On 12/24/2012 05:19 PM, Martin Schreiber wrote:


> - Compile at least as fast as Delphi 7


IMHO hard to do for a portable system and not very important given
modern hardware. I only feel the linking stage is a viable goal here, as
in most cases by far most of the already compiled units need not be
recompiled when doing a make after editing some source code.


-Michael
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Forwarded message about FPC status

2013-01-07 Thread Michael Schnell

On 12/25/2012 02:22 PM, Florian Klaempfl wrote:


> What's the advantage in doing so? The code hangs around and does not
> hurt in any way.

+1

-Michael


Re: [fpc-devel] LLVM

2013-01-07 Thread Michael Schnell

On 12/26/2012 11:43 AM, Martin Schreiber wrote:


> Do you have experience with LLVM? Does it actually create great code?


Let's see what Embarcadero comes up with.

-Michael


Re: [fpc-devel] Feature announcement: Extension of TThread's interface

2013-01-07 Thread Michael Schnell

Great !

Thanks a lot.
-Michael


Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-07 Thread Michael Schnell

On 01/05/2013 12:28 PM, Jonas Maebe wrote:
> Using whatever #xx#xx or #xx#xx#xx sequence represents the UTF-8
> encoding of that character.

Sorry, I can't follow. Does #xx not just define a numerical
representation of an 8-bit entity?


The interpretation in any particular encoding might be done later by
whatever code digests the string.


Am I wrong?

-Michael


Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-07 Thread Michael Schnell

On 01/05/2013 01:35 PM, Jy V wrote:

> I do vote for UTF-8

-1

Given that conversions in the RTL (or LCL) are a rather rare runtime
task, GUI performance issues do not really need to be considered.


Viable issues seem to be Delphi compatibility, backward compatibility,
usability, and runtime performance with time-consuming complex string
tasks (these seem to argue against UTF-8, but for either static UTF-16 or
(quasi-)dynamic (CE-alike) encoding), as well as memory usage and
runtime performance with time-consuming simple string tasks (which argue
for locale-based ANSI or UTF-8).


-Michael


Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-07 Thread Ewald
Once upon a time, on 01/07/2013 12:39 PM to be precise, Michael Schnell
said:
> On 01/05/2013 12:28 PM, Jonas Maebe wrote:
>> Using whatever #xx#xx or #xx#xx#xx sequence represents the UTF-8
>> encoding of that character.
> Sorry, I can't follow. Does #xx not just define a numerical
> representation of an 8 bit entity ?
>
> The interpretation in any code might be done later by any code that
> digests the string.
>
> Am I wrong ?
I *think* Jonas is trying to say that if you want the character `Ǿ` in a
string you would either type
- 'Ǿ' or
- #$C7#$BE if you want to keep the source free of encoding specific
characters

You as a programmer decide what to do with it afterwards: if you
write it to a UTF-8 terminal, you get `Ǿ`, and if you write it to some
other terminal you might see the character that matches $C7, followed by
the character that matches $BE in the lookup table of that terminal's
encoding. Look at it this way: the byte sequence ($C7, $BE) has no
meaning to the compiler whatsoever; it is just a byte sequence. That is
all that matters to the compiler; what the sequence means is for you to
decide.

Correct me if I'm wrong.
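
A minimal sketch of the point above, written in Python rather than Pascal
and making no claim about how FPC stores the literal internally. It only
shows that the same two bytes ($C7, $BE) read as one character under UTF-8
and as two characters under a single-byte encoding (Latin-1 is picked here
as an assumed example of "some other terminal"):

```python
# Illustration only: the byte pair from the message, decoded two ways.
raw = bytes([0xC7, 0xBE])            # what #$C7#$BE puts into a byte string

as_utf8 = raw.decode("utf-8")        # a UTF-8 terminal sees one character
as_latin1 = raw.decode("latin-1")    # a Latin-1 terminal sees two characters

# Show the code points rather than glyphs, so the output does not depend
# on the encoding of the console running this script.
print([hex(ord(c)) for c in as_utf8])
print([hex(ord(c)) for c in as_latin1])
```

The bytes never change; only the reader's choice of encoding does.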

-- 
Ewald



Re: [fpc-devel] LLVM

2013-01-07 Thread Graeme Geldenhuys
On 01/07/13 11:02, Michael Schnell wrote:

> Let's see what Embarcadero comes up with


I wouldn't hold my breath. Based on recent Embarcadero history, the
first version would be absolute crap, second version might be beta
quality, 3rd version might not even exist (removed from product).


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/



Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-07 Thread Tomas Hajny
On Mon, January 7, 2013 13:28, Ewald wrote:
> Once upon a time, on 01/07/2013 12:39 PM to be precise, Michael Schnell
> said:
>> On 01/05/2013 12:28 PM, Jonas Maebe wrote:
>>> Using whatever #xx#xx or #xx#xx#xx sequence represents the UTF-8
>>> encoding of that character.
>> Sorry, I can't follow. Does #xx not just define a numerical
>> representation of an 8 bit entity ?
>>
>> The interpretation in any code might be done later by any code that
>> digests the string.
>>
>> Am I wrong ?
> I *think* Jonas is trying to say that if you want the character `Ǿ` in a
> string you would either type
> - 'Ǿ' or
> - #$C7#$BE if you want to keep the source free of encoding specific
> characters
> .
> .

...or
- #$01FE and then the whole string becomes a Unicode string which is
either kept that way (if it is assigned to a UnicodeString constant), or
it is converted to some 8-bit encoding at compile time (if it is assigned
to an 8-bit constant/variable like ansistring)

(also just my understanding of what Jonas wrote)
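
To make the third spelling concrete, here is a small Python sketch (an
illustration of the encodings involved, not of the compiler's actual code
path). It takes the code point written as #$01FE, keeps it as one 16-bit
unit, and then converts it to two 8-bit encodings; cp1252 is an assumed
example of a legacy codepage that cannot represent the character at all:

```python
# Illustration only: one code point, several target representations.
ch = chr(0x01FE)                     # the character written as #$01FE

utf16 = ch.encode("utf-16-le")       # "kept that way": one 16-bit code unit
utf8 = ch.encode("utf-8")            # converted to an 8-bit encoding

# Not every 8-bit encoding has a slot for this character.
try:
    ch.encode("cp1252")
    fits_cp1252 = True
except UnicodeEncodeError:
    fits_cp1252 = False

print(utf16.hex(), utf8.hex(), fits_cp1252)
```

So "converted to some 8-bit encoding" is only well defined once the target
encoding is known, and it can fail for codepages that lack the character.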

Tomas




Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-07 Thread Michael Schnell
So the ambiguity with _filling_ a string with data in fact arises when
_not_ using the #nn notation :-). With #nn the effect (i.e. the
resulting binary) is obvious.


-Michael


Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-07 Thread Michael Schnell

On 01/07/2013 02:01 PM, Tomas Hajny wrote:

> (also just my understanding of what Jonas wrote)


I feel you are wrong. The string does not know about the encoding its
content is to be interpreted in (unlike with Delphi XE).


-Michael


Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-07 Thread Ewald
Once upon a time, on 01/07/2013 02:17 PM to be precise, Michael Schnell
said:
> So the ambiguity  with _filling_ a string with data in fact arises
> when _not_ using the #nn notation :-) . With #nn the effect (i.e. the
> resulting binary) is obvious.
Well, if there is literally the sequence $C7, $BE in your source code
(that is, open up a hex editor and actually see the values there, as one
byte each) that would do the same, as the compiler will default to
one-byte strings, I think. The only issue with this is that you also need
to set your code editor to the encoding you want, because otherwise it
will screw up the display and possibly the binary value of the character.

So, yes, I would say the #nn notation is probably the safest to use. It is
also handy if your character contains (or is) something that `cannot be
there`, like a newline: #10 (or #13#10 under Windows).

Also, if you use a literal UTF-16 char in the code (so no #, but the
actual character) I think the {$codepage utf16} directive might come in
handy, as otherwise the compiler will interpret this series of bytes as
separate single-byte characters. This is however not an issue with the
# notation, as there is no ambiguity with this interpretation.

-- 
Ewald



Re: [fpc-devel] utf8 in 2.6.0

2013-01-07 Thread Frank Church
On 5 January 2013 13:39, Mattias Gaertner nc-gaert...@netcologne.de wrote:

> On Sat, 5 Jan 2013 13:06:42 +
> Frank Church vfcli...@gmail.com wrote:
>
> [...]
>> It is obvious that Unicode is not a simple topic and among FPC/Lazarus
>> developers/contributors, I suspect that few if any at all, have a detailed
>> grasp of how it all hangs together in the current state of
>> implementation.
>> It brings to mind the parable of the 12 blind men and the elephant.
>
> The FPC and Lazarus UTF details are not that difficult. The
> complexity comes from adding Delphi *, third party libraries and
> old FPC, Lazarus versions.
>
>> I think a diagram or graph of Unicode rules and their current state of
>> implementation in FPC/Lazarus would go a long way to helping both
>> developers and end users in this area. It is a topic which comes up
>> regularly and it doesn't show signs of ever going to be properly
>> resolved.
>
> For Lazarus:
> - works with fpc 2.6.x and 2.7.1
> - LCL and most code expect ansistrings to hold UTF-8.
> - pascal sources, lfm, po files are stored in UTF-8 without BOM.
>   Special care has to be taken, when using widestrings/unicodestring.
> - there are UTF-8 functions and classes (most in package lazutils).
> - the IDE supports many encodings
> - all this is documented via wiki and fpdoc
> - no support for UTF-16 has been started
>
> [...]
>
> Mattias



Glad to hear this.

-- 
Frank Church

===
http://devblog.brahmancreations.com


Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-07 Thread Tomas Hajny
On Mon, January 7, 2013 14:19, Michael Schnell wrote:
> On 01/07/2013 02:01 PM, Tomas Hajny wrote:
>> (also just my understanding of what Jonas wrote)
>
> I feel you are wrong. The string does not know about the code it's
> content is to be interpreted in (other than with Delphi XE).

Sorry, your way of quoting makes it difficult for others to react.

I freely admit that I may be wrong, but I don't understand what you meant
with your comment and thus I don't understand in what way I am wrong
in your view. The compiler obviously knows how the constant is used within
the source code and thus it may proceed accordingly (i.e. either convert
it to some 8-bit encoding at compile time if a UTF-16 code constant appears
in the source, or keep it in UTF-16 if assigned to a UnicodeString
constant).

Tomas




Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-07 Thread Ewald
Once upon a time, on 01/07/2013 05:05 PM to be precise, Tomas Hajny said:
> On Mon, January 7, 2013 14:19, Michael Schnell wrote:
>> On 01/07/2013 02:01 PM, Tomas Hajny wrote:
>>> (also just my understanding of what Jonas wrote)
>> I feel you are wrong. The string does not know about the code it's
>> content is to be interpreted in (other than with Delphi XE).
> Sorry, your way of quoting makes it difficult for others to react.
>
> I freely admit that I may be wrong, but I don't understand what you meant
> with your comment and thus I don't understand in what way you I am wrong
> in your view. The compiler obviously knows how the constant is used within
> the source code and thus it may proceed accordingly (i.e. either convert
> it to some 8-bit encoding at compile time if UTF-16 code constant appears
> in the source, or keep it in UTF-16 if assigned to a UnicodeString
> constant).
Yep, the compiler does know how the constant is used and how it is
defined (how else could it generate working code?), but I don't see how
it could do something with it if it is assigned to another type of
string (by type I mean `one-byte versus two-byte`). The compiler can't
know for sure what you mean; it can do at least these things:
  - Copy data without translating, so a one-char two-byte string becomes
a two-char one-byte string; a three-char one-byte string would become a
three-char two-byte string; and then there is a paradox: should a
three-char two-byte string become a six-char one-byte string? ==> this
is probably not how it is done
  - Translate the meanings of the characters of the string, but here the
compiler needs to know in what encoding they are and in what encoding
the string is wanted (which it doesn't, I believe; the $codepage
directive is only used for the encoding of the characters in the unit
itself) ==> I think this also isn't a possibility
  - Copy the data byte by byte, but then a one-byte string containing
an odd number of chars needs padding, and there are issues with
endianness here ==> not really an option, no?
  - Truncate every value of a two-byte string to convert it to a
one-byte string; the other way around would put each character of the
one-byte string as one in the two-byte string ==> solves the first
paradox, but introduces loss of data

==> All the above options (except the translation, that is) ignore the
escape character(s) of the string, so you won't get the data you want.

IMO it (typecasting a one-byte string to a two-byte string) cannot be
done without human intervention. Look at it this way:
typecasting a thread handle to an integer makes no sense either:
  - They are both related (a thread handle is definitely a number, even
if it is a pointer)
  - But putting one in the other makes no sense at all: what does
`comparing whether a thread id is less than zero` mean? On the other
hand, `comparing whether an integer is less than zero` has a distinct
meaning.
  - The sizes may be different (say an integer 16 bits long and a
thread handle 64 bits long), so how do you put one in the other? Sum the
bytes together? Multiply them? Take the 16-bit CRC of the handle?

This is IMO the same with a one-byte char and a two-byte char:
 - They both represent letters/words/...
 - But they are not the same and cannot be typecast without extra
knowledge.

This last point is also valid for my example above: you could put all
thread ids you know of in a lookup table and put the index into that
lookup table in the 16-bit integer. Fixed. The same goes for our strings: if
you know one is UTF-8 and you want to convert it to UTF-16, it can be
done without error, but without this extra knowledge it can't give you
decisive results.
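
The options listed above can be compared side by side in a few lines of
Python. This is a sketch under stated assumptions: the helper names
`widen`, `truncate` and `translate` are made up for illustration, 16-bit
units are modelled as plain integers, and nothing here claims to be what
FPC does.

```python
# Illustration only: three ways to move data between one-byte and
# two-byte strings. All function names are hypothetical.

def widen(data):
    """Put each byte into its own 16-bit unit, without translation."""
    return list(data)

def truncate(units):
    """Keep only the low byte of each 16-bit unit (lossy)."""
    return bytes(u & 0xFF for u in units)

def translate(data, source_encoding):
    """Decode with a known source encoding, then re-encode as UTF-16."""
    raw = data.decode(source_encoding).encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little")
            for i in range(0, len(raw), 2)]

src = bytes([0xC7, 0xBE])                       # UTF-8 bytes of U+01FE

widened = widen(src)                            # two units, meaning lost
translated = translate(src, "utf-8")            # one unit, meaning kept
round_trip = truncate(translated)               # high byte is thrown away

print([hex(u) for u in widened])
print([hex(u) for u in translated])
print(round_trip.hex())
```

Only `translate` preserves the character, and it is the only one that needs
the extra knowledge (the source encoding) passed in as an argument.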

Just a few points I think bear some potential to contemplate over a cup
of $c0ffee ;-)

-- 
Ewald



Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-07 Thread Mark Morgan Lloyd

Tomas Hajny wrote:
> On Mon, January 7, 2013 13:28, Ewald wrote:
>> Once upon a time, on 01/07/2013 12:39 PM to be precise, Michael Schnell
>> said:
>>> On 01/05/2013 12:28 PM, Jonas Maebe wrote:
>>>> Using whatever #xx#xx or #xx#xx#xx sequence represents the UTF-8
>>>> encoding of that character.
>>> Sorry, I can't follow. Does #xx not just define a numerical
>>> representation of an 8 bit entity ?
>>>
>>> The interpretation in any code might be done later by any code that
>>> digests the string.
>>>
>>> Am I wrong ?
>> I *think* Jonas is trying to say that if you want the character `Ǿ` in a
>> string you would either type
>> - 'Ǿ' or
>> - #$C7#$BE if you want to keep the source free of encoding specific
>> characters
>> .
>> .
>
> ...or
> - #$01FE and then the whole string becomes a Unicode string which is
> either kept that way (if it is assigned to a UnicodeString constant), or
> it is converted to some 8-bit encoding at compile time (if it is assigned
> to an 8-bit constant/variable like ansistring)
>
> (also just my understanding of what Jonas wrote)


That's how I read it as well. In which case, is #A3 16-bit Unicode 
(representing the UK £ Sterling) or malformed UTF-8 (should be #c2#a3)?
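
The two readings in that question can be checked at the byte level. The
following Python sketch (an illustration only, not a statement about what
the compiler decides) confirms that U+00A3 is the pound sign, that its
UTF-8 form is the two bytes C2 A3, and that a lone A3 byte is rejected by a
strict UTF-8 decoder:

```python
# Illustration only: the pound sign as a code point and as bytes.
pound = "\u00a3"                          # U+00A3 POUND SIGN

utf8_bytes = pound.encode("utf-8")        # two bytes in UTF-8
utf16_bytes = pound.encode("utf-16-le")   # one 16-bit unit in UTF-16

# A single A3 byte is a UTF-8 continuation byte with nothing to continue.
try:
    b"\xa3".decode("utf-8")
    lone_a3_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_a3_is_valid_utf8 = False

print(utf8_bytes.hex(), utf16_bytes.hex(), lone_a3_is_valid_utf8)
```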


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


[fpc-devel] Re: Strings - the fun part [very off-topic]

2013-01-07 Thread dev . dliw
Hi,
this is very off-topic, no serious responses expected...
Every time I read anything about strings here, this comes to my mind:

http://www.rigsofrods.com/entries/155-the-chaos-of-character-encodings

Strings seem to be a general problem for numeric machines :D

d.l.i.w


Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-07 Thread Aleksa Todorovic
On Mon, Jan 7, 2013 at 6:05 PM, Mark Morgan Lloyd 
markmll.fpc-de...@telemetry.co.uk wrote:

> Tomas Hajny wrote:
>> On Mon, January 7, 2013 13:28, Ewald wrote:
>>> Once upon a time, on 01/07/2013 12:39 PM to be precise, Michael Schnell
>>> said:
>>>> On 01/05/2013 12:28 PM, Jonas Maebe wrote:
>>>>> Using whatever #xx#xx or #xx#xx#xx sequence represents the UTF-8
>>>>> encoding of that character.
>>>> Sorry, I can't follow. Does #xx not just define a numerical
>>>> representation of an 8 bit entity ?
>>>>
>>>> The interpretation in any code might be done later by any code that
>>>> digests the string.
>>>>
>>>> Am I wrong ?
>>> I *think* Jonas is trying to say that if you want the character `Ǿ` in a
>>> string you would either type
>>> - 'Ǿ' or
>>> - #$C7#$BE if you want to keep the source free of encoding specific
>>> characters
>>> .
>>> .
>>
>> ...or
>> - #$01FE and then the whole string becomes a Unicode string which is
>> either kept that way (if it is assigned to a UnicodeString constant), or
>> it is converted to some 8-bit encoding at compile time (if it is assigned
>> to an 8-bit constant/variable like ansistring)
>>
>> (also just my understanding of what Jonas wrote)
>
> That's how I read it as well. In which case, is #A3 16-bit Unicode
> (representing the UK £ Sterling) or malformed UTF-8 (should be #c2#a3)?


The way I understand it, #A3 will be affected by the $codepage directive
of the source file. So, if the programmer correctly sets $codepage to match
the encoding used in the editor (be it utf8 or some other encoding), the
compiler will also 'understand' that string correctly.

If the programmer never uses UnicodeString, and always uses the codepage
which was used to write the source code, everything will work fine: #A3
will stay whatever it is in that specific encoding.

On the other hand, if a situation arises in which a string containing #A3
needs to be converted to UnicodeString, the compiler will either: a) convert
it correctly to UnicodeString if the encoding used is utf8, or b) call a
system-specific function to convert the string to an array of WideChars (in
which case the correctness of the program depends on support for that
specific encoding on the target system).
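
Why the declared codepage matters for that conversion can be seen with a
short Python sketch. It is an illustration only: the three codepages are
arbitrary examples picked here, and Python's codecs stand in for whatever
system-specific conversion routine the target platform provides.

```python
# Illustration only: one byte, three assumed source codepages.
raw = b"\xa3"

converted = {}
for codepage in ("cp1252", "cp1251", "cp437"):
    ch = raw.decode(codepage)                  # interpret under that codepage
    converted[codepage] = ch.encode("utf-16-le")
    print(codepage, "U+%04X" % ord(ch), converted[codepage].hex())
```

The same source byte ends up as three different UTF-16 code units, so the
result is only as correct as the codepage declaration it was based on.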


Re: [fpc-devel] utf8 in 2.6.0

2013-01-07 Thread Hans-Peter Diettrich

Martin Schreiber wrote:

> but I fear we can not use that information for development with Free Pascal
> because:
>
> The string is represented internally as a Unicode string encoded as UTF-16.
> Characters in the Basic Multilingual Plane (BMP) take 2 bytes, and characters
> not in the BMP require 4 bytes.
>
> and
>
> A control string is a sequence of one or more control characters, each of
> which consists of the # symbol followed by an unsigned integer constant from
> 0 to 65,535 (decimal) or from $0 to $FFFF (hexadecimal) in UTF-16 encoding,
> and denotes the character corresponding to a specified code value. Each
> integer is represented internally by 2 bytes in the string. This is useful
> for representing control characters and multibyte characters.
>
> which seems to be different from Free Pascal.


Correction:

You're right, Delphi treats control characters as UTF-16 codes, whereas
FPC treats them as byte values (if less than 256).


I already noticed the possible problem that the FPC interpretation of
control characters is context sensitive. This leads to write-only code,
because a change of the $codepage would require changing all control
characters in that unit accordingly. This comes in addition to the removal
or addition of control characters > 255, which also leads to a different
interpretation of the remaining control characters *and* to a different
internal representation.
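
The UTF-16 sizes quoted from the Delphi documentation are easy to verify
independently. The Python sketch below is an illustration only; the two
sample characters are chosen here and do not come from the thread's
sources. It encodes one BMP character and one non-BMP character and splits
the latter into its surrogate pair:

```python
# Illustration only: UTF-16 storage size inside and outside the BMP.
import struct

bmp_char = "\u01fe"          # inside the Basic Multilingual Plane
astral_char = "\U0001F600"   # outside the BMP

bmp_bytes = bmp_char.encode("utf-16-le")
astral_bytes = astral_char.encode("utf-16-le")

# A non-BMP character is stored as two 16-bit units (a surrogate pair).
high, low = struct.unpack("<2H", astral_bytes)

print(len(bmp_bytes), len(astral_bytes), hex(high), hex(low))
```

A single #nnnn value in the range 0 to 65,535 can therefore only name one
16-bit unit; a character outside the BMP needs two of them.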


DoDi
