Non-ASCII identifiers (was: Re: [fpc-devel] Forwarded message about FPC status)

2012-12-24 Thread Mark Morgan Lloyd

Martin Schreiber wrote:
> On Monday 24 December 2012 10:23:00 Sven Barth wrote:
>> As I already wrote, there are currently no plans to change the fact that
>> FPC supports only ASCII identifiers.
>
> I don't think we can rely on that. I had hoped that FPC would not use
> cpstrnew either. So if somebody implements non-ASCII identifiers because he
> needs a second-source Delphi compiler, it will be merged because the addition
> does not break existing code. I assume UTF-8 identifiers would not be very
> difficult to do in the compiler.


But what will the rule be as to whether something's a valid identifier? 
Will it have to start with something known to be a letter, or something 
not known to be a digit or reserved character?


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: Non-ASCII identifiers (was: Re: [fpc-devel] Forwarded message about FPC status)

2012-12-24 Thread Marco van de Voort
In our previous episode, Mark Morgan Lloyd said:
>> So if somebody implements non-ASCII identifiers because he needs a
>> second-source Delphi compiler, it will be merged because the addition does
>> not break existing code. I assume UTF-8 identifiers would not be very
>> difficult to do in the compiler.
>
> But what will the rule be as to whether something's a valid identifier?
> Will it have to start with something known to be a letter, or something
> not known to be a digit or reserved character?

AFAIK the latter: you specify what is not allowed rather than what is
allowed.

But source code edited on multiple platforms might be a problem (e.g.
ligatures, denormalization and other forms of slightly different
characters). This could make comparing identifiers expensive, which is
not what you want in a compiler.

But I don't know how big that problem would be. Maybe it is negligible.
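
[Editor's illustration] The comparison problem described above can be sketched in a few lines. This is only a minimal sketch in Python, not anything FPC or Delphi does: the helper name identifier_key is hypothetical, and the choice of NFKC normalization plus case folding is an assumption made for the example (case folding because Pascal identifiers are case-insensitive).

```python
import unicodedata

def identifier_key(name: str) -> str:
    """Hypothetical canonical form used only when comparing identifiers."""
    # NFKC composes combining sequences and folds compatibility characters
    # such as ligatures; casefold() then removes case differences.
    return unicodedata.normalize("NFKC", name).casefold()

composed = "caf\u00e9"      # 'é' stored as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent
ligature = "\ufb01le"       # begins with the 'fi' ligature U+FB01

# The raw strings differ even though they look the same on screen.
print(composed == decomposed)
# After normalization the two spellings compare equal.
print(identifier_key(composed) == identifier_key(decomposed))
# The ligature form also maps to the plain ASCII spelling.
print(identifier_key(ligature) == identifier_key("file"))
```

The cost mentioned above comes from having to run such a normalization step on every identifier before it is hashed or compared. Whether that is measurable in a real compiler is not something this sketch can answer.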


Re: Non-ASCII identifiers (was: Re: [fpc-devel] Forwarded message about FPC status)

2012-12-24 Thread Martin

On 24/12/2012 12:17, Marco van de Voort wrote:
> In our previous episode, Mark Morgan Lloyd said:
>>> So if somebody implements non-ASCII identifiers because he needs a
>>> second-source Delphi compiler, it will be merged because the addition does
>>> not break existing code. I assume UTF-8 identifiers would not be very
>>> difficult to do in the compiler.
>>
>> But what will the rule be as to whether something's a valid identifier?
>> Will it have to start with something known to be a letter, or something
>> not known to be a digit or reserved character?
>
> AFAIK the latter: you specify what is not allowed rather than what is
> allowed.



Hm, that makes it easy to end up with an incomplete list, which could later
become a problem:

half-width spaces etc., control chars (RTL/LTR...), currently unused
code points (that could become anything in the future...)
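
[Editor's illustration] A small Python sketch of how an incomplete "not allowed" list lets such characters through. The denylist below is hypothetical and deliberately limited to ASCII whitespace and punctuation; it is not taken from any compiler. U+0378 is assumed to be unassigned in the Unicode version shipped with the Python in use.

```python
import unicodedata

# Hypothetical, deliberately small denylist: ASCII whitespace and punctuation.
DENIED = set(" \t\r\n+-*/=<>()[]{}.,;:'\"@#$^&")

def passes_denylist(ch: str) -> bool:
    """True if the character is NOT on the denylist, so it would be accepted."""
    return ch not in DENIED

suspects = {
    "thin space": "\u2009",
    "zero width space": "\u200b",
    "right-to-left mark": "\u200f",
    "currently unassigned (assumed)": "\u0378",
}

# Print the general category of each suspect and whether the denylist lets it in.
for label, ch in suspects.items():
    print(f"U+{ord(ch):04X} {label}: "
          f"category={unicodedata.category(ch)}, "
          f"accepted={passes_denylist(ch)}")
```

Every one of these characters is accepted, because nobody thought to list it. Closing the gap means either extending the list by hand or deciding by Unicode category.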




Re: Non-ASCII identifiers (was: Re: [fpc-devel] Forwarded message about FPC status)

2012-12-24 Thread Marco van de Voort
In our previous episode, Martin said:
> Hm, that makes it easy to end up with an incomplete list, which could later
> become a problem:
>
> half-width spaces etc., control chars (RTL/LTR...), currently unused
> code points (that could become anything in the future...)

That list is still shorter than the list of what is allowed. And I'm pretty
sure Delphi does it that way; IIRC it was said during the Delphi 2009 sales
presentation (when they demonstrated Unicode in the source and the Object
Inspector).



Re: Non-ASCII identifiers (was: Re: [fpc-devel] Forwarded message about FPC status)

2012-12-24 Thread Graeme Geldenhuys
On 24/12/12 11:22, Martin wrote:
> half-width spaces etc., control chars (RTL/LTR...), currently unused
> code points (that could become anything in the future...)

As Marco said, the list of disallowed characters will be smaller than the
list of allowed ones.

Also, the Unicode specification defines blocks and categories for code
points, so those could be used too. E.g. take a look at the
TCharacter.IsNumeric(..) implementation: it doesn't do actual code point
comparisons, it simply checks the Unicode category of the passed-in code
point.
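
[Editor's illustration] The category-based idea can be sketched as follows. This is a Python sketch, not the TCharacter implementation and not a rule used by FPC or Delphi. The two category sets are an assumption for the example, loosely modelled on the default identifier definition in Unicode Standard Annex #31, and the function name is hypothetical.

```python
import unicodedata

# Assumed rule: letters and letter-like numbers may start an identifier.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Later characters may also be combining marks, decimal digits, or
# connector punctuation (the underscore has category Pc).
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}

def is_valid_identifier(name: str) -> bool:
    """Check a candidate identifier by general category only."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    # A leading underscore is allowed explicitly, as in Pascal.
    if first != "_" and unicodedata.category(first) not in START_CATEGORIES:
        return False
    return all(unicodedata.category(ch) in CONTINUE_CATEGORIES for ch in rest)

for candidate in ["gr\u00f6\u00dfe", "_tmp1", "2fast", "a\u200bb", "x y"]:
    print(repr(candidate), is_valid_identifier(candidate))
```

No individual code point is listed anywhere. The zero width space in the fourth candidate is rejected simply because its category (Cf) is not in the accepted set, and unassigned code points (Cn) are rejected the same way.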


Regards,
  - Graeme -




Re: Non-ASCII identifiers (was: Re: [fpc-devel] Forwarded message about FPC status)

2012-12-24 Thread Martin

On 24/12/2012 13:05, Marco van de Voort wrote:
> In our previous episode, Martin said:
>> Hm, that makes it easy to end up with an incomplete list, which could later
>> become a problem:
>>
>> half-width spaces etc., control chars (RTL/LTR...), currently unused
>> code points (that could become anything in the future...)
>
> That list is still shorter than the list of what is allowed. And I'm pretty
> sure Delphi does it that way; IIRC it was said during the Delphi 2009 sales
> presentation (when they demonstrated Unicode in the source and the Object
> Inspector).



If you use a UTF-8 library, you can check the attributes of each code point,
though that may be slower.
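
[Editor's illustration] One common way to limit that cost is to keep a fast path for ASCII and consult the Unicode tables only for other characters. The Python sketch below assumes this design. The names, the accepted category set and the cache size are all hypothetical, and no claim is made about how much time it saves in practice.

```python
import string
import unicodedata
from functools import lru_cache

# Fast path: plain set membership for the ASCII identifier characters.
ASCII_IDENT_CHARS = frozenset(string.ascii_letters + string.digits + "_")
# Slow path: assumed set of accepted general categories for non-ASCII input.
NON_ASCII_CATEGORIES = frozenset(
    {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl", "Mn", "Mc", "Nd", "Pc"}
)

@lru_cache(maxsize=4096)
def _non_ascii_ident_char(ch: str) -> bool:
    # The table lookup happens once per distinct character; repeats hit the cache.
    return unicodedata.category(ch) in NON_ASCII_CATEGORIES

def is_ident_char(ch: str) -> bool:
    """True if ch may appear inside an identifier under the assumed rule."""
    if ch < "\x80":
        return ch in ASCII_IDENT_CHARS
    return _non_ascii_ident_char(ch)

for ch in ["a", "_", "+", "\u00e9", "\u200b"]:
    print(f"U+{ord(ch):04X}", is_ident_char(ch))
```

Source that stays within ASCII never touches the Unicode tables, so existing code would pay nothing extra for the attribute lookup.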
