On 17/09/17 11:53, Rowan Collins wrote:
> On 17 September 2017 09:54:54 BST, Lester Caine <les...@lsces.co.uk> wrote:
>> Just what character set is PHP7 designed to work with.
> 
> Focusing on the answerable part of this, PHP actually allows a very wide 
> variety of characters in identifiers (names of variables, classes, functions, 
> etc).
> 
> I checked the PHP lang-spec repo expecting to find a set of Unicode classes, 
> but it currently mentions "U+0080-U+00FF": 
> https://github.com/php/php-langspec/blob/master/spec/09-lexical-structure.md#names
>  That seems wrong to me, unless I'm looking at the wrong definition - the 
> first part of that range is control characters, and you can have variables 
> called things like $🐘 (with an emoji as the entire name).
> 
> That would definitely be the place to document the allowed characters, 
> though, and a rigorous definition of "case insensitive" could also be added. 
> I was wrong, by the way, to say that using "to case fold" rather than "to 
> lower case" would solve the Turkish I problem - the key for that is to define 
> a single locale whose case folding you are using, independent of runtime 
> locale settings.
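
One possible reading of that "U+0080-U+00FF" range is that it describes
byte values rather than code points. The sketch below assumes the
byte-oriented label pattern given in the PHP manual for variable names,
[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*, and that the source file is
UTF-8. It is written in Python purely as an illustration, and is_label
is a made-up helper name, not anything in PHP itself.

```python
import re

# Assumption: the label pattern as documented in the PHP manual,
# applied to the raw bytes of the source file rather than to code points.
LABEL = re.compile(rb"[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*")


def is_label(name: str, encoding: str = "utf-8") -> bool:
    """Return True if the encoded bytes of `name` match the label pattern."""
    return LABEL.fullmatch(name.encode(encoding)) is not None


# The elephant emoji is four UTF-8 bytes, all of them >= 0x80.
print("\U0001F418".encode("utf-8"))  # b'\xf0\x9f\x90\x98'
print(is_label("\U0001F418"))        # True
print(is_label("caf\u00e9"))         # True
print(is_label("9lives"))            # False: a digit cannot come first
```

Under that reading any multi-byte UTF-8 sequence is accepted, which
would explain the emoji name without any Unicode classes being involved.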

I think this is actually the problem. Unicode is simply NOT a general
solution! Normalizing is another aspect, and it can result in
differences between strings if one also 'case folds'. On top of that,
one has to add the collation being used to provide sort order, which is
another can of worms. Sorting array keys into order depends on the
character set used ... which is perhaps why there seems to be a drive to
replace associative arrays with simple numeric ones?
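
A minimal sketch of how normalization, case folding and sort order
interact, written in Python only because its standard library exposes
all three. The helper name caseless_key is hypothetical, and the
NFD / case fold / NFD sequence is one common recipe for a
locale-independent comparison key, not something PHP does today.

```python
import unicodedata


def caseless_key(text: str) -> str:
    """Locale-independent comparison key: NFD, full case fold, NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFD", decomposed.casefold())


precomposed = "\u00c9t\u00e9"   # "Été" using precomposed accents
combining = "E\u0301te\u0301"   # the same text using combining accents

# The raw strings differ even though they render identically.
print(precomposed == combining)                                # False
print(caseless_key(precomposed) == caseless_key(combining))    # True

# Lower-casing alone is not enough: sharp s only folds to "ss".
print("Stra\u00dfe".lower() == "STRASSE".lower())              # False
print(caseless_key("Stra\u00dfe") == caseless_key("STRASSE"))  # True

# Plain code point order is not a collation: "Z" sorts before "a",
# and the accented capital sorts after both.
print(sorted(["Zebra", "apple", "\u00c4pfel"]))
```

None of these steps consult the runtime locale, so the result is the
same whatever setlocale() says, but a real collation would still need a
separate, language-specific ordering on top.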

"U+0020-U+007E" gives the printable Basic Latin characters (ASCII)
"U+0080-U+00FF" adds the "Latin-1 Supplement"
The problem is that the first 32 positions of that second 128
("U+0080-U+009F") are reserved as control characters, mirroring the
"U+0000-U+001F" control character block, while single byte character
sets WOULD be more productive if they used those positions for extra
printable characters instead. One of the irritating compromises made by
Unicode?
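
To make the ranges concrete, here is a small Python sketch that asks the
Unicode character database which code points below U+0100 are control
characters, and compares how one byte decodes under ISO-8859-1 and
Windows-1252. The helper name control_points is hypothetical, and
Windows-1252 is only picked as an example of a single byte set that puts
printable characters in that area.

```python
import unicodedata


def control_points(start: int, end: int) -> list:
    """Code points in [start, end] whose general category is Cc (control)."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)) == "Cc"]


lower_half = control_points(0x00, 0x7F)   # U+0000-U+001F plus U+007F
upper_half = control_points(0x80, 0xFF)   # the second set of controls

print(len(lower_half), len(upper_half))                  # 33 32
print(f"U+{upper_half[0]:04X}-U+{upper_half[-1]:04X}")   # U+0080-U+009F

byte = bytes([0x80])
# ISO-8859-1 maps the byte straight onto U+0080, a control character.
print(unicodedata.category(byte.decode("latin-1")))      # Cc
# Windows-1252 uses the same byte for a printable character instead.
print(byte.decode("cp1252"))                             # €
```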

It would perhaps also be nice if the file naming convention used 'nbsp'
for spaces rather than 'sp', eliminating the need for quotes around
file and directory names. But quotes are also used by SQL to indicate
'case-sensitive' names, yet another convention to be given a nod to?
If you get an associative key from a quoted field name it is NOT
case-insensitive, and while a second field with the same combination of
characters would be 'silly', it is something that can happen for many
reasons ... and explode() falls over in some instances as a result.

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php