On 12 Mar 2010, at 20:15, Philip Olson wrote:


On Mar 12, 2010, at 10:46 AM, Stanislav Malyshev wrote:

Hi!

Yeah.
We tried it, and it simply didn't pan out (performance, bc, lost interest, ..).

I think it is a bit premature to declare the death of Unicode in PHP. Yes, we know there are problems, and yes, it was harder than initially thought, so we may want to take a step back and rethink it. Also we may want to get Unicode out of the way of other PHP development, since it's taking longer than planned. But that doesn't mean we should bury it.

How have other languages progressed down the unicode road? Is there anything we can learn from their progress over these past few years?

Of all the languages that I've had dealings with, only Python has attempted anything like the previous PHP 6 attempt. Ruby's move to a certain level of Unicode support in 1.9 is interesting, though I'm not sure it has been out for long enough to draw any real conclusions about its uptake.

I think the most important thing learnt from the Python case is that backwards compatibility is paramount. Even when code can be programmatically converted to the new language version, breaking backwards compatibility makes uptake hard to gain. The old PHP 6 branch was worse still: it would have broken a large number of applications with no way to programmatically convert code to it.

Python 2 had no problem getting uptake where Unicode strings need to be specifically marked (e.g., u"foo" as opposed to "foo"), yet Python 3 (which can mostly be programmatically converted from Python 2) has had comparatively little uptake due to its incompatibility.

So, let me start with what I want to be true of PHP 6: anything that runs under PHP 5.3 and does not throw any errors (with E_ALL | E_DEPRECATED) must behave identically under PHP 6.

That single statement has quite a lot of consequences, but, with regard to Unicode, one matters more than anything else: Unicode strings cannot be the default. I have plenty of code that uses UTF-8 in some strings and arbitrary binary data in others. I want to be able to move to PHP 6 gradually: I shouldn't have to wait for every library I rely upon to be modified for PHP 6 compatibility. I should just be able to move to PHP 6, look over my own code, and change whichever strings I want into Unicode strings.
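The distinction between byte strings and Unicode strings that this paragraph relies on can be sketched briefly. The example below uses Python 3 only because Python is the comparison drawn earlier in this message; it is not PHP code and not a proposed PHP API, and all variable names are hypothetical. It shows a string holding UTF-8 data, a string holding arbitrary binary data, and what happens when each is treated as Unicode text.

```python
# Illustrative sketch only (Python 3): byte strings vs. Unicode text.

# A byte string that happens to hold UTF-8 encoded text.
utf8_bytes = "na\u00efve".encode("utf-8")

# A byte string holding arbitrary binary data; 0xFF is never valid in UTF-8.
binary_blob = bytes([0xFF, 0xFE, 0x00, 0x80])

# Byte-oriented length counts bytes: U+00EF takes two bytes in UTF-8.
byte_length = len(utf8_bytes)

# Opting in to Unicode for one chosen string: decode it explicitly.
text = utf8_bytes.decode("utf-8")
char_length = len(text)  # counts code points, not bytes

# Treating every string as Unicode text fails on the binary one.
try:
    binary_blob.decode("utf-8")
    blob_is_text = True
except UnicodeDecodeError:
    blob_is_text = False

print(byte_length, char_length, blob_is_text)
```

The same data gives different lengths depending on whether it is viewed as bytes or as text, and the binary string cannot be viewed as text at all. That is why code mixing both kinds of string would change behaviour if Unicode became the default rather than something opted into per string.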

To point out what should be obvious to everyone here: one of the biggest strengths of PHP is the large number of libraries and applications already written for it. Making a large, backwards-incompatible change such as making Unicode strings the default would not only limit adoption to those who have entirely new code, but also alienate most shared-hosting providers, who cannot afford to break their clients' applications.

If there's one thing I've learnt from working on browsers for the past few years, it's that backwards compatibility is more valuable than something new and shiny. I have no doubt PHP needs Unicode support, but I don't think that breaking backwards compatibility for it is the right solution. The way PHP is deployed, often in shared hosting setups, should very much be a reason to be concerned about backwards compatibility. A browser would get almost no market share if it broke a large percentage of existing websites; I believe the same to be true of PHP and the websites it powers.


--
Geoffrey Sneddon
<http://gsnedders.com/>


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
