> > What I am thinking is, if unicode_semantics=on, every single time I need
> > to call urlencode (or other binary-only functions) with a variable, I
> > need to typecast it. Well, if this is necessary 100% of the time, why
> > not do this already inside urlencode, and if the string contains bad
> > characters, give the same warning I get on the (binary) typecast on an
> > incompatible string? I am just trying to think logically, I don't know
> > the amount of work something like this would generate.
> >
> > If this is done, at least one 25000-line PHP application will work on
> > PHP6 with 5 line changes. I think this is great marketing for PHP6
> > migration.
>
> Yes, I am sure we can do something intelligent with some of these cases.
> It's still quite early in the migration and we are likely to revisit
> many of these functions to make them more flexible. I don't really see
> why urlencode() shouldn't be smarter about handling Unicode strings.
I just compared urlencode/urldecode with Java, which uses Unicode by nature:
http://java.sun.com/javase/6/docs/api/java/net/URLEncoder.html

The input parameter is a *String* (which in Java is Unicode by definition). The output is an ASCII-only string containing the URL-encoded values. The character set used for the encoding is chosen by an additional parameter (that could be done in PHP, too); otherwise the platform default is used (in PHP, that would be the encoding used to write strings to files or to the web output).

This default would fully conform to what is currently done when creating web pages: web browsers encode the entered values in the encoding of the web page that contains the form. A PHP script that generates URLs should act the same way, so the URL-encoded values in a URL should be encoded using the charset that is used for text output.

A special case is the HTTP/URI standard, which states that URLs should always be encoded in UTF-8 (but browsers do NOT do this for form values). They do, however, do it when encoding path components that contain special characters, and web servers expect it that way when mapping the path component to the local filesystem. So the default encoding for rawurlencode (which is normally used NOT for forms but rather for paths, DOIs, URNs, ...) should be UTF-8. However, I think it would be counterproductive to treat rawurlencode and urlencode differently in this respect.

For the case where a user wants to encode a string in a different encoding, he could pass an optional second parameter to (raw)urlencode (as in Java). If (raw)urlencode is given a *binary* string, the second parameter should be disallowed and a warning (or whatever is appropriate) should be raised. A binary string should be encoded byte by byte, as before!

I think this would make a lot of applications more backwards compatible and their code simpler.
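To make the proposed semantics concrete, here is a minimal sketch in Java, since java.net.URLEncoder is the model being compared against. It is not PHP's API: the class name `UrlEncodeSketch` and its `urlencode` overloads are hypothetical. A `byte[]` stands in for a PHP binary string and is encoded byte by byte with no charset; a `String` stands in for a Unicode string and is first converted to bytes in a charset (assumed to default to UTF-8 here), then encoded byte-wise. Instead of silently replacing unmappable characters the way `String.getBytes` would, the sketch throws an `IllegalArgumentException`, as an assumed stand-in for the warning a (binary) cast emits. The set of unreserved characters mirrors URLEncoder (letters, digits, `-_.*`, space becomes `+`), which differs slightly from PHP's urlencode (PHP also encodes `*`).

```java
import java.net.URLEncoder;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the proposed (raw)urlencode semantics; not a real PHP API.
public class UrlEncodeSketch {

    // "Binary string" case: encode byte by byte, as before. No charset parameter exists.
    static String urlencode(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean unreserved = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.' || c == '*';
            if (unreserved) {
                sb.append((char) c);
            } else if (c == ' ') {
                sb.append('+');
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    // "Unicode string" case: convert to bytes in the given charset, then encode byte-wise.
    // A fresh CharsetEncoder reports unmappable characters, so nothing is silently replaced.
    static String urlencode(String s, Charset charset) {
        try {
            ByteBuffer buf = charset.newEncoder().encode(CharBuffer.wrap(s));
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            return urlencode(bytes);
        } catch (CharacterCodingException e) {
            // Assumed stand-in for the warning on an incompatible string.
            throw new IllegalArgumentException("string not representable in " + charset.name(), e);
        }
    }

    // Default charset: assumed to be the output encoding, UTF-8 in this sketch.
    static String urlencode(String s) {
        return urlencode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String s = "Gr\u00fc\u00dfe aus Bremen"; // "Grüße aus Bremen"
        System.out.println(urlencode(s));                              // Gr%C3%BC%C3%9Fe+aus+Bremen
        System.out.println(urlencode(s, StandardCharsets.ISO_8859_1)); // Gr%FC%DFe+aus+Bremen
        // java.net.URLEncoder agrees for this input:
        System.out.println(URLEncoder.encode(s, StandardCharsets.UTF_8.name()));
        try {
            urlencode("5 \u20ac", StandardCharsets.ISO_8859_1); // the euro sign is not in ISO-8859-1
        } catch (IllegalArgumentException e) {
            System.out.println("warning: " + e.getMessage());
        }
    }
}
```

Because there is no `urlencode(byte[], Charset)` overload, passing a charset together with binary data is rejected at compile time here; in PHP the equivalent check would have to happen at run time, as the warning suggested above.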
-----
Uwe Schindler
[EMAIL PROTECTED] - http://www.php.net
NSAPI SAPI developer
Bremen, Germany

--
PHP Internals - PHP Runtime Development Mailing List