[PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
I caused this situation myself by not explicitly differentiating between
the default charset for the internal htmlspecialchars() and
htmlentities() functions and the output charset ini directive
default_charset.

The idea behind the default_charset ini directive was to act as the
charset that gets specified in the HTTP Content-type header if you do
not explicitly send your own Content-type header with the header()
function. This has been muddied a bit by the fact that
htmlspecialchars/htmlentities can take it into account when choosing
which encoding to use for the data passed to them. This
isn't done by default since it actually makes little sense. It is only
done if you pass an empty string as the encoding argument. If you don't
pass anything at all the default is UTF-8 in 5.4. In 5.3 this was
ISO-8859-1.

And here is where the confusion comes in. We, myself included, have told
people that they can get the 5.3 behaviour back by setting the
default_charset ini directive to iso-8859-1. But, this is only true if
they are forcing htmlspecialchars/htmlentities to check that setting
with an empty string as the encoding arg. Most apps just do
htmlspecialchars($str) and nothing else. Plus, it is really not a good
idea to tie the internal encoding of data being passed to these
functions to the output charset. You should be able to change the output
charset without worrying about your runtime encoding at that level.

What this effectively means is that we are asking people to go through
their code and add an explicit charset to all htmlspecialchars() and
htmlentities() calls. I think this will be a hurdle for 5.4 adoption.
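
One way to limit that churn can be sketched as follows. This is only an illustration, not something proposed in this message: the helper name h() and the constant APP_ENCODING are hypothetical, and ISO-8859-1 is assumed to be the legacy application's data encoding.

```php
<?php
// Assumption: the legacy application stores its data as ISO-8859-1.
const APP_ENCODING = 'ISO-8859-1';

// Hypothetical helper (not part of PHP): every escape call goes through
// here, so the encoding is named once instead of at each call site.
function h($str)
{
    return htmlspecialchars($str, ENT_QUOTES, APP_ENCODING);
}

// "caf\xE9" is "café" in ISO-8859-1; the non-ASCII byte is left untouched.
echo h("<b>caf\xE9</b>"), "\n";
```

Call sites still have to be changed from htmlspecialchars() to the helper, so this does not remove the migration work; it only keeps the encoding decision in a single place.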

What we really need is what we added in PHP 6. A runtime encoding ini
setting that is distinct from the output charset which we can use here.
That would allow people to fix all their legacy code to a specific
runtime encoding with a single ini setting instead of changing thousands
of lines of code. I propose that we add such a directive to 5.4.1 to
ease migration.

See https://bugs.php.net/61354 for the first signs of grumbling about
this one. As more people migrate I have a feeling this will end up being
the most difficult part of the migration.

-Rasmus

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Laruence
On Mon, Mar 12, 2012 at 2:49 PM, Rasmus Lerdorf ras...@lerdorf.com wrote:
 I caused this situation myself by not explicitly differentiating between
 the default charset for the internal htmlspecialchars() and
 htmlentities() functions and the output charset directive ini directive
 default_charset.

 The idea behind the default_charset ini directive was to act as the
 charset that gets specified in the HTTP Content-type header if you do
 not explicitly send your own Content-type header with the header()
 function. This has been muddied a bit by the fact that
 htmlspecialchars/htmlentities can take it into account when it is trying
 to choose which encoding to use when handling data passed to it. This
 isn't done by default since it actually makes little sense. It is only
 done if you pass an empty string as the encoding argument. If you don't
 pass anything at all the default is UTF-8 in 5.4. In 5.3 this was
 ISO-8859-1.

 And here is where the confusion comes in. We, myself included, have told
 people that they can get the 5.3 behaviour back by setting the
 default_charset ini directive to iso-8859-1. But, this is only true if
 they are forcing htmlspecialchars/htmlentities to check that setting
 with an empty string as the encoding arg. Most apps just do
 htmlspecialchars($str) and nothing else. Plus, it is really not a good
 idea to tie the internal encoding of data being passed to these
 functions to the output charset. You should be able to change the output
 charset without worrying about your runtime encoding at that level.

 What this effectively means is that we are asking people to go through
 their code and add an explicit charset to all htmlspecialchars() and
 htmlentities() calls. I think this will be a hurdle for 5.4 adoption.

 What we really need is what we added in PHP 6. A runtime encoding ini
 setting that is distinct from the output charset which we can use here.
 That would allow people to fix all their legacy code to a specific
 runtime encoding with a single ini setting instead of changing thousands
 of lines of code. I propose that we add such a directive to 5.4.1 to
 ease migration.
+1, especially for non-utf8 applications.

thanks

 See https://bugs.php.net/61354 for the first signs of grumbling about
 this one. As more people migrate I have a feeling this will end up being
 the most difficult part of the migration.

 -Rasmus





-- 
Laruence  Xinchen Hui
http://www.laruence.com/




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Adam Jon Richardson
On Mon, Mar 12, 2012 at 2:49 AM, Rasmus Lerdorf ras...@lerdorf.com wrote:

 What we really need is what we added in PHP 6. A runtime encoding ini
 setting that is distinct from the output charset which we can use here.
 That would allow people to fix all their legacy code to a specific
 runtime encoding with a single ini setting instead of changing thousands
 of lines of code. I propose that we add such a directive to 5.4.1 to
 ease migration.


This seems like a very reasonable way of dealing with this issue.

Adam


Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev

Hi!


What we really need is what we added in PHP 6. A runtime encoding ini
setting that is distinct from the output charset which we can use here.
That would allow people to fix all their legacy code to a specific
runtime encoding with a single ini setting instead of changing thousands
of lines of code. I propose that we add such a directive to 5.4.1 to
ease migration.


One more charset INI setting? I'm not sure I like this. We have tons of 
INIs already, and adding a new one each time we change something makes 
both writing applications and configuring servers harder.
But as the manual says, ISO-8859-1 and UTF-8 are the same for
htmlspecialchars() - is it wrong? If yes, what exactly is the difference
between old and new behavior? I tried to read #61354 but could make
little sense out of it; it lacks an expected result and I have a hard time
understanding what the problem is there. Could you explain?


--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Laruence
On Mon, Mar 12, 2012 at 3:10 PM, Stas Malyshev smalys...@sugarcrm.com wrote:
 Hi!


 What we really need is what we added in PHP 6. A runtime encoding ini
 setting that is distinct from the output charset which we can use here.
 That would allow people to fix all their legacy code to a specific
 runtime encoding with a single ini setting instead of changing thousands
 of lines of code. I propose that we add such a directive to 5.4.1 to
 ease migration.


 One more charset INI setting? I'm not sure I like this. We have tons of INIs
 already, and adding a new one each time we change something makes both
 writing applications and configuring servers harder.
 But as the manual says, ISO-8859-1 and  UTF-8  are the same for
 htmlspecialchars() - is it wrong? If yes, what exactly is the different
 between old and new behavior? I tried to read #61354 but could make little
 sense out of it, it lacks expected result and I have hard time understanding
 what is the problem there. Could you explain?
Hi:
   if the string passed to htmlspecialchars() is not in the charset that
htmlspecialchars() expects (the default is UTF-8, and the only way to
change it is to specify the third argument),

   an empty string will be returned without any notice or warning ;)
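
A minimal sketch of how calling code could surface that silent failure. The helper name escape_or_fail() is hypothetical and not part of PHP; it assumes a non-empty input that comes back empty signals an invalid byte sequence for the given encoding.

```php
<?php
// Hypothetical helper (not part of PHP): escape $str, but raise an
// exception instead of silently handing back an empty string.
function escape_or_fail($str, $encoding)
{
    $out = htmlspecialchars($str, ENT_QUOTES, $encoding);

    // Non-empty input with empty output means the input was rejected.
    if ($str !== '' && $out === '') {
        throw new InvalidArgumentException("Input is not valid $encoding");
    }

    return $out;
}

try {
    // "\xCE\xD2" is a GB2312 character, which is not a valid UTF-8 sequence.
    echo escape_or_fail("\xCE\xD2", 'UTF-8');
} catch (InvalidArgumentException $e) {
    echo 'Rejected: ', $e->getMessage(), "\n";
}
```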

thanks

 --
 Stanislav Malyshev, Software Architect
 SugarCRM: http://www.sugarcrm.com/
 (408)454-6900 ext. 227






-- 
Laruence  Xinchen Hui
http://www.laruence.com/




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Laruence
On Mon, Mar 12, 2012 at 3:10 PM, Stas Malyshev smalys...@sugarcrm.com wrote:
 Hi!


 What we really need is what we added in PHP 6. A runtime encoding ini
 setting that is distinct from the output charset which we can use here.
 That would allow people to fix all their legacy code to a specific
 runtime encoding with a single ini setting instead of changing thousands
 of lines of code. I propose that we add such a directive to 5.4.1 to
 ease migration.


 One more charset INI setting? I'm not sure I like this. We have tons of INIs
If we will definitely add a run_time_charset in the future, then I
think it's okay to add it now. :)

thanks
 already, and adding a new one each time we change something makes both
 writing applications and configuring servers harder.
 But as the manual says, ISO-8859-1 and  UTF-8  are the same for
 htmlspecialchars() - is it wrong? If yes, what exactly is the different
 between old and new behavior? I tried to read #61354 but could make little
 sense out of it, it lacks expected result and I have hard time understanding
 what is the problem there. Could you explain?

 --
 Stanislav Malyshev, Software Architect
 SugarCRM: http://www.sugarcrm.com/
 (408)454-6900 ext. 227






-- 
Laruence  Xinchen Hui
http://www.laruence.com/




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:10 AM, Stas Malyshev wrote:
 Hi!
 
 What we really need is what we added in PHP 6. A runtime encoding ini
 setting that is distinct from the output charset which we can use here.
 That would allow people to fix all their legacy code to a specific
 runtime encoding with a single ini setting instead of changing thousands
 of lines of code. I propose that we add such a directive to 5.4.1 to
 ease migration.
 
 One more charset INI setting? I'm not sure I like this. We have tons of
 INIs already, and adding a new one each time we change something makes
 both writing applications and configuring servers harder.
 But as the manual says, ISO-8859-1 and  UTF-8  are the same for
 htmlspecialchars() - is it wrong? If yes, what exactly is the different
 between old and new behavior? I tried to read #61354 but could make
 little sense out of it, it lacks expected result and I have hard time
 understanding what is the problem there. Could you explain?

Yes, it is a bit hard to understand from the bug report because
bugs.php.net is all utf-8, but we are talking about non utf-8 apps here.

This script should illustrate it: ( https://gist.github.com/2020502 )

$gb2312 = iconv('UTF-8','GB2312','我是测试');
$string = $string = "<pre><p>$gb2312</p></pre>";
echo htmlspecialchars($string);

If you run that in PHP 5.3 you get:

&lt;pre&gt;&lt;p&gt;���Dz���&lt;/p&gt;&lt;/pre&gt;

The garbage-like chars there - if you don't see them, see
https://gist.github.com/2020442 - is the expected output. In PHP 5.4 the
output is nothing. The function recognizes that this is not valid UTF-8
and dumps the entire string.

Ignoring 5.4 for a second, if you in 5.3 do this:

echo htmlspecialchars($string);
echo htmlspecialchars($string, NULL, 'ISO-8859-1');
echo htmlspecialchars($string, NULL, 'UTF-8');

You will see that the first two output the escaped string with the
GB2312 bytes intact within it and the UTF-8 call returns false because
it correctly recognizes that GB2312 is not UTF-8. We don't have any such
check for 8859-1, so yes, saying UTF-8 and 8859-1 are the same for
htmlspecialchars() is wrong for PHP 5.3 as well as for 5.4.

And as expected, under 5.4 because the default is now the UTF-8
behaviour only the second echo gives a result.
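
The comparison above can be reproduced with a self-contained sketch. Assumptions made for illustration: the GB2312 bytes are written as escape sequences so the script does not depend on iconv or on the file's own encoding, the flags argument is passed explicitly as ENT_COMPAT, and the third call uses the 'GB2312' charset name that the entities functions accept as of 5.4.

```php
<?php
// "我是测试" encoded as GB2312 (EUC-CN), spelled out as raw bytes.
$gb2312 = "\xCE\xD2\xCA\xC7\xB2\xE2\xCA\xD4";
$string = "<p>$gb2312</p>";

// Treated as single-byte data: nothing to validate, bytes pass through.
$latin1 = htmlspecialchars($string, ENT_COMPAT, 'ISO-8859-1');

// Treated as UTF-8: the byte sequence is invalid, so the result is empty.
$utf8 = htmlspecialchars($string, ENT_COMPAT, 'UTF-8');

// Declared with its real encoding.
$native = htmlspecialchars($string, ENT_COMPAT, 'GB2312');

var_dump(strlen($latin1), strlen($utf8), strlen($native));
```

Naming the real encoding is the only variant here that both keeps the data and lets the function know where character boundaries are.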

-Rasmus




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:41 AM, Rasmus Lerdorf wrote:

 $string = $string = "<pre><p>$gb2312</p></pre>";

Sorry typo there obviously. Just one $string

-Rasmus





Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev

Hi!


Ignoring 5.4 for a second, if you in 5.3 do this:

echo htmlspecialchars($string);
echo htmlspecialchars($string, NULL, 'ISO-8859-1');
echo htmlspecialchars($string, NULL, 'UTF-8');

You will see that the first two output the escaped string with the
GB2312 bytes intact within it and the UTF-8 call returns false because
it correctly recognizes that GB2312 is not UTF-8. We don't have any such
check for 8859-1, so yes, saying UTF-8 and 8859-1 are the same for
htmlspecialchars() is wrong for PHP 5.3 as well as for 5.4.


So the difference is that ISO8859-1 does not validate but UTF-8 validates?
I'm not sure what GB2312 encoding does, but isn't it dangerous to do
htmlspecialchars() with the wrong encoding? Wouldn't htmlentities() also
produce the wrong result when used with the wrong encoding?


--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Adam Jon Richardson
On Mon, Mar 12, 2012 at 3:52 AM, Stas Malyshev smalys...@sugarcrm.com wrote:

 Hi!


  Ignoring 5.4 for a second, if you in 5.3 do this:

 echo htmlspecialchars($string);
 echo htmlspecialchars($string, NULL, 'ISO-8859-1');
 echo htmlspecialchars($string, NULL, 'UTF-8');

 You will see that the first two output the escaped string with the
 GB2312 bytes intact within it and the UTF-8 call returns false because
 it correctly recognizes that GB2312 is not UTF-8. We don't have any such
 check for 8859-1, so yes, saying UTF-8 and 8859-1 are the same for
 htmlspecialchars() is wrong for PHP 5.3 as well as for 5.4.


 So the difference is that ISO8859-1 does not validate but UTF-8 validates?
 I'm not sure what GB2312 encoding does but isn't it dangerous to do
 htmlspecialchars() with wrong encoding? Wouldn't htmlentities() also
 produce wrong result when used with wrong encoding?


The EUC-CN encoding appears to ensure compatibility with ASCII by avoiding
the ASCII range for each of its two bytes, so it seems that
htmlspecialchars should work OK:

http://en.wikipedia.org/wiki/GB_2312#EUC-CN
http://php.net/manual/en/mbstring.supported-encodings.php
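
The claim about the byte ranges can be checked with a short sketch. The function name has_no_ascii_bytes() is hypothetical, and the sample is the same test string used earlier in the thread, written as raw GB2312 bytes.

```php
<?php
// "我是测试" in GB2312 (EUC-CN) as raw bytes.
$gb2312 = "\xCE\xD2\xCA\xC7\xB2\xE2\xCA\xD4";

// Hypothetical helper: true when no byte falls in the 7-bit ASCII range,
// so no byte of a two-byte character can be mistaken for <, >, & or a quote.
function has_no_ascii_bytes($bytes)
{
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        if (ord($bytes[$i]) < 0x80) {
            return false;
        }
    }
    return true;
}

var_dump(has_no_ascii_bytes($gb2312));
```

This only checks one sample string; it does not prove the property for other multibyte encodings, some of which do reuse ASCII byte values in trail bytes.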

Adam



Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:52 AM, Stas Malyshev wrote:
 Hi!
 
 Ignoring 5.4 for a second, if you in 5.3 do this:

 echo htmlspecialchars($string);
 echo htmlspecialchars($string, NULL, 'ISO-8859-1');
 echo htmlspecialchars($string, NULL, 'UTF-8');

 You will see that the first two output the escaped string with the
 GB2312 bytes intact within it and the UTF-8 call returns false because
 it correctly recognizes that GB2312 is not UTF-8. We don't have any such
 check for 8859-1, so yes, saying UTF-8 and 8859-1 are the same for
 htmlspecialchars() is wrong for PHP 5.3 as well as for 5.4.
 
 So the difference is that ISO8859-1 does not validate but UTF-8 validates?
 I'm not sure what GB2312 encoding does but isn't it dangerous to do
 htmlspecialchars() with wrong encoding? Wouldn't htmlentities() also
 produce wrong result when used with wrong encoding?

Not sure you can validate 8859-1 since it isn't multibyte, can you? Is
there any byte that is explicitly forbidden in 8859-1?

And yes, it may very well be dangerous to use the wrong charset, and now
that we have better support for GB2312 and other Asian charsets in the
entities functions in 5.4 it is even more prudent to choose the right
one. So we should provide some way to help people get it right short of
changing every call.

Gustavo suggested we could use the multibyte encoding setting.
Unfortunately only zend.script_encoding is available and I think
internal_encoding is closer to what we need here, but that is only
available as mbstring.internal_encoding.

-Rasmus





Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Arvids Godjuks
I should point out that returning false on a parameter parsing failure at the
language level is one thing (not to mention that, to my taste, it's not OK to
do that in the first place), but forcing that behavior on the user-land level
is kind of too much.

Consider how much more complicated the code will become - now you not only
have to check what you pass to the functions, but you also have to check
what they return every single time (do I have to mention that a function may
never return false at all except when parameter parsing fails?).

What is consistent and exists on the internal language layer is
not necessarily good for user-land. I'm kind of surprised no one thought
of that.
As I said, I can live with throwing notices and warnings (and not
E_RECOVERABLE_ERROR as I personally wanted), but returning false without even
trying to run the function is just a bad idea all over the place.

2012/3/12 Anthony Ferrara ircmax...@gmail.com

 Ok, so it looks like we've had some decent conversation, but it has
 started to tail off a bit.  I'd normally draft an RFC at this point,
 but it seems there's still some contention on how exactly the
 implementation should work.

 Personally, if we're going to go for any form of strict checking
 (meaning not blind-conversion), I will not support these hint rules
 diverging from zend_parse_parameters (internal functions).  It just
 creates a new layer of inconvenience and confusion for not a whole lot
 of gain.  When I say divergence from ZPP, I'm talking about the same
 behavior when ZPP returns SUCCESS, and a E_RECOVERABLE_ERROR when ZPP
 returns FAILURE...

 Now, with that said, I'd be all for making sane changes to ZPP to
 bring both in line with a common goal.  Think that passing "1abc" to an
 int type hinted parameter (which currently raises a notice) is
 unacceptable?  Then my opinion is that it should be tightened in both
 places at the same time.  But they should stay connected as closely as
 possible for consistency...

 So, with that said, let me ask this question:  What needs to change
 from the current POC before it can be formalized into an RFC?  Do we
 need to tighten the conversions?  Or are they OK as-is?

 Thoughts?

 Anthony

 On Sat, Mar 10, 2012 at 2:45 AM, Tjerk Meesters
 tjerk.meest...@gmail.com wrote:
 
  On 9 Mar, 2012, at 11:20 PM, Lazare Inepologlou linep...@gmail.com
 wrote:
 
  Type casting combined with passing by reference is problematic in many
  ways. Just an example:
 
  function foo( string &$buffer ) { ... }
  foo( $my_buffer );
 
  Here, $my_buffer has just been declared, so it is null. Should this be
 an
  error? I don't know! So, I think that that passing by reference should
 not
  be (immediately) supported.
 
 
  Strictly speaking, if you add a type to a referenced variable in that
 way it's only logical that you expect it to have a proper value when the
 function is called. After all, it's not an output type declaration :)
 





Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Yasuo Ohgaki
Hi

I think the following PHP 5.4.0 NEWS entry is misleading.

  . Changed default value of default_charset php.ini option from ISO-8859-1 to
UTF-8. (Rasmus)

I thought default_charset had become UTF-8, so I was expecting the
following HTTP header.

Content-Type: text/html; charset=UTF-8

However, I got an empty charset (missing 'charset=UTF-8').
So I looked at the source and found this line in SAPI.h:

#define SAPI_DEFAULT_CHARSET ""

The empty string should be UTF-8, shouldn't it?

BTW, an empty charset in the HTTP header does not mean the default will
be ISO-8859-1; it lets the browser guess which encoding is used.
Guessing the encoding may cause XSS under certain conditions.
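
Until the default is settled, a script can avoid the guessing by sending the header itself. A minimal sketch, assuming UTF-8 output; the helper name html_content_type() is hypothetical.

```php
<?php
// Hypothetical helper: build the header line for a given charset.
function html_content_type($charset)
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Send it explicitly rather than relying on default_charset.
// header() must run before any output, hence the guard.
if (!headers_sent()) {
    header(html_content_type('UTF-8'));
}
```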


Anyway, I was curious so I've checked ext/standard/html.c and found

/* {{{ entity_charset determine_charset
 * returns the charset identifier based on current locale or a hint.
 * defaults to UTF-8 */
static enum entity_charset determine_charset(char *charset_hint TSRMLS_DC)
{
    int i;
    enum entity_charset charset = cs_utf_8;
    int len = 0;
    const zend_encoding *zenc;

    /* Default is now UTF-8 */
    if (charset_hint == NULL)
        return cs_utf_8;


There are 2 problems.

 - php.ini's default_charset should be UTF-8.
 - determine_charset() should not blindly default to UTF-8 when there
is no hint.

The old htmlentities/htmlspecialchars actually determined the charset from
default_charset/mbstring.internal_encoding/etc. I think the old behavior
is better than the current one.

How about making determine_charset() behave like 5.3 and setting
SAPI_DEFAULT_CHARSET to UTF-8?

Then PHP will behave as NEWS says: the htmlentities/htmlspecialchars
default encoding becomes 'UTF-8', and users will have control over the default
htmlentities/htmlspecialchars encoding.
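
The fallback idea can be sketched in user-land PHP. This is not the engine's code: the function name resolve_escape_charset() is hypothetical, and the lookup order (explicit hint, then mbstring.internal_encoding, then default_charset, then UTF-8) is an assumption chosen for illustration. The settings are passed in as an array so the sketch does not depend on the local php.ini.

```php
<?php
// Hypothetical resolver: return the first non-empty candidate, else UTF-8.
// Assumed order: hint, mbstring.internal_encoding, default_charset.
function resolve_escape_charset($hint, array $ini)
{
    $candidates = array(
        $hint,
        isset($ini['mbstring.internal_encoding']) ? $ini['mbstring.internal_encoding'] : '',
        isset($ini['default_charset']) ? $ini['default_charset'] : '',
    );

    foreach ($candidates as $candidate) {
        if (is_string($candidate) && $candidate !== '') {
            return $candidate;
        }
    }

    return 'UTF-8';
}

// No hint given: the (assumed) ini settings decide.
echo resolve_escape_charset(null, array('default_charset' => 'GB2312')), "\n";
```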

Regards,

--
Yasuo Ohgaki
yohg...@ohgaki.net




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Yasuo Ohgaki
Hi,

I think the motivation for

   /* Default is now UTF-8 */
   if (charset_hint == NULL)
   return cs_utf_8;

is better performance, and I think that is a good thing.
An alternative to my suggestion is to introduce a new php.ini entry, as Rasmus
mentioned.

The name could be default_html_escape_encoding?

We should document this behavior very well, since it affects all
non UTF-8 web sites.

Regards,

--
Yasuo Ohgaki
yohg...@ohgaki.net




Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Anthony Ferrara
Arvids,

On Mon, Mar 12, 2012 at 4:39 AM, Arvids Godjuks
arvids.godj...@gmail.com wrote:
 I should point out that returning false on param parsing failure on the
 language level is one thing (not to mention it's not ok to do that in the
 first place by my taste), but forcing that behavior on the user-land level
 is kind'a too much.

To be clear, that's not what I had meant at all.  I was talking about
ZPP returning false internally, not what the internal functions
themselves do (it's up to them to ignore the error, to go on, or raise
a different error).

 Consider how the code will become much more complicated - now you have to
 not only to check what you pass to the functions, but you have to check what
 it returns every single time (do I have to mention that false can be never
 returned by the function at all except when the param parsing fails?).

I agree 100%.  There's also a semantic difference between an error
state from the function and an error state from parameter parsing.
Which is why an E_RECOVERABLE_ERROR is my preferred state, since it
communicates the information properly...

 What is consistent and exists on the internal language layer
 not necessarily good for the user-land. I'm kind'a surprised no one thought
 of that.
 As I said I can live with the throwing notices and warnings (and not
 E_RECOVERABLE_ERROR as I personally wanted), but returning false even not
 trying to run the function is just a bad idea all over the place.

I'm confused.  Do you not want E_RECOVERABLE_ERROR for parameter
failures?  Or do you, but could live with lesser as well?  I didn't
quite get that part...

Anthony




Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Arvids Godjuks


  What is consistent and exists on the internal language layer
  not necessarily good for the user-land. I'm kind'a surprised no one
 thought
  of that.
  As I said I can live with the throwing notices and warnings (and not
  E_RECOVERABLE_ERROR as I personally wanted), but returning false even not
  trying to run the function is just a bad idea all over the place.

 I'm confused.  Do you not want E_RECOVERABLE_ERROR for parameter
 failures?  Or do you, but could live with lesser as well?  I didn't
 quite get that part...

 Anthony

Hi Anthony.

Yeah, that part looks confusing.
What I wanted to say is that I would like to get E_RECOVERABLE_ERROR, and I
was voicing my opinion on that earlier in the threads. But I could live
with E_WARNING and E_NOTICE if the community decides to be less strict - I
will clean up my code so it does not throw a single notice (and because I use
Yii - by default it converts any E_* raised into a fatal error and throws an
HTTP 500 error via exceptions).

In my 8 years of active PHP development I have learned that some strictness in
the deep core code of a project is a good thing, and erroring the hell out
there makes perfect sense. It's a delicate balance, and I never apply it to
the level that does the actual communication with the outside world.


Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Anthony Ferrara
Arvids,

 Yea, that part looks confusing.
 What I wanted to say is that I would like to get E_RECOVERABLE_ERROR and I
 was voicing my opinion on that earlier in the threads. But I could live with
 E_WARNING and E_NOTICE if community decides it to be less strict - I will
 clean up my code not to throw a single notice (and because I use Yii - it's
 by default converts any E_* raised to a fatal error and throws HTTP 500
 error via exceptions).

 In my 8 years of active PHP development I learned that some strictness in
 deep core code of the project is a good thing and erroring the hell out
 there makes perfect sense. It's a delicate balance and I never apply it to
 the level that does actual communication with the outside world.

Ok, I agree 100%.  I was just confused about your wording and wanted
to clarify it to the list.  So we're on the same page here.

Thanks!

Anthony




Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Lazare Inepologlou
Hello Anthony,

I will raise once again the question about accepting null. According to
your POC, null is an acceptable value if it is also declared as a default
value. This is problematic for the scalar types, because they can very well
have a different default value.

An example: There is a check box with three states (checked, unchecked and
mixed). This is usually translated to a three state boolean (true, false
and null). The default value of the check box is false.

function set_check_box_state( bool $state = false ) { ... }
set_check_box_state( null );  // null will be converted to false here...

Therefore, this cannot work, unless the default value becomes null, which
is against the requirements. What I suggest is something like this:

function set_check_box_state( bool? $state = false ) { ... }
set_check_box_state( null );  // works fine

In my opinion this is much clearer, as it separates the notions of the type
and that of the default value.


Lazare INEPOLOGLOU
Ingénieur Logiciel


Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Simon Schick
2012/3/12 Lazare Inepologlou linep...@gmail.com

 function set_check_box_state( bool state = false ) { ... }
 set_check_box_state( null );  // null will be converted to false here...

 Therefore, this cannot work, unless the default value becomes null, which
 is against the requirements. What I suggest is something like this:

 function set_check_box_state( bool? state = false ) { ... }
 set_check_box_state( null );  // works fine

 In my opinion this is much clearer, as it separates the notions of the
 type
 and that of the default value.


 Lazare INEPOLOGLOU
 Ingénieur Logiciel

Hi Lazare,

I'd like to keep the acceptance of null as it is for classes and arrays.
Here's an example I wrote earlier:

function foo(array $d = array()) { var_dump($d); }
foo(null); // This fails with the message: Argument 1 passed to foo()
           // must be an array, null given

As this code fails, I would not expect this behavior to change for the new
feature we're discussing here.

function foo(int $d = 20) { var_dump($d); }
foo(null); // This should then also simply fail. Don't care about
           // what's the default-value or defined type.

function foo(int $d = null) { var_dump($d); }
foo(null); // And this should pass it through, providing the
           // NULL-value in the function.

function foo(int $d = 20) { var_dump($d); }
foo( (int)null ); // This can provide 0 as the programmer forcing it
                  // to be an integer before putting it into this function-call.

I would personally not like to give the user the option to set a
null-value if it's not the default.
But .. I don't wanna screw up your idea.

Bye
Simon




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Laruence
On Mon, Mar 12, 2012 at 6:21 PM, Yasuo Ohgaki yohg...@ohgaki.net wrote:
 Hi,

 I think motivation of

       /* Default is now UTF-8 */
       if (charset_hint == NULL)
               return cs_utf_8;

 is for better performance and I think it's good for better performance.
 Alternative of my suggestion is introduce new php.ini entry as Rasmus
 mentioned.

 The name may be default_html_escape_encoding?
Hi:
   in consideration of succinctness, I think run_time_encoding is better.

   and we should also separate determine_output_charset and
determine_run_time_charset (there is only one determine_charset now)

thanks

 We should document this behavior very well, since it affects all of
 non UTF-8 web sites.

 Regards,

 --
 Yasuo Ohgaki
 yohg...@ohgaki.net





-- 
Laruence  Xinchen Hui
http://www.laruence.com/




Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Arvids Godjuks
I think that the null issue is not an issue. Strictly speaking, if you
want null or an int - leave out the type hint and use a generic argument that
will accept anything.
I think it's over-engineering to try and push special treatment for
null. If a function/method argument accepts anything but a single type -
it's type-less and does not need a type hint.
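
The untyped approach can be sketched like this, reusing the three-state check box example from earlier in the thread. The function body and the exception are assumptions made for illustration; nothing here is part of the POC patch.

```php
<?php
// No type hint on $state: the function does its own check and accepts
// exactly true, false or null (null standing for the "mixed" state).
function set_check_box_state($state = false)
{
    if ($state !== null && !is_bool($state)) {
        throw new InvalidArgumentException('Expected bool or null');
    }

    // A real implementation would update the widget here; the sketch
    // just returns the accepted value.
    return $state;
}

var_dump(set_check_box_state());      // default applies
var_dump(set_check_box_state(null));  // null is accepted as-is
```

The cost is that the accepted types live in the function body rather than in the signature, which is the trade-off being debated here.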

Developers should not abuse type hints and adding a special case for
handling null will make many start to request things like this:
function foo(string|array $data)
function foo(bool|int $flag)
function foo(mixed $someVar)
etc.

I'm not sure about you, but I don't wanna see that kind of thing eventually
making its way into the language (believe me - even I considered that at
some point, but I'm more mature now and more settled in my wishes :))


Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Lazare Inepologlou
Hello Simon,

First of all, none of your examples cover the case I mentioned, and so, my
concerns are still valid.

Secondly, you make some wrong assumptions about how this specific POC
works. For example, you write:

 function foo(int $d = 20) { var_dump($d); }
 foo(null); // This should then also simply fail.

Unless I am wrong, the patch will convert null to 0.


Lazare INEPOLOGLOU
Ingénieur Logiciel


2012/3/12 Simon Schick simonsimc...@googlemail.com

 2012/3/12 Lazare Inepologlou linep...@gmail.com
 
  function set_check_box_state( bool $state = false ) { ... }
  set_check_box_state( null );  // null will be converted to false here...
 
  Therefore, this cannot work, unless the default value becomes null, which
  is against the requirements. What I suggest is something like this:
 
  function set_check_box_state( bool? $state = false ) { ... }
  set_check_box_state( null );  // works fine
 
  In my opinion this is much clearer, as it separates the notion of the
  type from that of the default value.
 
 
  Lazare INEPOLOGLOU
  Ingénieur Logiciel

 Hi Lazare,

 I'd like to keep the acceptance of null as it is for classes and arrays.
 Here's an example I wrote earlier:

 function foo(array $d = array()) { var_dump($d); }
 foo(null); // This fails with the message: Argument 1 passed to foo()
 must be an array, null given

 As this code fails I'd not expect to change this behavior for the new
 feature we're discussing here.

 function foo(int $d = 20) { var_dump($d); }
 foo(null); // This should then also simply fail. Don't care about
 what's the default-value or defined type.

 function foo(int $d = null) { var_dump($d); }
 foo(null); // And this should pass it through, providing the
 NULL-value in the function.

 function foo(int $d = 20) { var_dump($d); }
 foo( (int)null ); // This can provide 0 as the programmer forcing it
 to be an integer before putting it into this function-call.

 I would personally not like to give the user the option to set a
 null-value if it's not the default.
 But .. I don't wanna screw up your idea.

 Bye
 Simon



Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Lazare Inepologlou
 I'm not sure about you, but I don't wanna see that kind of thing
eventually making it's way into the language

Me neither. All I am saying is that, since int|null is already here from
the back door, I think it should be properly supported.


Lazare INEPOLOGLOU
Ingénieur Logiciel


2012/3/12 Arvids Godjuks arvids.godj...@gmail.com

 I think that the null issue is not an issue. Strictly speaking if you
 want null or an int - leave out the type hint and use generic argument that
 will accept anything.
 I think it's over-engineering to try and push a special treatment for the
 null. If function/method argument accepts anything but a single type -
 it's type-less and does not need a type hint.

 Developers should not abuse type hints and adding a special case for
 handling null will make many start to request things like this:
 function foo(string|array $data)
 function foo(bool|int $flag)
 function foo(mixed $someVar)
 etc.

 I'm not sure about you, but I don't wanna see that kind of thing
 eventually making it's way into the language (believe me - even I
 considered that at some point, but i'm more mature now and more settled in
 my wishes :))



Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Arvids Godjuks
2012/3/12 Lazare Inepologlou linep...@gmail.com

  I'm not sure about you, but I don't wanna see that kind of thing
 eventually making it's way into the language

 Me neither. All I am saying is that, since int|null is already here from
 the back door, I think it should be properly supported.


There is no int|null at the moment, and there should not be. You can pass
anything - object, array, string, bool, int, float, resource, callable -
they are all accepted, and are checked in the function body if its author
wrote that code.
A hint should name a single type, or the hint doesn't belong there.


Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Anthony Ferrara
Lazare,

 The patch of Anthony, clearly states that this is accepted:

 function foo ( int $bar = null ) { }

 And this is what I called an int|null.

Yup, it does.  Because that's the current behavior with array and
object casting.  If you default it to null in the declaration, null is
a valid value.  If you don't, it's not...
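
The rule described here can be sketched in userland. This is only an illustration, not the POC patch: it assumes PHP 7.1 or later, where a rejected argument raises a TypeError and the nullable form is spelled explicitly as `?array` (the thread writes it as `array $d = null`). The function names are made up for the example.

```php
<?php
// No null default: passing null is rejected (a TypeError on PHP 7+).
function needs_array(array $d = array())
{
    return count($d);
}

// Nullable parameter with a null default: null is a valid value.
// `?array` is the explicit PHP 7.1+ spelling of `array $d = null`.
function accepts_null(?array $d = null)
{
    return $d === null ? 'null' : count($d);
}

try {
    needs_array(null);
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}

var_dump($rejected);              // whether null was refused
var_dump(accepts_null(null));     // null is passed through as null
var_dump(accepts_null([1, 2]));   // a real array still works
```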




Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Lazare Inepologlou
Hello Arvids,

Anthony's patch clearly states that this is accepted:

function foo ( int $bar = null ) { }

And this is what I called an int|null.


Lazare INEPOLOGLOU
Ingénieur Logiciel


2012/3/12 Arvids Godjuks arvids.godj...@gmail.com

 2012/3/12 Lazare Inepologlou linep...@gmail.com

  I'm not sure about you, but I don't wanna see that kind of thing
 eventually making it's way into the language

 Me neither. All I am saying is that, since int|null is already here from
 the back door, I think it should be properly supported.


 There is no int|null at the moment, and should not be. You can pass
 anything  - object, array, string, bool, int, float resource, callable -
 they all are accepted and are checked in function body if it's writer wrote
 that code.
 Hint should provide a hint for a single type, or hint doesn't belong there.



Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote:
 Hi
 
 I think following PHP 5.4.0 NEWS entry is misleading.
 
   . Changed default value of default_charset php.ini option from ISO-8859-1
     to UTF-8. (Rasmus)

Yes, I have fixed that now.

 I thought default_charset became UTF-8, so I was expecting
 following HTTP header.
 
 content-type  text/html; charset=UTF-8
 
 However, I got empty charset (missing 'charset=UTF-8').
 So I looked up to source and found the line in SAPI.h
 
 #define SAPI_DEFAULT_CHARSET ""   /* SAPI.h, line 293 */
 
 Empty string should be UTF-8, isn't it?

No, we can't force an output charset on people since it would end up
breaking a lot of sites.

  - php.ini's default_charset should be UTF-8.
  - determine_charset() should not blindly default to UTF-8 when there
 is no hint.
 
 The old htmlentities/htmlspecialchars actually determined the charset from
 default_charset/mbstring.internal_encoding/etc. I think the old behavior
 is better than the current one.
 
 How about making determine_charset() behave like 5.3 and setting
 SAPI_DEFAULT_CHARSET to UTF-8?

PHP 5.3's determine_charset behaves exactly like 5.4's. In 5.3 we have:

    if (charset_hint == NULL)
        return cs_8859_1;

and in 5.4 we have:

    if (charset_hint == NULL)
        return cs_utf_8;

So there is no difference in their guessing when there is no hint, the
only difference is that in 5.4 we choose utf8 and in 5.3 we choose
8859-1 in that case.

-Rasmus
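
The practical effect of the two defaults can be seen from userland by passing each encoding explicitly. A minimal sketch, assuming a string that holds ISO-8859-1 bytes (the sample text is made up) and flags that do not include ENT_IGNORE or ENT_SUBSTITUTE:

```php
<?php
// "café <b>" encoded as ISO-8859-1: the "é" is the single byte 0xE9,
// which is not a valid UTF-8 sequence on its own.
$latin1 = "caf\xE9 <b>";

// Interpreted as UTF-8 (the 5.4 default): the input is invalid,
// so htmlspecialchars() returns an empty string.
$asUtf8 = htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8');

// Interpreted as ISO-8859-1 (the 5.3 default): escaped as expected.
$asLatin1 = htmlspecialchars($latin1, ENT_COMPAT, 'ISO-8859-1');

var_dump($asUtf8 === '');
var_dump($asLatin1 === "caf\xE9 &lt;b&gt;");
```

This is why a legacy call such as htmlspecialchars($str) can silently start producing empty output after an upgrade when the data is not UTF-8.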




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Michael Stowe
I think the ini directive, while adding another to the list, may be the
most unobtrusive method to address this issue, at least for developers.

I definitely agree with Rasmus that this could be one of the bigger
headaches in transitioning to 5.4 (for non-UTF8 sites) and unless we can
come up with a better solution, I say let's move forward with it for 5.4.1.

- Mike





-- 
---

My command is this: Love each other as I
have loved you. John 15:12

---


Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Lazare Inepologlou
Thank you for the confirmation.

What I am saying here is that, although this behavior was fine for objects,
it is not enough for scalars. One of the main arguments in favor of the
adoption of this syntax was that null was the only possible default value
for objects anyway. This obviously is not the case for scalar types.
This is why I suggest a different syntax (which can also be used by object
types for consistency).

Lazare INEPOLOGLOU
Ingénieur Logiciel
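
For comparison, a separate nullability marker of the kind suggested here can be sketched with the `?bool` spelling that PHP 7.1 and later accept (the thread proposes `bool?`). This is an illustration under that assumption, not part of the POC patch, and the returned labels are invented for the example.

```php
<?php
// The type (nullable bool) and the default value (false) are declared
// independently, so null no longer has to double as the default.
function set_check_box_state(?bool $state = false)
{
    if ($state === null) {
        return 'indeterminate';
    }
    return $state ? 'checked' : 'unchecked';
}

var_dump(set_check_box_state());       // default value applies
var_dump(set_check_box_state(null));   // null arrives as null, not false
var_dump(set_check_box_state(true));
```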


2012/3/12 Anthony Ferrara ircmax...@gmail.com

 Lazare,

  The patch of Anthony, clearly states that this is accepted:
 
  function foo ( int $bar = null ) { }
 
  And this is what I called an int|null.

 Yup, it does.  Because that's the current behavior with array and
 object casting.  If you default it to null in the declaration, null is
 a valid value.  If you don't, it's not...



Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Richard Lynch
On Mon, March 12, 2012 1:49 am, Rasmus Lerdorf wrote:
 What we really need is what we added in PHP 6. A runtime encoding ini
 setting that is distinct from the output charset which we can use
 here.

The usual argument against another php.ini setting, other than "too
many already", is the difficulty it presents for writing portable code
libraries.

I'm not smart enough to predict how such a setting (regardless of its
name) would help or hinder a library of code that doesn't want another
conditional in a zillion places.

But you folks are that smart. :-)

And I haven't seen any discussion regarding this sub-issue.

So, how would this help / hinder authors of generic library code to be
distributed in the wild?

Forgive me if the answer is so blindingly obvious I should already
know it... :-)

-- 
brain cancer update:
http://richardlynch.blogspot.com/search/label/brain%20tumor
Donate:
https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=FS9NLTNEEKWBE






Re: [PHP-DEV] CURL file posting

2012-03-12 Thread Richard Lynch
On Sun, March 11, 2012 6:29 pm, Stas Malyshev wrote:
 Hi!

 I'd sure like a PHP extension that didn't have this obvious and
 nasty bug:

 https://bugs.php.net/bug.php?id=46439

 This doesn't look good. Documentation does say the @ prefix exists, but
 it has very high potential of creating security holes for unsuspecting
 people. open_basedir would help limit the impact, but still it's not a
 good thing. Any ideas on fixing it without breaking the BC?

Ouch.

Issue an E_NOTICE when it happens?

Add a new CURLOPT_FILEFIELDS that takes an array of the parameters
that are supposed to be files, so the ones that are expected to have
@... do not fire the E_NOTICE.

Issuing E_NOTICE is a BC, I suppose, but you'd think people would
appreciate an alert about a potential security threat...
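
The check behind this idea can be sketched in userland. CURLOPT_FILEFIELDS does not exist; the helper below and its name are hypothetical, and it only flags fields whose value starts with "@" without having been declared as file uploads.

```php
<?php
/**
 * Return the names of POST fields whose value starts with "@" but which
 * the caller did not declare as file uploads. $fileFields plays the role
 * of the proposed (hypothetical) CURLOPT_FILEFIELDS list.
 */
function unexpected_file_fields(array $fields, array $fileFields)
{
    $suspicious = array();
    foreach ($fields as $name => $value) {
        $looksLikeFile = is_string($value) && strncmp($value, '@', 1) === 0;
        if ($looksLikeFile && !in_array($name, $fileFields, true)) {
            $suspicious[] = $name;
        }
    }
    return $suspicious;
}

$fields = array(
    'avatar'  => '@/tmp/upload.png',  // intended file upload
    'comment' => '@/etc/passwd',      // user text that happens to start with "@"
);

print_r(unexpected_file_fields($fields, array('avatar')));
```

A caller could raise the notice, or refuse to send the request, whenever the returned list is not empty.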




Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Richard Lynch
On Fri, March 9, 2012 2:51 am, Nikita Popov wrote:
 On Fri, Mar 9, 2012 at 3:58 AM, Ilia Alshanetsky i...@prohost.org wrote:
 Anthony,

 My concern with this type of patch is that what you are proposing are
 not really hints, they are forced casts. As such they modify the data,
 potentially leading to data loss.

 This patch specifically tries to overcome this problem of the previous
 version. It will not accept input which will lead to a data loss on
 cast. The only exception is passing "123abc" to an int hint, which
 will cast to 123 and throw a notice. This is also my only point of
 critique: I'd prefer to be stricter here and go all the way to a
 recoverable fatal error.

So what happens to (int) 1233553463645747675685685?

Does it cast and then cause an overflow, which PHP pretty much ignores
and wraps to a negative number?

Or does it error out as you can't convert without mangling the data?

Will it behave differently on 32-bit versus 64-bit hardware for values
that are in range for 64 but not 32?
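
These questions can be explored from userland without the patch. The sketch below is not what the POC does; it uses filter_var() with FILTER_VALIDATE_INT as a stand-in for a "no data loss" conversion, and the helper name is made up.

```php
<?php
// The literal is too large for a native integer, so PHP parses it as a float.
var_dump(is_float(1233553463645747675685685));

/**
 * Lossless string-to-int check: returns the integer, or null when the
 * input is not a plain integer or does not fit this build's int range.
 */
function int_or_null($value)
{
    $result = filter_var($value, FILTER_VALIDATE_INT);
    return $result === false ? null : $result;
}

var_dump(int_or_null('123'));
var_dump(int_or_null('123abc'));                      // trailing garbage: rejected
var_dump(int_or_null('1233553463645747675685685'));   // out of range: rejected
var_dump(int_or_null('4294967296'));                  // fits 64-bit builds only

// The answer to the 32-bit versus 64-bit question depends on these two values.
var_dump(PHP_INT_SIZE, PHP_INT_MAX);
```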




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:40 PM, Stas Malyshev wrote:
 Hi!
 
 And yes, it may very well be dangerous to use the wrong charset and now
 that we have better support for GB2312 and other asian charsets in the
 entities functions in 5.4 it is even more prudent to choose the right
 one so we should provide some way to help people get it right short of
 changing every call.
 
 I'm not sure changing every call is such a big problem - it's one grep
 and one replace, can be done in one line of sed/awk/perl/php probably.
 But a bigger issue is here that people insist on using wrong charsets
 and expect language to have some magical external defaults that work for
 exactly their use case, instead of doing what they should be doing all
 along - putting charset right there in the argument.
 We need to get people off this mindset fast, since it is not a good one.
 Having tons of hidden defaults that modify behavior of functions called
 with the same arguments in hundreds of different ways is a coding and
 maintenance nightmare. Now if I write htmlspecialchars() I can never be
 sure if it works right and uses UTF-8 - what if somebody messed with the
 INI setting because of some other broken library that required that to
 work?

But you can't necessarily hardcode the encoding if you are writing
portable code. That's a bit like hardcoding a timezone. In order to
write portable code you need to give people the ability to localize it.

-Rasmus
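
One way to keep the encoding localizable without an ini directive is a single application-level wrapper. A minimal sketch: APP_CHARSET and e() are hypothetical names standing in for whatever configuration mechanism an application already has.

```php
<?php
// Hypothetical application-wide setting, changed in one place per deployment.
define('APP_CHARSET', 'ISO-8859-1');

// Every template calls e() instead of htmlspecialchars() directly,
// so the encoding is never repeated at the call sites.
function e($text)
{
    return htmlspecialchars($text, ENT_QUOTES, APP_CHARSET);
}

echo e("caf\xE9 & <tea>"), "\n";
```

The trade-off is that existing code still has to be migrated to the wrapper once, which is the cost this thread is trying to avoid.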




RE: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Richard Lynch
On Fri, March 9, 2012 5:58 pm, John Crenshaw wrote:
 The reason you have to validate the input type in this case is because
 even though it is a reference, we don't ACTUALLY know that it isn't
 supposed to contain an input (even though that would be against all
 sane rules most of the time).

Last time I checked, two consecutive exec calls with the same second
argument would append to the array of outputs.

Hey, it's even documented that way:
http://www.php.net/manual/en/function.exec.php

It was unexpected when I first saw it, but seemed perfectly sane to
me, as I suppose somebody might want it, and unset($output); wasn't
exactly horrible to add before each exec call.

It would be wise to check other PHP functions that return values through
reference arguments, to sanity-check your definition of "sane" :-)
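
The appending behavior is easy to reproduce. A small sketch, assuming exec() is enabled and a shell with an `echo` command is available:

```php
<?php
$lines = array();
exec('echo first', $lines);
exec('echo second', $lines);   // appended to the existing array, not replaced
$appended = $lines;
print_r($appended);

unset($lines);                 // start from a clean array before the next call
exec('echo third', $lines);
print_r($lines);
```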




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev

Hi!


But you can't necessarily hardcode the encoding if you are writing
portable code. That's a bit like hardcoding a timezone. In order to
write portable code you need to give people the ability to localize it.


No, it's not like timezone at all. I have to support all timezones in a 
global app, but I don't have to internally support every encoding on 
Earth - having everything internally in UTF-8 works quite well, and a 
lot of applications do exactly that - they have everything internally in 
UTF-8 and only may convert when importing or exporting the data. I don't 
see anything in using UTF-8 throughout the app/library that makes it 
non-portable. However, if we allow the defaults in
htmlspecialchars() etc. to be changed, that essentially makes having defaults
useless, as I'd have to explicitly specify UTF-8 each time - otherwise it's a
gamble what encoding I'd actually get.

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
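
The "UTF-8 internally, convert at the edges" approach can be sketched as follows. It assumes the mbstring extension is loaded; the function names and sample data are made up for the example.

```php
<?php
// Import boundary: convert whatever arrives into UTF-8 exactly once.
function import_text($bytes, $sourceEncoding)
{
    return mb_convert_encoding($bytes, 'UTF-8', $sourceEncoding);
}

// Inside the application every string is UTF-8, and the encoding is
// stated explicitly so that no ini default can change the result.
function escape_html($utf8)
{
    return htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
}

$legacy = "caf\xE9 <b>";                        // ISO-8859-1 input
$utf8   = import_text($legacy, 'ISO-8859-1');   // now UTF-8
echo escape_html($utf8), "\n";
```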




Re: [PHP-DEV] any blogs?

2012-03-12 Thread Richard Lynch
I can't recommend any blogs, per se, but Sara's book or even her
articles on Zend.com as well as the php.net manual about internals at
the end are a must read for understanding the internals...

On Thu, March 8, 2012 6:22 am, adit adit wrote:
 Let's try to stick only to the internals blogs, ok? If any other php core
 devs have some blogs..
 I also found Sara Golemon's blog, but it has been discontinued for some
 years now.

 On Thu, Mar 8, 2012 at 1:09 PM, Peter Beverloo pe...@lvp-media.com wrote:

 There is a Planet PHP which aggregates many blog articles written by
 contributors:
 http://planet-php.net/

 Peter

 On Thu, Mar 8, 2012 at 09:58, adit adit miche...@gmail.com wrote:

 Hi,

 Can you tell me which one of you guys has any blogs on which I can read
 about the php internals?
 I've already subscribed to laruence's; the problem is that google translate
 is pretty bad at translating Chinese.

 Thanks,




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 12:51 PM, Stas Malyshev wrote:
 Hi!
 
 But you can't necessarily hardcode the encoding if you are writing
 portable code. That's a bit like hardcoding a timezone. In order to
 write portable code you need to give people the ability to localize it.
 
 No, it's not like timezone at all. I have to support all timezones in a
 global app, but I don't have to internally support every encoding on
 Earth - having everything internally in UTF-8 works quite well, and a
 lot of applications do exactly that - they have everything internally in
 UTF-8 and only may convert when importing or exporting the data. I don't
 see anything in using UTF-8 throughout the app/library that makes it
 non-portable. However, if we allow to change defaults in
 htmlspecialchars() etc. that essentially makes having defaults useless
 as I'd have so explicitly specify UTF-8 each time - otherwise it's a
 gamble what encoding I'd actually get.

If everything was UTF-8 we wouldn't have any of these issues.
Unfortunately that isn't the case. The question is what to do with apps
that need to deal with non UTF-8 data. Are we going to provide any help
to them beyond just telling them to convert everything to UTF-8?

We took steps in 5.4 to improve htmlspecialchars to understand more
encodings and we have the concept of script_encoding and
internal_encoding that is used both in the engine and in mbstring.
Currently internal_encoding isn't checked by htmlspecialchars. If you
pass it '' it checks script_encoding and default_charset which is a bit
odd since neither directly relate to the encoding of the internal data
you are feeding to it. So maybe a way to tackle this is to use the
mbstring internal encoding when it is set as the htmlspecialchars
default when it is called without an encoding arg.

-Rasmus
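
What this suggestion would mean for a call without an encoding argument can be approximated in userland. This is a sketch of the idea only, not engine behavior: the helper name is hypothetical, and it assumes the mbstring internal encoding is one that htmlspecialchars() supports.

```php
<?php
/**
 * Escape using the mbstring internal encoding when mbstring is loaded,
 * otherwise fall back to UTF-8 (the 5.4 default).
 */
function escape_internal($text)
{
    $encoding = function_exists('mb_internal_encoding')
        ? mb_internal_encoding()
        : 'UTF-8';
    return htmlspecialchars($text, ENT_QUOTES, $encoding);
}

if (function_exists('mb_internal_encoding')) {
    // A legacy site would set this once, e.g. in its bootstrap code.
    mb_internal_encoding('ISO-8859-1');
}

echo escape_internal("caf\xE9 <b>"), "\n";
```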




Re: [PHP-DEV] Scalar Type Hinting

2012-03-12 Thread Richard Lynch
On Thu, March 8, 2012 5:13 am, Alain Williams wrote:
 On Thu, Mar 08, 2012 at 11:06:56AM +0200, Arvids Godjuks wrote:
  Type hints are meant to
  filter input from external sources

 Correction, it should read like this:
 Type hints are _not_ meant to filter input from external sources

 +1

 What they will do is to catch where input from external sources has NOT
 been correctly filtered -- but that should be a rare event and indicative
 of a bug.

While everybody here routinely filters all input, you're living in a
dream world if you think un-filtered data is a rare event.

It's still a bug, but definitely not rare.

Or perhaps you meant "that should be a rare event if we want all PHP
apps to be well-written" rather than "that should be a rare event" in
terms of BC.




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Pierre Joye
hi Rasmus,

On Mon, Mar 12, 2012 at 9:12 PM, Rasmus Lerdorf ras...@lerdorf.com wrote:

 If everything was UTF-8 we wouldn't have any of these issues.
 Unfortunately that isn't the case. The question is what to do with apps
 that need to deal with non UTF-8 data. Are we going to provide any help
 to them beyond just telling them to convert everything to UTF-8?

That's not really an acceptable solution, obviously.

 We took steps in 5.4 to improve htmlspecialchars to understand more
 encodings and we have the concept of script_encoding and
 internal_encoding that is used both in the engine and in mbstring.

 Currently internal_encoding isn't checked by htmlspecialchars. If you
 pass it '' it checks script_encoding and default_charset which is a bit
 odd since neither directly relate to the encoding of the internal data
 you are feeding to it. So maybe a way to tackle this is to use the
 mbstring internal encoding when it is set as the htmlspecialchars
 default when it is called without an encoding arg.

That's why I would prefer to use an existing setting and clearly
document it instead of creating a new ini setting with a totally
different impact than the existing ones. Not sure which one would fit
best tho'.

Reading these last two paragraphs gave me a headache and I did not
know anymore which encoding we were talking about ;-)

Cheers,
-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org




Re: [PHP-DEV] Providing sandboxed versions of include and require language constructs

2012-03-12 Thread Richard Lynch
On Tue, March 6, 2012 3:30 am, Florian Anderiasch wrote:

Security by blacklist almost always isn't security...

You're bound to miss one of the functions you should have blacklisted,
but didn't.

Something like Drupal would be crippled by this, because major
extensions used by everyone rely on access that would probably need to be
blocked.

So then they'd have to come up with a blessed list of extensions
not to block, and then...

Nice idea, in the abstract, but I don't think it will work out to be
very useful in the Real World (tm).




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Ángel González
On 12/03/12 20:51, Stas Malyshev wrote:
 Hi!

 But you can't necessarily hardcode the encoding if you are writing
 portable code. That's a bit like hardcoding a timezone. In order to
 write portable code you need to give people the ability to localize it.

 No, it's not like timezone at all. I have to support all timezones in
 a global app, but I don't have to internally support every encoding on
 Earth - having everything internally in UTF-8 works quite well, and a
 lot of applications do exactly that - they have everything internally
 in UTF-8 and only may convert when importing or exporting the data. I
 don't see anything in using UTF-8 throughout the app/library that
 makes it non-portable. However, if we allow to change defaults in
 htmlspecialchars() etc. that essentially makes having defaults useless
 as I'd have so explicitly specify UTF-8 each time - otherwise it's a
 gamble what encoding I'd actually get.
If you are a framework developer, and really want to shield against a
bad php.ini setting, you could ini_set() your preferred charset at the
beginning of the request.





Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev

Hi!



If you are a framework developer, and really want to shield against a
bad php.ini setting, you could ini_set() your preferred charset at the
beginning of the request.


That's assuming the request is completely processed by your framework
and you never call any outside code and any outside code never calls you
- otherwise your messing with INI settings may very well break that code,
or that code's messing with INI settings may very well break yours.

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
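
One way to limit the interference described here is to scope the override and restore the previous value afterwards. A minimal sketch, assuming PHP 5.5 or later for `finally`; the helper name is made up, and it does not help if the outside code changes the setting while the callback is running.

```php
<?php
// Run $callback with default_charset temporarily set to $charset,
// then put back whatever value was there before.
function with_default_charset($charset, callable $callback)
{
    $previous = ini_set('default_charset', $charset);
    try {
        return $callback();
    } finally {
        if ($previous !== false) {
            ini_set('default_charset', $previous);
        }
    }
}

$before = ini_get('default_charset');
$inside = with_default_charset('ISO-8859-1', function () {
    return ini_get('default_charset');
});
$after = ini_get('default_charset');

var_dump($inside);             // the value seen by the callback
var_dump($before === $after);  // the original setting is restored
```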




Re: [PHP-DEV] [POC - Patch] Scalar Type Hinting - A-La zend_parse_parameters

2012-03-12 Thread Matthew Weier O'Phinney
On 2012-03-12, Arvids Godjuks arvids.godj...@gmail.com wrote:
 I think that the null issue is not an issue. Strictly speaking if you
 want null or an int - leave out the type hint and use generic argument that
 will accept anything.
 I think it's over-engineering to try and push a special treatment for the
 null. If function/method argument accepts anything but a single type -
 it's type-less and does not need a type hint.

However, that conflicts with how typehints work currently in PHP:

    public function setContainer(Container $container = null)
    {
        $this->container = $container;
    }

This is perfectly valid currently, and allows unsetting a value
easily. I'd expect scalar hints to work exactly the same way -- in other
words, null, or a value that satisfies the hint.


-- 
Matthew Weier O'Phinney
Project Lead| matt...@zend.com
Zend Framework  | http://framework.zend.com/
PGP key: http://framework.zend.com/zf-matthew-pgp-key.asc




[PHP-DEV] Release process nit

2012-03-12 Thread Ondřej Surý
Hi guys,

could you please remove this cruft:

dpkg-source: warning: ignoring deletion of file
ext/standard/var_unserializer.c.orig
dpkg-source: warning: ignoring deletion of file
ext/standard/url_scanner_ex.c.orig
dpkg-source: warning: ignoring deletion of file ext/date/lib/parse_date.c.orig
dpkg-source: warning: ignoring deletion of file ext/pdo/pdo_sql_parser.c.orig
dpkg-source: warning: ignoring deletion of directory autom4te.cache
dpkg-source: warning: ignoring deletion of file autom4te.cache/requests
dpkg-source: warning: ignoring deletion of file autom4te.cache/output.0
dpkg-source: warning: ignoring deletion of file autom4te.cache/traces.0

from next release(s).

Thanks,
-- 
Ondřej Surý ond...@sury.org




Re: [PHP-DEV] CURL file posting

2012-03-12 Thread Ángel González
On 12/03/12 20:36, Richard Lynch wrote:
 On Sun, March 11, 2012 6:29 pm, Stas Malyshev wrote:
 This doesn't look good. Documentation does say the @ prefix exists,
 but
 it has very high potential of creating security holes for unsuspecting
 people. open_basedir would help limit the impact, but still it's not a
 good thing. Any ideas on fixing it without breaking the BC?
 Ouch.

 Issue an E_NOTICE when it happens?

 Add a new CURLOPT_FILEFIELDS that takes an array of the parameters
 that are supposed to be files, so the ones that are expected to have
 @... do not fire the E_NOTICE.

 Issuing E_NOTICE is a BC, I suppose, but you'd think people would
 appreciate an alert about a potential security threat...
That would only trigger the notice when you transfer data beginning with
an @, which would end up being only when finally attacked.

I'd require another option for the @ prefix to work (e.g.
CURLOPT_AT_TRANSFERS_FILES), defaulting to false. This is similar to
SO_BROADCAST, where binding a socket to a broadcast address is not enough
to send packets there.
It *is* a BC break, but the current API is badly designed. I don't see a
way to work around that. A one-line fix to get the previous
not-too-used(?) behavior back seems as good as can be achieved.
It is also possible to make a completely new option API without those
problems, and deprecate the old one, but that's still a BC break.





Re: [PHP-DEV] Release process nit

2012-03-12 Thread Pierre Joye
hi!

are they in svn? I can't see them in 5.4

Or which release did have them?

On Mon, Mar 12, 2012 at 10:52 PM, Ondřej Surý ond...@sury.org wrote:
 Hi guys,

 could you please remove this cruft:

 dpkg-source: warning: ignoring deletion of file
 ext/standard/var_unserializer.c.orig
 dpkg-source: warning: ignoring deletion of file
 ext/standard/url_scanner_ex.c.orig
 dpkg-source: warning: ignoring deletion of file ext/date/lib/parse_date.c.orig
 dpkg-source: warning: ignoring deletion of file ext/pdo/pdo_sql_parser.c.orig
 dpkg-source: warning: ignoring deletion of directory autom4te.cache
 dpkg-source: warning: ignoring deletion of file autom4te.cache/requests
 dpkg-source: warning: ignoring deletion of file autom4te.cache/output.0
 dpkg-source: warning: ignoring deletion of file autom4te.cache/traces.0

 from next release(s).

 Thanks,
 --
 Ondřej Surý ond...@sury.org





-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org




Re: [PHP-DEV] Release process nit

2012-03-12 Thread Stas Malyshev

Hi!


> are they in svn? I can't see them in 5.4


They are not in SVN, but at least for autom4te.cache ones they seem to 
be generated when configure script is generated, and the packing script 
happily picks them up. I have no idea how orig files got there...


--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227




Re: [PHP-DEV] Release process nit

2012-03-12 Thread Alexey Shein
On 13 March 2012 at 3:00, Stas Malyshev
smalys...@sugarcrm.com wrote:
> Hi!
>
>> are they in svn? I can't see them in 5.4
>
>
> They are not in SVN, but at least for autom4te.cache ones they seem to be
> generated when configure script is generated, and the packing script happily
> picks them up. I have no idea how orig files got there...

If Ondřej applied any patches before compiling, the patch(1) program could
leave them as backup copies.

> --
> Stanislav Malyshev, Software Architect
> SugarCRM: http://www.sugarcrm.com/
> (408)454-6900 ext. 227






-- 
Regards,
Shein Alexey




Re: [PHP-DEV] Release process nit

2012-03-12 Thread Christopher Jones



On 03/12/2012 03:06 PM, Alexey Shein wrote:

> On 13 March 2012 at 3:00, Stas Malyshev
> smalys...@sugarcrm.com wrote:
>
>> Hi!
>>
>>> are they in svn? I can't see them in 5.4
>>
>> They are not in SVN, but at least for autom4te.cache ones they seem to be
>> generated when configure script is generated, and the packing script happily
>> picks them up. I have no idea how orig files got there...
>
> If Ondřej applied any patches before compiling, patch(1) program could
> leave them as backup copies.


The autom4te.cache and *.orig files originally mentioned are included in
php.net's php-5.4.0.tar.bz2, i.e. this is a valid issue.

Ondřej, please log a bug.

Chris

--
Email: christopher.jo...@oracle.com
Tel:  +1 650 506 8630
Blog:  http://blogs.oracle.com/opal/




Re: [PHP-DEV] Release process nit

2012-03-12 Thread Ondřej Surý
On Mon, Mar 12, 2012 at 23:06, Alexey Shein con...@gmail.com wrote:
> On 13 March 2012 at 3:00, Stas Malyshev
> smalys...@sugarcrm.com wrote:
>> Hi!
>>
>>> are they in svn? I can't see them in 5.4
>>
>>
>> They are not in SVN, but at least for autom4te.cache ones they seem to be
>> generated when configure script is generated, and the packing script happily
>> picks them up. I have no idea how orig files got there...
>
> If Ondřej applied any patches before compiling, patch(1) program could
> leave them as backup copies.

I did not. I wouldn't write this email if I wasn't sure the cruft is
in the tarball:

ondrej@kiMac:/tmp$ md5 ~/Downloads/php-5.4.0.tar.gz
MD5 (/Users/ondrej/Downloads/php-5.4.0.tar.gz) = 46b72e274c6ea7e775245ffdb81c9ce5
ondrej@kiMac:/tmp$ tar -tzvf ~/Downloads/php-5.4.0.tar.gz | grep .orig
-rw-r--r--  0 smalyshev staff   30417 29 úno 08:37 php-5.4.0/ext/standard/url_scanner_ex.c.orig
-rw-r--r--  0 smalyshev staff   27289 29 úno 08:37 php-5.4.0/ext/standard/var_unserializer.c.orig
-rw-r--r--  0 smalyshev staff   19670 29 úno 08:37 php-5.4.0/ext/pdo/pdo_sql_parser.c.orig
-rw-r--r--  0 smalyshev staff  518939 29 úno 08:37 php-5.4.0/ext/date/lib/parse_date.c.orig
ondrej@kiMac:/tmp$ tar -tzvf ~/Downloads/php-5.4.0.tar.gz | grep autom4te
drwxr-xr-x  0 smalyshev staff       0 29 úno 08:37 php-5.4.0/autom4te.cache/
-rw-r--r--  0 smalyshev staff 3012815 29 úno 08:37 php-5.4.0/autom4te.cache/output.0
-rw-r--r--  0 smalyshev staff    2855 29 úno 08:37 php-5.4.0/autom4te.cache/requests
-rw-r--r--  0 smalyshev staff  387029 29 úno 08:37 php-5.4.0/autom4te.cache/traces.0

O.
-- 
Ondřej Surý ond...@sury.org




Re: [PHP-DEV] Providing sandboxed versions of include and require language constructs

2012-03-12 Thread Adam Jon Richardson
On Mon, Mar 12, 2012 at 5:08 PM, Richard Lynch c...@l-i-e.com wrote:

> On Tue, March 6, 2012 3:30 am, Florian Anderiasch wrote:
>
> Security by blacklist almost always isn't security...
>
> You're bound to miss one of the functions you should have blacklisted,
> but didn't.


Agreed. The approach I'm developing would be a whitelisting approach.


> Something like Drupal would be crippled by this because major
> extensions used by all rely on access that would probably want to be
> blocked.
>
> So then they'd have to come up with a blessed list of extension to
> not block, and then...


The idea would be to make it easy to add to the default whitelist per
include.
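
The proposal itself is not specified in this message, so the following is purely an illustration of the whitelist idea in general: a call is permitted only if its name is on a default list or on extra entries granted for one include. The function name isFunctionAllowed() and both lists are hypothetical.

```php
<?php
/**
 * Hypothetical illustration of whitelist-based checking: a function name
 * is allowed only if it appears in the default whitelist or in the extra
 * entries granted for one particular include. Names are compared
 * case-insensitively, as PHP function names are case-insensitive.
 */
function isFunctionAllowed(string $function, array $defaultWhitelist, array $extraForInclude = []): bool
{
    $allowed = array_map('strtolower', array_merge($defaultWhitelist, $extraForInclude));
    return in_array(strtolower($function), $allowed, true);
}

$defaults = ['strlen', 'htmlspecialchars'];

$plain   = isFunctionAllowed('strlen', $defaults);
$blocked = isFunctionAllowed('file_get_contents', $defaults);
$granted = isFunctionAllowed('file_get_contents', $defaults, ['file_get_contents']);
```

Anything not listed is rejected by default, which is the property that distinguishes this from a blacklist.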

> Nice idea, in the abstract, but I don't think it will work out to be
> very useful in the Real World (tm).


I'm working on documenting the ideas and refining the approach. I think it
will hold significant value, but a few years ago I also thought that WebOS
would become a major player in the mobile market :)

Adam

P.S. - Thankful to see your recent update on your medical prognosis,
Richard.


Re: [PHP-DEV] Release process nit

2012-03-12 Thread Stas Malyshev

Hi!


> The autom4te.cache and *.orig files originally mentioned are included in
> php.net's php-5.4.0.tar.bz2
> I.e. this is a valid issue.


Definitely seems to be a bug in the makedist script, since these files 
are not in the SVN but appear when packaging.
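
A check of this kind could be automated before publishing a tarball. The sketch below is not the makedist script and makes no claim about how it works. It only assumes that a list of packaged paths is available, and the function name findPackagingCruft() is hypothetical.

```php
<?php
/**
 * Hypothetical check: given the paths listed from a release tarball,
 * return those that look like build leftovers (patch backups ending in
 * ".orig", or anything at or under an autom4te.cache directory).
 */
function findPackagingCruft(array $paths): array
{
    $cruft = [];
    foreach ($paths as $path) {
        $isPatchBackup = substr($path, -5) === '.orig';
        // Wrap in slashes so the directory matches at any depth,
        // including the start and the end of the path.
        $inAutom4teCache = strpos('/' . $path . '/', '/autom4te.cache/') !== false;
        if ($isPatchBackup || $inAutom4teCache) {
            $cruft[] = $path;
        }
    }
    return $cruft;
}

$listing = [
    'php-5.4.0/ext/standard/url_scanner_ex.c.orig',
    'php-5.4.0/ext/standard/url_scanner_ex.c',
    'php-5.4.0/autom4te.cache/traces.0',
    'php-5.4.0/main/SAPI.h',
];
$found = findPackagingCruft($listing);
```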

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Ángel González
On 12/03/12 22:30, Stas Malyshev wrote:
> Hi!
>
>> If you are a framework developer, and really want to shield against a
>> bad php.ini setting, you could ini_set() to your preferred charset at the
>> beginning of the request.
>
> That's assuming the request is completely processed by your framework
> and you never call any outside code and any outside code never calls
> you - otherwise your messing with INI setting may very well break that
> code or that code's messing with INI settings may very well break yours.

Sure. That's a setting to be kept the same for the whole request unless
you like trouble.

If you need to call a library function which uses a different HTML
charset convention, you could do so through a wrapper which sets and
restores the setting.

Still, that API is likely wrong: a library function written by someone
completely unrelated to the main application shouldn't be echoing
anything through the output. And if it's not generating the HTML, the
htmlspecialchars() call is better done on the return value in the calling
application (probably after converting the internal charset).

Such interfaces may be well served by switching the setting many times.
I was only advocating the usage of ini_set() once in the request,
for the case of a server with two applications having different needs
(equivalent to configuring it in .user.ini or .htaccess).
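
A wrapper of the kind mentioned above might look roughly like this. It is a minimal sketch: the name withDefaultCharset() is hypothetical, and it assumes the setting involved is default_charset.

```php
<?php
/**
 * Hypothetical wrapper: run $callback with default_charset temporarily
 * set to $charset, then restore the previous value even if the callback
 * throws.
 */
function withDefaultCharset(string $charset, callable $callback)
{
    $previous = ini_get('default_charset');
    ini_set('default_charset', $charset);
    try {
        return $callback();
    } finally {
        ini_set('default_charset', $previous);
    }
}

$before = ini_get('default_charset');
$inside = withDefaultCharset('ISO-8859-1', function () {
    // Stand-in for the library call that expects another charset.
    return ini_get('default_charset');
});
$after = ini_get('default_charset');
```

The finally block is what keeps the change from leaking into the rest of the request.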




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Stas Malyshev

Hi!


> Still, that API is likely wrong: a library function written by someone
> completely unrelated to the main application shouldn't be echoing
> anything through the output. And if it's not generating the HTML, the
> htmlspecialchars() call is better done on the return value in the calling
> application (probably after converting the internal charset).


Again, you are making a huge number of assumptions about how ALL
applications must work, which means you are wrong in 99.(9)% of cases,
because there are infinitely many applications which don't work exactly
like yours does, and we have no idea how they work.


The main point is that having global state (and yet worse, changeable
global state) significantly influence how basic functions work is
dangerous. It's like keeping everything in globals and, instead of
passing parameters between functions, just changing some globals and
expecting functions to pick them up.



> Such interfaces may be well served by switching the setting many times.


That's exactly what I am trying to avoid, and you are just illustrating
why this proposal is dangerous, because that's exactly what is going to
happen in the code: instead of passing proper arguments to
htmlspecialchars(), people will start changing INI settings left and
right, and then nobody will know what a htmlspecialchars() call actually
does without tracking all the INI changes along the way.
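
As a minimal sketch of the "pass proper arguments" alternative: the helper below spells out the flags and the encoding on every call, so its result does not depend on any INI setting. The name escapeHtml() is hypothetical, and UTF-8 as the default is an assumption of this example.

```php
<?php
/**
 * Hypothetical helper: escape $text for HTML with every argument spelled
 * out, so the result does not depend on INI settings.
 */
function escapeHtml(string $text, string $charset = 'UTF-8'): string
{
    return htmlspecialchars($text, ENT_QUOTES, $charset);
}

$escaped = escapeHtml('<a href="x">Tom & Jerry</a>');
```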

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Yasuo Ohgaki
2012/3/13 Rasmus Lerdorf ras...@lerdorf.com:
> On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote:
>> I thought default_charset became UTF-8, so I was expecting
>> following HTTP header.
>>
>> content-type  text/html; charset=UTF-8
>>
>> However, I got empty charset (missing 'charset=UTF-8').
>> So I looked up to source and found the line in SAPI.h
>>
>> 293   #define SAPI_DEFAULT_CHARSET        ""
>>
>> Empty string should be UTF-8, isn't it?
>
> No, we can't force an output charset on people since it would end up
> breaking a lot of sites.

Right, so maybe for the next major release? 5.5.0?

As the first XSS advisory in 2000 states, explicitly setting the character
encoding prevents certain XSS attacks. Recent browsers have much better
encoding handling, but setting the encoding explicitly is still better for
security.
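
Until the default changes, an application can state the charset itself. A minimal sketch, assuming a hypothetical helper contentTypeHeader() that only builds the header line; header() and headers_sent() are the standard PHP functions:

```php
<?php
/**
 * Build an explicit Content-Type header line so the charset does not
 * depend on the default_charset INI setting. The function name is
 * hypothetical.
 */
function contentTypeHeader(string $charset, string $mimeType = 'text/html'): string
{
    return sprintf('Content-Type: %s; charset=%s', $mimeType, $charset);
}

$line = contentTypeHeader('UTF-8');

// Only send the header while that is still possible (no output sent yet).
if (!headers_sent()) {
    header($line);
}
```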

> PHP 5.3's determine_charset behaves exactly like 5.4's. In 5.3 we have:
>
>     if (charset_hint == NULL)
>         return cs_8859_1;
>
> and in 5.4 we have:
>
>     if (charset_hint == NULL)
>         return cs_utf_8;
>
> So there is no difference in their guessing when there is no hint, the
> only difference is that in 5.4 we choose utf8 and in 5.3 we choose
> 8859-1 in that case.

I got this with 5.3
<?php
echo htmlentities('<日本語UTF-8>',ENT_QUOTES);
echo htmlentities('<日本語UTF-8>',ENT_QUOTES, 'UTF-8');

&lt;&aelig;�&yen;&aelig;�&not;&egrave;&ordf;�UTF8
&gt;&lt;日本語UTF-8&gt;

So people migrating from 5.3 to 5.4 should not have problems.
Migrating from versions older than 5.3 to 5.4 will be problematic.

I always set all parameters for htmlentities/htmlspecialchars, therefore
I haven't noticed this was changed from 5.3. They may be migrating from
5.2 or older. (RHEL5 uses 5.1)

Since PHP does not have a default multibyte module, it may be good to have

input_encoding
internal_encoding
output_encoding

php.ini settings and make the multibyte modules use them when they are set.
Alternatively, just make mbstring a default extension.

This is a rather big change for a released version, but it is a simple,
easy change.

Regards,

--
Yasuo Ohgaki




Re: [PHP-DEV] default charset confusion

2012-03-12 Thread Rasmus Lerdorf
On 03/12/2012 05:52 PM, Yasuo Ohgaki wrote:
> I always set all parameters for htmlentities/htmlspecialchars, therefore
> I haven't noticed this was changed from 5.3. They may be migrating from
> 5.2 or older. (RHEL5 uses 5.1)

No, like I showed, moving from 5.3 to 5.4 breaks because the new default
UTF-8 encoding validates the input and 8859-1 in 5.3 does not. So for
charsets that are actually safe for the low-ASCII chars that are
significant to HTML, htmlspecialchars() now returns false in 5.4 because
their chars fail the UTF-8 validity check. For people who explicitly set
all the parameters nothing has changed, of course.
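
The difference can be reproduced with a single non-ASCII byte. This is a minimal sketch with made-up sample data. The PHP manual documents the failure result as an empty string (unless ENT_IGNORE or ENT_SUBSTITUTE is passed), so the example is written against that rather than against false.

```php
<?php
// "café <b>" encoded as ISO-8859-1: the é is the single byte 0xE9,
// which is not a valid UTF-8 sequence on its own.
$latin1 = "caf\xE9 <b>";

// Interpreted as UTF-8, the input fails validation.
$asUtf8 = htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8');

// Interpreted as ISO-8859-1, every byte is accepted and only the
// HTML-significant characters are escaped.
$asLatin1 = htmlspecialchars($latin1, ENT_QUOTES, 'ISO-8859-1');
```

Passing the encoding explicitly, as in the second call, is what keeps such code behaving the same on both versions.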

-Rasmus
