On Fri, May 31, 2013 at 9:11 PM, Anthony Ferrara <ircmax...@gmail.com> wrote:
> Hello all,
>
> I want to start an idea thread (or at least get a conversation going) about
> cleaning up the core integer data type and string lengths. Here's my ideas:
>
> 1. Change string length in the ZVAL from int to size_t
>  - http://lxr.php.net/xref/PHP_5_5/Zend/zend.h#321

Huge +1, and the same goes for the length of any buffer we allocate or use.

> 2. Change long in the ZVAL  (lval) to a system-determined 64bit fixed size
>
> There are two reasons for this. First, on VS compiles (windows), the
> current long size is always 32 bit. So that means even 64 bit compiles may
> or may not have 64 bit ints.

To do it as transparently as possible, and as a one-time change (though
we can't avoid some #ifdefs), we could add a php_int type or, my
preferred solution, go with int64_t for the zval int type. One open
question is whether we keep the architecture-dependent integer size,
which is rather annoying.
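
As a rough illustration of the int64_t option, here is a minimal sketch.
The name php_int_t and both helper functions are hypothetical, chosen
only for this example; none of this is existing Zend code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical typedef: a zval integer that is 64 bits on every platform. */
typedef int64_t php_int_t;

/* Compile-time check that the width does not depend on the compiler. */
static_assert(sizeof(php_int_t) == 8, "php_int_t must be exactly 64 bits");

/* Returns 1 when the platform's long is narrower than php_int_t,
 * as it is with 64-bit Visual Studio builds, and 0 otherwise. */
static int sketch_long_is_narrower(void)
{
    return sizeof(long) < sizeof(php_int_t);
}

/* Returns 2^40, a value that does not fit in a 32-bit long. */
static php_int_t sketch_big_value(void)
{
    return INT64_C(1) << 40;
}
```

With a typedef like this, the integer range would be the same whether
long is 32 or 64 bits wide.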

> The second reason is that right now PHP can't really handle strings >= 2^31
> characters even on 64 bit compiles. The problem gets pretty comical:
>
> $ php -d memory_limit=499g -r "\$string = str_repeat('x',pow(2, 32)) .
> str_repeat('x', pow(2,4)); var_dump(strlen(\$string));"
> int(16)
>
> Obviously there's a pretty significant ABI break here. I propose a "tweak"
> of the Z_* macros to "fix" that. Basically, Z_STRLEN() will cast the result
> to an int. This is the same behavior as today, and will mean that existing
> extensions continue to function exactly as today. But new extensions (and
> elsewhere in core) can use a new macro Z_STRSIZE() which will return the
> native size_t.

A new macro would be a good solution, but I would name it after what it
actually returns: Z_SIZE_T.
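
The compatibility scheme quoted above could be sketched roughly as
follows. The struct and both accessor names are hypothetical stand-ins,
not the real zval or Z_* definitions, and the sketch assumes a 64-bit
build where size_t is wider than int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 64-bit build, where size_t is wider than int. */
static_assert(sizeof(size_t) == 8, "sketch assumes a 64-bit size_t");

/* Hypothetical, simplified stand-in for the string part of a zval. */
typedef struct {
    char  *val;
    size_t len; /* proposed: size_t instead of int */
} sketch_str;

/* New-style accessor (hypothetical name): the full, native length. */
static size_t sketch_strsize(const sketch_str *s)
{
    return s->len;
}

/* Legacy accessor (hypothetical name): keeps only the low 32 bits,
 * which mirrors what storing the length in an int amounts to. */
static int sketch_strlen(const sketch_str *s)
{
    return (int)(uint32_t)s->len;
}
```

For a length of 2^32 + 16, the legacy accessor yields 16, matching the
int(16) in the quoted example, while the new accessor returns the full
value.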

> Likewise we can do the same for the long data type (Z_LVAL() returns a
> long, and Z_PHPLVAL() returns a php_long (which is a typedef of a 64 bit
> compiler specific type).

I'm not a fan of adding a php_long type; I would rather move to the
int*_t types, or to php_int*_t types, so that it is easy to see what is
actually used.


> It'll also require 2 new zend_parse_parameters types (one for php_long and
> one for the string len using size_t instead)...

> Additionally, I'd propose a set of central helpers to cast back and forth
> between php_long and long, as well as int to size_t (with overflow checks,
> allowing us to do errors on detected overflows instead of silently ignoring
> them as today).

Same as before: we should stop using long, which has proven to be not
really portable and can be confusing.
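
Checked conversions of the kind quoted above could look roughly like
this when built on fixed-width types. The function names and the
0 / -1 return convention are assumptions made for this sketch only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* The cast of INT_MAX to size_t below relies on this. */
static_assert(sizeof(size_t) >= sizeof(int), "size_t must hold INT_MAX");

/* Narrow a 64-bit integer to long.
 * Returns 0 on success, -1 if the value does not fit in a long. */
static int sketch_int64_to_long(int64_t in, long *out)
{
    if (in < LONG_MIN || in > LONG_MAX) {
        return -1; /* overflow detected instead of silent truncation */
    }
    *out = (long)in;
    return 0;
}

/* Narrow a size_t length to int.
 * Returns 0 on success, -1 if the length exceeds INT_MAX. */
static int sketch_size_to_int(size_t in, int *out)
{
    if (in > (size_t)INT_MAX) {
        return -1;
    }
    *out = (int)in;
    return 0;
}

/* Widen an int to size_t.
 * Returns 0 on success, -1 for negative input. */
static int sketch_int_to_size(int in, size_t *out)
{
    if (in < 0) {
        return -1;
    }
    *out = (size_t)in;
    return 0;
}
```

A caller would check the return value and raise an error on -1 rather
than continue with a truncated number.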

> It would be a *gigantic* patch, but the userland effects should be minimal
> (the only changes would be supporting longer strings, and consistent 64 bit
> int support). The performance considerations should be minimal for
> non-legacy code (as both would still be using native data types)...
>
> What do you think? What am I missing from this? Or is this just a horrific
> idea (given the current implementation details)...?

It is a very good idea, and we have discussed it many times, for far
too long. I'm not sure it can be done in 5.x, though. But no matter
when it gets done, we can already start on it in a fork and write up
an RFC. I'll be very happy to help here; it is on my todo list for full
win64 support. We will also need to patch the libraries we use, to
avoid the same issues there. A first discussion I had with many of the
developers working on these libraries shows that they have (almost) no
objection to cleaning this up as well.

Cheers,
--
Pierre

@pierrejoye |  http://www.libgd.org

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
