On Sun, Jul 8, 2018 at 8:36 PM, Nicolas Grekas <nicolas.grekas+...@gmail.com > wrote:
> > Before talking about solutions, can the people who need this first outline >>>> what functionality is needed and what it is needed for (and maybe what >>>> workarounds you currently use). E.g. do you only need to know whether >>>> something is a reference, or do you need to know whether two somethings >>>> are >>>> part of the same reference, etc. There are probably multiple use cases >>>> for >>>> this with different needs. >>>> >>> >>> We're using reference introspection to do both: we need to know when a >>> zval is a reference, and we also need to track each of them separately. >>> >>> The use case is being able to intropect any arbitrary PHP datastructure, >>> with one main application: providing an enhanced "dump()" function. >>> >>> See e.g. this screenshot for what we get using the dump() function >>> provided by Symfony VarDumper component: >>> https://symfony.com/doc/current/_images/07-hard-ref.png >>> >>> In PHP5 days, Julien Pauli wrote a PHP extension to do zval >>> introspection. >>> Here is the code + README (see test case 001.phpt for example with >>> references.): >>> https://github.com/symfony/symfony/tree/3.4/src/Symfony/Comp >>> onent/Debug/Resources/ext >>> >>> With PHP7, using pure PHP introspection is easier to maintain and still >>> very fast so we deprecated the extension. >>> Here is the code doing reference introspection: >>> https://github.com/symfony/symfony/blob/master/src/Symfony/C >>> omponent/VarDumper/Cloner/VarCloner.php#L83 >>> >>> it might not be easy to follow, but the basic blocks are: >>> >>> $array2 = $array1; >>> $array2[$key] = $unique_cookie; >>> if ($array1[$key] === $unique_cookie) => we found a reference >>> then we also maintain a registry of $unique_cookie so that we know if we >>> already saw that reference or not (the check is done before the above "if" >>> or course.) >>> >> >> Thanks for the explanation. I think that the VarCloner use case needs two >> bits of functionality: >> >> 1. Detecting whether a variable is a reference, so you can handle this >> specially. >> 2. An efficient way of determining whether a variable is part of a >> reference that has already been seen (and which one). >> >> The second requirement is stronger than just the ability to detect >> whether two variables are part of the same reference. Given just a >> same_ref($v1, $v2) function, one would have to check against a list of all >> previously seen references one at a time, rather than only performing a >> hashtable lookup. >> >> Currently this functionality is implemented as: >> >> 1. Copying the array, assigning a cookie to the copy and seeing if the >> original array is modified. With an extra catch for TypeErrors, this is >> compatible with typed properties. >> 2. Replacing the reference with a Stub object, which can be looked up by >> object id. At the end the Stub objects are replaced with their values >> again. This is fundamentally incompatible with typed properties, as the >> type will likely not permit the Stub class. >> >> Here are my thoughts on possible APIs for this use case. >> >> Construction of reference-reflection objects >> ----- >> >> An issue already discussed in the other threads is that in PHP we need to >> specify whether a parameter is accepted by reference, by value or by >> preferred-reference. We don't have the possibility of accepting either a >> value or reference, whatever we get. This leaves us with a few options: >> >> 1. Introducing a VM-level primitive that is not subject to this >> limitation. The typed properties thread suggested a reflect_variable() >> language construct. I'm not too fond of this option because reference >> reflection seems like an awfully specific thing to introduce a new language >> construct for. >> >> 2. A ReflectionReference::fromVariable(&$var) constructor. Contrary to >> what was said in the other thread, this does not cause issues with the >> copy-on-write mechanism. Since PHP 7 references and non-references can >> share values (including immutablized values in SHM). However, this approach >> does have two issues: >> a) It is impossible to distinguish whether $var was a singleton reference >> or a value beforehand. Both will show up as rc=2 references inside >> ReflectionReference::fromVariable(). (This may also be an advantage, >> because from a language-design perspective, we treat singleton references >> as non-references.) >> b) In case the original $var was a variable, it will now be a reference, >> so this has a side-effect. >> >> 3. A ReflectionReference::fromArrayElem(array $array, string|int $key) >> constructor, as suggested by Nicolas. This avoids the reference/value >> problem and solves the specific VarCloner case efficiently and directly. On >> the other hand, introspection of references inside non-arrays requires some >> workarounds (e.g, casting objects to arrays). >> >> 4. A combination of these. For example we could have... >> ... ReflectionReference::fromArrayElem(array $array, string|int $key) >> for array items. >> ... ReflectionReference::fromObjectProp(object $object, string $key) for >> object properties. >> ... ReflectionReference::fromVariable(&$var) for any other special cases. >> This would allow to cover the common and interesting cases with >> specialized methods, and leave a less efficient fallback for the general >> case. This is probably the option I'd favor. >> > > Either 3. or 4. would be good to me. > IMHO 3. is enough, because : > - ReflectionReference::fromObjectProp() => there need to be a way to deal > with visibility, e.g. accessing private ones. Maybe get the > ReflectionReference from a ReflectionProperty instead? Or just the > array-cast is enought? > Good point, visibility handling is a problem here. In that case I agree that it's best to just go through the array cast here. > - ReflectionReference::fromVariable() => honestly, I don't see any use of > local scope introspection. And if there is one, getting it as an array > first is always possible, so you might prefer less complexity here (i.e not > support this constructor) > Yeah, I don't see the use case either. Especially as this constructor has the likely unwanted side-effect of turning the argument into a reference, it's best not to include it. If it turns out that these constructors are useful after all, it's always possible to add them later. Determining whether something is a reference >> ----- >> >> I think the best way to handle this (and the reason why I used named >> constructors above) is to return null if the value is not a reference. This >> should be the most common case and it would be best to avoid the overhead >> of constructing an unnecessary object in this case. >> >> One important question in this context would be whether we consider >> singleton references as references or not. If we do, then the >> ReflectionReference::fromVariable() constructor will always return a >> non-null value, as the variables will be turned into a singleton reference >> if it was a reference. If we consider them as references, we'll also want >> an API method to distinguish them. E.g. a specialized isSingleton() or more >> generally getNumUsers() == 1. >> >> The alternative would be to always construct a ReflectionReference object >> which may or may not be a reference and has an isReference() method. I >> don't see any advantages to that approach though. >> > > If we're seeking for a benefit, it would be by turning this to a > ReflectionZval insteaf of ReflectionReference. There would be isRef, but > also getType, etc. But honeslty I'm with you on this, only refs make sense. > I'm wondering if there is any legitimate use for something like this. The only thing that comes to mind is comparing arrays by identity. E.g. https://github.com/symfony/symfony/blob/master/src/Symfony/Component/VarDumper/Cloner/VarCloner.php#L153 could check directly check whether an array is the $GLOBALS array. It would also allow direct recursion detection on arrays (ref https://stackoverflow.com/questions/9105816/is-there-a-way-to-detect-circular-arrays-in-pure-php). Doing the same on just references is nearly the same, but doesn't handle the $GLOBALS case correctly. > Reference equality >> ----- >> >> A couple of approaches: >> >> 1. Have an isEqual(ReflectionReference $other): bool method, which >> determined whether two references are the same. The disadvantage is that >> this only allows pair-wise comparisons, so it does not fully solve the >> VarCloner use-case. >> >> 2. Make ReflectionReference constructor uniquing. That is, if a >> ReflectionReference for a certain reference already exists, then the >> constructor will return the same object. This means that references can be >> compared by identity $ref1 === $ref2. It also means that they can be used >> in hashtables via spl_object_id(). (Caveat: It's important to keep the >> ReflectionReference object alive for the during in which spl_object_id() is >> used, as usual.) >> >> 3. Some variation on 2 via a separate API. That is don't unique >> ReflectionReferences themselves, but provide a separate getId() API. The >> returned ID would only be meaningful as long as at least one >> ReflectionReference object for the reference is live, otherwise it may be >> reused. >> > > 2. would work, as I could use that with an SplObjectStorage (which would > satisfy your condition of keeping the object around). > > There is a 4th possibility: have a "ReflectionReference::getHash" method, > that would return a unique string/id per reference (what > symfony_zval_debug() does.) > Is this the same as my 3rd suggestion with a different name, or is there a distinction here in that it removes the requirement for the ReflectionReference object to be live? The reason I included that was to make sure we have a (non-leaking) way of assigning ids to references. Otherwise the only way to do is to use the address of the zend_reference as the hash/id. I don't like exposing memory addresses directly to userland (as this is generally a security issue), but we could return the address hashed with a per-process key. This would make the function somewhat slower though. Nikita