Re: [PHP-DEV] [RFC] Lazy Objects

Nicolas Grekas Tue, 02 Jul 2024 07:50:51 -0700

Hi Valentin, Marco, Benjamin, Tim, Rob,

Thanks for the detailed feedback again, it's very helpful!
Let me try to answer many emails at once, in chronological order:

> The RFC says that Virtual state-proxies are necessary because of circular
> references. It's difficult to accept this reasoning, because using
> circular references is a bad practice and the given example is something I
> try to avoid by all means in my code.
>

While discussing this argument about circular references with Arnaud, we
realized that by this reasoning, the engine wouldn't need a garbage
collector. Fortunately it has one, because circular references are an
important thing that exists in practice. Accounting for them is not
optional.

> don't touch `readonly` because of lazy objects: this feature is too niche
> to cripple a major-major feature like `readonly`. I would suggest deferring
> until after the first bits of this RFC landed.
>

Following Marco's advice, we've decided to remove all the flags related to
the various ways to handle readonly. This also removes the secondary vote.
Readonly properties now behave as follows: they are skipped by the
resetAsLazy* methods if already initialized, they throw in the initializer
as usual, and they are resettable only if the class is not final, as
already allowed in userland (and as explained in the RFC).

> I finally got around to giving the RFC another read. I apologize if
> this email asks questions that have already been answered elsewhere, as
> the current mailing list volume makes it hard for me to keep up.
>
> On 6/14/24 14:13, Arnaud Le Blanc wrote:
> >> Is there any reason to call the makeLazyX() methods on an object that
> >> was not just freshly created with ->newInstanceWithoutConstructor()
> >> then?
> >
> > There are not many reasons to do that. The only intended use-case that
> > doesn't involve an object freshly created with
> > ->newInstanceWithoutConstructor() is to let an object manage its own
> > laziness by making itself lazy in its constructor:
> >
>
> Okay. But the RFC (and your email) does not explain why I would want to do
> that. It appears that much of the RFC's complexity (e.g. around readonly
> properties and destructors) stems from the wish to support turning an
> existing object into a lazy object. If there is no strong reason to
> support that, I would suggest dropping that. It could always be added in
> a future PHP version.
>

This capability is needed for two reasons: 1. completeness and 2. feature
parity with what can currently be done using magic methods, a technique
that is already used to solve real-world problems.

This relates to Benjamin's question about using a static factory instead of
a constructor. This is a valid alternative, but it can be used only when
you are in control of the instantiation logic. That's not always the case.
E.g. Doctrine uses the "new $class" pattern in its configuration system.
Whether this is a good idea or not is not the topic. But this pattern means
that as a user of Doctrine, you sometimes have to provide a class name and
can't use any other constructor. Doctrine is just an example of course.
Another example is when you have a library that wants to make one of its
classes lazy: let's say __construct() is the way for the users of this lib
to use it (pretty common), then moving to a static factory is not possible
without a BC break.

So yes, turning an existing instance lazy is definitely needed.
About readonly, see the simplification above.
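
To make the "new $class" case concrete, here is a minimal sketch. It assumes
the method name as it shipped in PHP 8.4 (resetAsLazyGhost), which may differ
from the draft under discussion. The Catalog class and its "loading" logic are
made up for illustration:

```php
<?php
// Hypothetical library class; the name and the loading logic are made up.
class Catalog
{
    public static int $loads = 0; // counts initializer runs, for illustration only
    public array $items;

    public function __construct(string $path)
    {
        $reflector = new ReflectionClass(self::class);
        // Turn the instance under construction into a lazy ghost: its
        // properties are populated only when first accessed.
        $reflector->resetAsLazyGhost($this, static function (Catalog $catalog) use ($path): void {
            self::$loads++;
            $catalog->items = ["loaded from $path"];
        });
    }
}

// The caller only has a class name, as with the "new $class" pattern.
$class = Catalog::class;
$catalog = new $class('catalog.json');

var_dump(Catalog::$loads);    // int(0): nothing has been loaded yet
var_dump($catalog->items[0]); // first access runs the initializer
var_dump(Catalog::$loads);    // int(1)
```

The caller keeps using the constructor, so laziness stays an internal detail
of the class and no static factory is required.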

> >>>> - The return value of the initializer has to be an instance of a
> >>>> parent or a child class of the lazy-object and it must have the same
> >>>> properties.
> >>>>
> >>>> Would returning a parent class not violate the LSP? Consider the
> >>>> following example:
> >>>>
> >>>>        class A { public string $s; }
> >>>>        class B extends A { public function foo() { } }
> >>>>
> >>>>        $o = new B();
> >>>>        ReflectionLazyObject::makeLazyProxy($o, function (B $o) {
> >>>>          return new A();
> >>>>        });
> >>>>
> >>>>        $o->foo(); // works
> >>>>        $o->s = 'init';
> >>>>        $o->foo(); // breaks
> >>>
> >>> $o->foo() calls B::foo() in both cases here, as $o is always the proxy
> >>> object. We need to double check, but we believe that this rule doesn't
> >>> break LSP.
> >>
> >> I don't understand what happens with the 'A' object then, but perhaps
> >> this will become clearer once you add the requested examples.
> >
> > The 'A' object is what is called the "actual instance" in the RFC. $o
> > acts as a proxy to the actual instance: Any property access on $o is
> > forwarded to the actual instance A.
>
> I've read the updated RFC and it's still not clear to me that returning
> an arbitrary “actual instance” object is sound. Especially when private
> properties - which for all intents and purposes are not visible outside
> of the class - are involved. Consider the following:
>
>      class A {
>        public function __construct(
>          public string $property,
>        ) {}
>      }
>
>      class B extends A {
>        public function __construct(
>          string $property,
>          private string $foo,
>        ) { parent::__construct($property); }
>
>        public function getFoo() {
>          return $this->foo;
>        }
>     }
>
>     $r = new ReflectionClass(B::class);
>     $obj = $r->newLazyProxy(function ($obj) {
>       return new A('value');
>     });
>     var_dump($obj->property); // 'value'
>     var_dump($obj->getFoo()); // Implicitly accesses A::${'\0B\0foo'}
>                               // (i.e. the mangled B::$foo property)?
>
> Now you might say that B does not have the same properties as A and
> creating the proxy is not legal, but then the addition of a new private
> property would immediately break the use of the lazy proxy, which
> specifically is something that private properties should not be able to do.
>

True, thanks for raising this point. After brainstorming with Arnaud, we
improved this behavior by:
1. allowing only parent classes, not child classes
2. requiring that all properties from a real instance have a corresponding
one on the proxy OR that the extra properties on the proxy are skipped/set
before initialization.

This means that it's now possible for a child class to add a property,
private or not. There's one requirement: the property must be skipped or
set before initialization.

For the record, with magic methods, we currently have no choice but to
create an inheritance proxy. This means the situation of having Proxy
extend Real like in your example is the norm. While doing so, it's pretty
common to attach some interface so that we can augment Real with extra
capabilities (let's say Proxy implements LazyObjectInterface). Being able
to use class Real as a backing store for Proxy gives us a very smooth
upgrade path (the implementation of the laziness can remain an internal
detail), and it's also sometimes the only way to leverage a factory that
returns Real, not Proxy.

> The cloning behavior appears to be unsound to me. Consider the following:
>
>      class A {
>         public function __construct(
>           public string $property,
>         ) {}
>      }
>      class B extends A {
>         public function foo() { }
>      }
>
>      function only_b(B $b) { $b->foo(); }
>
>      $r = new ReflectionClass(B::class);
>      $b = $r->newLazyProxy(function ($obj) {
>        return new A('value');
>      });
>
>      $b->property = 'init_please';
>
>      $notActuallyB = clone $b;
>      only_b($b); // legal
>      only_b($notActuallyB); // illegal
>
> I'm cloning what I believe to be an instance of B, but get back an A.

That is very true. I had a look at the userland implementation and indeed,
we keep the wrapper while cloning the backing instance (not that we have a
choice, the engine doesn't give us any other option).
RFC updated.
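
Here is a minimal sketch of the initialized case, assuming the API as it
shipped in PHP 8.4 (newLazyProxy; the draft may differ). It reuses Tim's
classes; the return value of foo() and the variable names are mine:

```php
<?php
class A
{
    public function __construct(public string $property) {}
}

class B extends A
{
    public function foo(): string { return 'foo'; }
}

function only_b(B $b): string { return $b->foo(); }

$r = new ReflectionClass(B::class);
$b = $r->newLazyProxy(fn (B $proxy): A => new A('value'));

$b->property = 'init_please'; // triggers initialization, then writes to the real instance

$copy = clone $b;             // clones the proxy together with its real instance
$copy->property = 'changed';

var_dump($copy instanceof B); // bool(true)
var_dump(only_b($copy));      // string(3) "foo"
var_dump($b->property);       // string(11) "init_please"
```

The clone is still a B, and writing to it leaves the original untouched.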

We also updated the behavior when an uninitialized proxy is cloned: we now
postpone calling $real->__clone to the moment where the proxy clone is
initialized.

On 6/27/24 16:27, Arnaud Le Blanc wrote:
> >>   * flags should be a `list<SomeEnumAroundProxies>` instead. A bitmask
> >> for a new API feels unsafe and anachronistic, given the tiny performance
> >> hit.
> >>
> >
> > Unfortunately this leads to a 30% slowdown in newLazyGhost() when
> > switching to an array of enums, in a micro benchmark. I'm not sure how
> > this would impact a real application, but given this is a performance
> > critical
>
> I'm curious, what did the implementation look like?

I'll let Arnaud answer this one.

> Any access to a non-existent (i.e. dynamic) property will trigger
> initialization and this is not preventable using
> 'skipLazyInitialization()' and 'setRawValueWithoutLazyInitialization()'
> because these only work with known properties?
>
> While dynamic properties are deprecated, this should be clearly spelled
> out in the RFC for voters to make an informed decision.

Absolutely. From a behavioral PoV, the distinction between dynamic and
non-dynamic properties doesn't matter: both kinds are uninitialized at this
stage and the engine will trigger object handlers in the same way (it will
just not trigger the same object handlers).
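
A minimal sketch of the difference, assuming the PHP 8.4 method names
(newLazyGhost, setRawValueWithoutLazyInitialization), which may not match the
draft. The User class and the $initialized counter are made up:

```php
<?php
class User
{
    public int $id;
    public string $name;
}

$r = new ReflectionClass(User::class);
$initialized = 0; // counts initializer runs, for illustration only
$user = $r->newLazyGhost(function (User $u) use (&$initialized): void {
    $initialized++;
    $u->name = 'loaded';
});

// A declared property can be set without triggering initialization.
$r->getProperty('id')->setRawValueWithoutLazyInitialization($user, 42);
var_dump($user->id, $initialized); // int(42) int(0)

// An undeclared property cannot be prepared this way, so touching it
// runs the initializer. "??" avoids the undefined-property warning.
$value = $user->nickname ?? null;
var_dump($initialized); // int(1)
```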

> > If the object is already lazy, a ReflectionException is thrown with
> > the message “Object is already lazy”.
>
> What happens when calling the method on an *initialized* proxy object?
> i.e. the following:
>
>      class Obj { public function __construct(public string $name) {} }
>      $obj = new Obj('obj1');
>      $r->resetAsLazyProxy($obj, ...);
>      $r->initialize($obj);
>      $r->resetAsLazyProxy($obj, ...);
>
> What happens when calling it for the actual object of an initialized
> proxy object?

Once initialized, a lazy object should be indistinguishable from a non-lazy
one.
This means that the second call to resetAsLazyProxy will just do that:
reset the object like it does for any regular object.
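
Here is a minimal sketch of that sequence. For simplicity it uses a ghost
rather than a proxy, and it assumes the PHP 8.4 method names
(resetAsLazyGhost, initializeLazyObject), which differ from the draft's
initialize(). The $runs counter is made up:

```php
<?php
class Obj
{
    public function __construct(public string $name) {}
}

$r = new ReflectionClass(Obj::class);
$obj = new Obj('obj1');

$runs = 0; // counts initializer runs, for illustration only
$initializer = function (Obj $o) use (&$runs): void {
    $o->name = 'initialized #' . ++$runs;
};

$r->resetAsLazyGhost($obj, $initializer);

// Resetting an object that is still lazy is rejected.
$rejected = false;
try {
    $r->resetAsLazyGhost($obj, $initializer);
} catch (ReflectionException $e) {
    $rejected = true;
}

$r->initializeLazyObject($obj);           // runs the initializer once
$r->resetAsLazyGhost($obj, $initializer); // accepted: the object is no longer lazy

var_dump($rejected);  // bool(true)
var_dump($obj->name); // string(14) "initialized #2"
```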

> It's probably not possible to prevent this, but will this
> allow for proxy chains? Example:
>
>      class Obj { public function __construct(public string $name) {} }
>      $obj1 = new Obj('obj1');
>      $r->resetAsLazyProxy($obj1, function () use (&$obj2) {
>          $obj2 = new Obj('obj2');
>          return $obj2;
>      });
>      $r->resetAsLazyProxy($obj2, function () {
>          return new Obj('obj3');
>      });
>      var_dump($obj1->name); // what will this print?

This example doesn't work because $obj2 doesn't exist when trying to make
it lazy, but you probably mean this instead?

     class Obj { public function __construct(public string $name) {} }
     $obj1 = new Obj('obj1');
     $obj2 = new Obj('obj2');
     $r->resetAsLazyProxy($obj1, function () use ($obj2) {
         return $obj2;
     });
     $r->resetAsLazyProxy($obj2, function () {
         return new Obj('obj3');
     });
     var_dump($obj1->name); // what will this print?

This will print "obj3": each object is separate from the other from a
behavioral perspective, but with such a chain, accessing $obj1 will trigger
its initializer and will then access $obj2->name, which will trigger the
second initializer then access $obj3->name, which contains "obj3".
(I just confirmed with the implementation I have, which is from a previous
API flavor, but the underlying mechanisms are the same).

> I just noticed in the RFC that I don't see any mention of what happens when
> running `get_class`, `get_debug_type`, etc., on the proxies, but it does
> mention var_dump.
>

Yes, because there is nothing to say on the topic: turning an instance lazy
doesn't change anything regarding the type system, so these functions return
the same result as before - the class of the object.
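
A quick sketch to illustrate, assuming the PHP 8.4 method names
(newLazyGhost, isUninitializedLazyObject), which may not match the draft:

```php
<?php
class Obj
{
    public function __construct(public string $name) {}
}

$r = new ReflectionClass(Obj::class);
$obj = $r->newLazyGhost(function (Obj $o): void {
    $o->__construct('obj1');
});

var_dump(get_class($obj));      // string(3) "Obj"
var_dump(get_debug_type($obj)); // string(3) "Obj"
var_dump($obj instanceof Obj);  // bool(true)

// None of the calls above touched a property, so the object is still lazy.
var_dump($r->isUninitializedLazyObject($obj)); // bool(true)
```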

The RFC is in sync with this message, please have a look for clarifications.

Please let me know if any topics remain unanswered.

Nicolas
