On 04.03.19 10:08, Nikita Popov wrote:
On Mon, Mar 4, 2019 at 6:31 AM Marc <dev@mabe.berlin> wrote:

(sorry for top posting)

Hi Nikita,

as for the question about magic methods vs. interface.

I have the feeling that it could be done with both in a clean way to
make both parties happy.


1. Having __serialize/ __unserialize as magic methods - allowing to
throw exceptions also to explicitly disallow serialization.

2. Having some kind of Serializable interface (I know this is already
used) which expects both methods and also disallows throwing exceptions
to force mark an object serializable. It's not possible to define such
thing in the signature, but it can be documented as well as exceptions
here could be catched internally and re-thrown as UnexpectedExceptionError.

Any thoughts about this?



Another question about static unserialize:

  > Some people have expressed a desire to make |__unserialize()| a
static method which creates and returns the unserialized object (rather
than first constructing the object and then calling |__unserialize()| to
initialize it).

  > This would allow an even greater degree of control over the
serialization mechanism, for example it would allow to return an already
existing object from |__unserialize()|.

  > However, allowing this would once again require immediately calling
|__unserialize()| functions (interleaved with unserialization) to make
the object available for backreferences, which would reintroduce some of
the problems that |Serializable| suffers from. As such, this will not be
supported.

I would be very happy to also have a solution for this as I have a
library which would benefit from it:

(
https://github.com/marc-mabe/php-enum/blob/v3.1.1/src/EnumSerializableTrait.php#L46-L72
)

Would it be possible to instead get the new instance as argument and
expect the instance to be returned but on the same time allow to return
a self instantiated instance?

Something like this:

public  static function  __unserialize(static $new, array  <
http://www.php.net/array>  $data):  static  {
      $new->prop_a = $data['prop_a'];
      return $new;
}

// or ignoring the argument and instantiate a new one

public  static function  __unserialize(static $new, array  <
http://www.php.net/array>  $data):  static  {
      $new = (ReflectionClass(__CLASS__))->newInstanceWithoutConstructor();
      $new->prop_a = $data['prop_a'];
      return $new;
}

The second variant sounds like a wast of resources but if it makes an
edge-case possible it still better then just ignore it.

I would love to have __unserialize() return a new object, I just don't
think it's technically possible. Let me explain in more detail what the
problem here is:

Serialization has a notion of object identity implemented through use of
backreferences. In serialize([$obj, $obj]) the first use of the object is
serialized as usual, while the second is a backreference to the first. To
resolve that backreference, we can either

a) construct objects immediately during unserialization, so we can just
directly use an existing object when we encounter a backreference, or
b) try to keep track of all the places where backreferences are used, so we
can fill them out later.

If __unserialize() is itself responsible for constructing the object, then
a) means that __unserialize() needs to be called in the middle of
unserialization. This is problematic, because the backreference mechanism
uses direct pointers into structures, which can easily become invalid if
the structures are modified inside the __unserialize() implementation. This
is why __wakeup() calls are delayed until the end of serialization. Variant
b) also runs into the same issue. Keeping pointers to all the places that
use backreferences works fine until the first __unserialize() call, which
might modify structures and invalidate those pointers.

Beyond these issues related to implementation details, you have the more
general issue of circular references. If you have an object that has a
reference to itself, how do you unserialize that? You need the already
instantiated object to pass it to __unserialize(), but only __unserialize()
is going to give you the object! You could do something like pass the data
with the circular reference nulled out and then patch it after
__unserialize() runs, but as you don't really know what happens to the data
passed to __unserialize(), you wouldn't know what exactly needs to be
changed.

As such, I think the only way we could have __unserialize() return a new
object is if there was also a way to opt-out from the preservation of
object identity.

Regards,
Nikita


Thank you for the very detailed explanation!

Do I understand it correctly that it's about self references in the serialized object or is that self-reference an internal implementation detail of the serializing logic.


In case about self references in the serialized object and without any knowledge about the internal implementation I can just assume and based in that I still thing it should be possible.

-> see https://3v4l.org/AbQ2s

If my assumptions are just stupid please ignore


Thanks,

Marc


On 26.02.19 15:17, Nikita Popov wrote:
On Tue, Feb 19, 2019 at 11:42 AM Nicolas Grekas <
nicolas.grekas+...@gmail.com> wrote:

Le mar. 19 févr. 2019 à 11:31, Nikita Popov <nikita....@gmail.com> a
écrit :

On Mon, Feb 18, 2019 at 4:12 PM Nicolas Grekas <
nicolas.grekas+...@gmail.com> wrote:

Hi Nikita,

I'd like to propose a new custom object serialization mechanism
intended
to replace the broken Serializable interface:

https://wiki.php.net/rfc/custom_object_serialization

This was already previously discussed in
https://externals.io/message/98834, this just brings it into RFC
form.
The latest motivation for this is
https://bugs.php.net/bug.php?id=77302,
a compatibility issue in 7.3 affecting Symfony, caused by
Serializable. We
can't fix Serializable, but we can at least make sure that a working
alternative exists.

Is there anything we can to do to help the RFC move forward? I'm
spending a great deal of time removing Serializable from the Symfony
code
base. It's not pretty nor fun... but we have no choice since as ppl
move to
PHP 7.3, they can now spot when the current mechanism is breaking
their
serialized payloads (typically from user objects put in the session
storage).

About interface vs magic methods, technicaly, we can work with an
interface provided by PHP core, so that if that's a blocker for
voters to
approve the RFC, let's do it - the current situation is really aweful
:).
FYI, I found myself implementing some getState()/setState() methods
of my
own, decoupled from the underlying mechanisms used by PHP. That
allows me
to still use Serializable for BC reasons when needed, or  move to
__sleep
when possible, then move to the new 7.4 style when ready, without
requiring
any changes from the consumer POV. That's a good illustration of what
I
meant in my previous email: domain-specific interfaces in everyone's
code
is a better design as it allows better decoupling. It's also a better
design to express that "instances of this interface of mine MUST be
serializable". IMHO.

Nicolas

I think for me the main open question here is whether we want to just
handle the serialization issue or try to come up with something more
general. If we limit this to just serialization, then the design should
stay as-is -- for all the reasons you have already outlined, using an
interface for this is inappropriate.

For a more general mechanism, I think we would need something along the
lines of (ignoring naming):

interface GetState {
      public function getState(string $purpose): array;
}
interface SetState {
      public function setState(array $data, string $purpose): void;
}

We need separate interfaces for get/set to properly cater to cases like
JSON, where only get is meaningful (right now). We need the $purpose
argument to allow different state representations for different
purposes,
e.g. JSON will often need more preparation than PHP serialization. The
$purpose argument is a string, to allow extension for custom purposes.

Seeing that, this is really not something I would feel comfortable with
introducing in core PHP. Our track record for introducing reusable
general-purpose interfaces is not exactly stellar (say, did you hear
about
SplObserver and SplSubject?) and I wouldn't want to do this without
seeing
the concept fleshed out extensively in userland first.

Nikita

I think the per-purpose representation logic doesn't belong to each
classes.
Symfony Serializer does it completly from the outside, and I think
that's
a much better design.
I'm on the side this should not be in PHP code indeed.

As there hasn't been further input here, I'd like to go forward with the
RFC as-is and plan to start voting by Friday.

Nikita


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to