> > The other failing tests involve "unserialize_callback_func" to gracefully
> > detect class-not-found, and they are all plain broken by the RFC.
> >
> > I created this patch/PR to show the changes that would be required on
> > Symfony to work around the BC break:
> > https://github.com/symfony/symfony/pull/47882
> >
> > Note to readers: in this whole discussion, Symfony is just an example of
> > affected code. In the end, Symfony will adapt to the RFC if it passes.
> The
> > point being made is that PHP scripts should not have to be patched to run
> > on newer minor versions of PHP. That's what "keeping BC" means.
>
> Scripts will not need to be changed, unless they are already relying on
> observed behavior instead of the documented behavior.

When I use Symfony's 'PhpSerializer' and perform the following call:
>
> >         $serializer->decode([
> >             'body' => '{"message": "bar"}',
> >         ]);
>
> Then I can expect the input to be gracefully rejected with a
> MessageDecodingFailedException. This behavior is tested in the
> 'testDecodingFailsWithBadFormat' test.
>
> But *under no circumstances* must I perform the following method call:
>
> >         $serializer->decode([
> >             'body' => 'O:8:"DateTime":0:{}',
> >         ]);
>
> Because this payload is invalid and thus must not exist in practice.
>

That's the current behavior, yes. It allows graceful errors when one
misconfigures their workers and sends jsons to a worker that expects
php-ser. Under no circumstances do we need to guard against adversary
payloads. We could handle the situation you describe of course, but that'd
be just dead code in practice. ¯\_(ツ)_/¯



> > Breaking BC in the name of such invalid payloads doesn't make an argument
> > in favor of the RFC.
>
> I demonstrated how to adjust 'PhpSerializer' to improve the behavior for
> what I consider a reasonable use case without breaking any existing
> tests in step3b.diff and step3c.diff. At the same time the fixed version
> will automatically handle the change this RFC proposes. I even offered
> to adjust the RFC implementation to preserve the original Exception
> message and code.
>
> >> Failed asserting that exception message 'Could not decode message using
> >> PHP serialization: O:13:"ReceivedSt0mp":0:{}.' matches '/class
> >> "ReceivedSt0mp" not found/'.
> >>
> >> Okay, now the Exception message changed. Personally I do not consider
> >> this a BC break: I believe Exception messages are meant for human
> >> consumption, not for programs. Otherwise fixing a typo in the message
> >> would be a BC break. If the code wants to learn about the cause, it
> >> should either use the '$code' or different types of Exception should be
> >> thrown to clarify the cause by entering a different catch() block.
> >>
> >
> > Yes, the specific error message should be part of the BC promise. This
> > allows building test suites that can assert the message in a stable way.
>
> I'm not talking about test suites here. I believe makes sense to verify
> the error message to ensure a specific error message is emitted to the
> human observer in the error log.
>

Breaking test suites is the BC issue I wanted to highlight.



> I was talking about code that does something like this, which I consider
> to be inherently unsafe:
>
> try { … }
> catch (SomeException $e) {
>    if ($e->getMessage() === 'Foobar') doSomething();
>    else doSomethingElse();
> }
>
> As a library author I want to be able to provide the best possible
> Exception message to ease debugging for the user. This is not possible
> if I am locked into a bad choice forever.
>
> > This is also why we don't change the output of
> var_dump/print_r/var_export:
> > they're written now, in the same of BC, for the best of stability. (I've
> > barely read any PHP code where exception's code is used in any useful
> BTW -
> > that can't be a solution.)
>
> PDO is an example where the $code is useful, because it exposes the
> standardized SQL error code. Similarly a HTTP client could expose the
> status code within $code.
>
> For other libraries simply using a different Exception class within the
> same hierarchy is often sufficient to differentiate between different
> errors, because programmatically I often don't care *why exactly* an
> operation failed.
>
> > More importantly, the type of thrown exceptions should undoubtedly be
> part
> > of the BC promise.
>
> I agree.
>
> > Wrapping exceptions inside UnserializationFailedException breaks existing
> > catch/instanceof checks (see my PR above.)
> >
>
> unserialize() only promises you that a Throwable might be thrown:
>
>
> https://www.php.net/manual/en/function.unserialize.php#refsect1-function.unserialize-errors


Breaking BC at will unless it's properly documented looks like a terrible
policy. It'd make developing in PHP really risky. Especially here:
expecting that an exception would bubble down is just what everybody would
expect from unserialize(), like everything else in the engine. It always
did, so ppl built on that. Nothing fancy here.



> Symfony is relying on undocumented behavior.
> >
> >> All green! We **did not have to change anything** after fixing the
> >> implementation in step3a.diff for PHP 8.2 and earlier!
> >>
> >
> > We did change the exception hierarchy, both in terms of types and in
> terms
> > of stack of "previous". In this specific case, we can discuss the merits
> of
>
> If not even the $previous value may change, then we are deep in
> https://xkcd.com/1172/ territory. The whole purpose of $previous is
> wrapping exceptions that come up as an implementation detail and then
> exposing a well-defined Exception at the border of your library /
> component and at the same time exposing the underlying cause to the
> developer to simplify debugging.
>
> > wrapping all throwables or not. But in doubt the other failing cases I
> > reported clearly highlight the BC break.
> >
>
> As I've mentioned at the beginning of this email, the other locations
> are also broken with the current PHP versions.
>
> --------------
>
> Let me work through ResourceCheckerConfigCache as another example. As
> with the 'PhpSerializer' example, I'm working on the assumption that the
> input to 'unserialize' might be broken. If it would be impossible for it
> to be broken, then we do not need error handling in the first place and
> the RFC would not change anything, because it only concerns itself with
> error handling.
>
> So let's add a new test case to 'ResourceCheckerConfigCacheTest'.
> Specifically I am implementing a new resource that implements
> 'SelfCheckingResourceInterface':
>
> > final class MyResource implements
> \Symfony\Component\Config\Resource\SelfCheckingResourceInterface
> > {
> >     public function __construct(
> >         private \DateTimeImmutable $dti
> >     ) {}
> >
> >     public function isFresh(int $timestamp): bool { return false; }
> >
> >     public function __toString(): string { return ''; }
> > }
>
> This resource stores a DTI internally, but the specific object does not
> really matter. Anything that implements __unserialize() would work.
>
> Now let me add a test that uses this resource. Unfortunately my meta
> data file got corrupted on disk, because some unrelated process stomped
> over it and replaced all '-' by an 'x'.
>
> >     public function testCacheIsNotFreshWhenUnserializeFails2()
> >     {
> >         $checker = $this->createMock(ResourceCheckerInterface::class);
> >         $cache = new ResourceCheckerConfigCache($this->cacheFile,
> [$checker]);
> >         $cache->write('foo', [new MyResource(new
> \DateTimeImmutable('now'))]);
> >
> >         $metaFile = "{$this->cacheFile}.meta";
> >         file_put_contents($metaFile, str_replace('-', 'x',
> file_get_contents($metaFile)));
> >
> >         $this->assertFalse($cache->isFresh());
> >     }
>
> Similarly to the test case without the '2' suffix, I expect the cache to
> be ignored, because it can simply be recreated. Let's test this:
>
> > 1)
> Symfony\Component\Config\Tests\ResourceCheckerConfigCacheTest::testCacheIsNotFreshWhenUnserializeFails2
> > Error: Invalid serialization data for DateTimeImmutable object
> >
> > /app/src/Symfony/Component/Config/ResourceCheckerConfigCache.php:159
> > /app/src/Symfony/Component/Config/ResourceCheckerConfigCache.php:77
> >
> /app/src/Symfony/Component/Config/Tests/ResourceCheckerConfigCacheTest.php:152
>
> Oh no. The test failed. To fix this we just need remove everything from
> the body of the catch in
> ResourceCheckerConfigCache::safelyUnserialize(). No matter why the
> unserialization fails, the cache will be ignored if it contains invalid
> data. It goes without saying that this will correctly handle the
> UnserializationFailedException that the RFC proposes.
>
> Full patch attached.
>
> --------------
>
> For DefaultMarshaller I consider the test to be wrong / to be overly
> specific. The 'MarshallerInterface' defines that:
>
> >  * @throws \Exception Whenever unserialization fails
>
> UnserializationFailedException is a subclass of Exception, thus the
> external contract is not violated.
>
> --------------
>
> For ContextListenerTest we can just add 'O:8:"DateTime":0:{}' to
> 'provideInvalidToken' and the same reasoning as with PhpSerializer and
> ResourceCheckerConfigCache applies. I will not work through the steps
> again to keep this email succinct.
>
> --------------
>
> On VarExporterTest I can't make a clear statement, because frankly I
> don't understand what the code is supposed to do.
>
> However:
>
> a) 'Registry' constructs payloads by hand, something that you admitted
> yourself does not exist in practice: "constructed payloads like the ones
> involving DateTime do not exist in practice".
>

It's quite common practice to create objects without constructors by using
constructed payloads. That's what the code you read does. While
ReflectionClass::newInstanceWithoutConstructor() works for userland
classes, many internal classes do require this trick.


b) Registry is marked @internal, as such it does not affect any
> downstream users of Symfony.
>
> c) What Registry does is definitely nothing your average application
> does, as such the caveat "this is not just about Symfony" does not apply.
>

All the above cases are variations of the same topic:

When unserializing a payload, said payload might reference a class that was
removed since the payload was created.
For ResourceChecker, this happens all the time during dev, when one
refactors their code. For Cache, VarExporter and Security, this happens for
similar reasons - renamed classes, etc, but also in prod.

By default, PHP creates weird __PHP_Incomplete_Class objects in places
where not found classes are referenced. This creates invalid object graphs
that we want to reject early on (everybody should most of the time I
guess.) The only way to do so reliably is to use the mechanism I mentioned
in the discussion thread: ini_set(unserialize_callback_func), documented
here:
https://www.php.net/manual/en/var.configuration.php#ini.unserialize-callback-func

and then throw an exception from this callback.
Here is the typical code:

function throwingCallback($class) {
    throw new MyClassNotFoundException($class);
}
$previousCallback = ini_set('unserialize_callback_func', 'throwingCallback'
);
try {
    $data = unserialize($payload);
} catch (MyClassNotFoundException $e) {
    // deal with $e specifically
} catch (Throwable $e) {
    // deal with other type of throwables differently
}

The RFC forces rewriting such logics to accomodate for the new wrapper.
That's the BC break that is highlighted by the PR I linked.

Nicolas

Reply via email to