So you think the spec for SerializableCoder<T> (currently doesn't really
have one) should be that it dynamically dispatches what it deserializes? I
had imagined we would treat it more as a statically determined coder, so
because it is invariant in T we would not allow up or down casts (they are
unsafe). But we probably don't actually have the static information to do
that anyhow so you are probably right.

I wonder about the threat model here. Is this the event that the runner
(managed service or bespoke cluster) is compromised and is attempting RCE
on the Java SDK harness or runner-specific Java-based worker?

Kenn

On Mon, Mar 23, 2020 at 8:09 AM Luke Cwik <[email protected]> wrote:

> I don't think this is going to work since SerializableCoder<T> should be
> able to decode T and all objects that implement/extend T. I'm pretty sure
> SerializableCoder<Set/List/...> is common enough while the concrete type is
> HashSet/ArrayList/...
> I'm pretty sure there is some way you could come up with some way for
> making this optin though.
>
> On Mon, Mar 23, 2020 at 12:19 AM Colm O hEigeartaigh <[email protected]>
> wrote:
>
>> Thanks Kenn. I submitted a PR here:
>> https://github.com/apache/beam/pull/11191
>>
>> Colm.
>>
>> On Thu, Mar 19, 2020 at 8:13 PM Kenneth Knowles <[email protected]> wrote:
>>
>>> I think this is fine. The same coder is used for encode and decode, so
>>> the Class object should be the same as well. Inheritance is not part of the
>>> Beam model (thank goodness) so this is a language-specific concern. As far
>>> as the model is concerned, the full URN and the payload of the coder is its
>>> identity and coders with different identities have no inheritance or
>>> compatibility relationship. Pipeline snapshot/update is an edge case, but
>>> changing coder is not supported by any runner I know of, and probably won't
>>> be until we have some rather large new ideas.
>>>
>>> Kenn
>>>
>>> On Thu, Mar 19, 2020 at 4:50 AM Colm O hEigeartaigh <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a question on SerializableCoder. I'm looking at hardening the
>>>> Java Object deserialization that is taking place. We have a "Class<T> type"
>>>> that is used to decode the input stream:
>>>>
>>>> ObjectInputStream ois = new ObjectInputStream(inStream);
>>>> return type.cast(ois.readObject());
>>>>
>>>> What I would like to do would be something like:
>>>>
>>>> ObjectInputStream ois = new ObjectInputStream(inStream) {
>>>>     @Override
>>>>     protected Class<?> resolveClass(ObjectStreamClass desc) throws
>>>> IOException, ClassNotFoundException {
>>>>         if (!desc.getName().equals(type.getName())) {
>>>>             throw new InvalidClassException("Unauthorized
>>>> deserialization attempt", desc.getName());
>>>>         }
>>>>         return super.resolveClass(desc);
>>>>     }
>>>> };
>>>> return type.cast(ois.readObject());
>>>>
>>>> This would prevent a possible security hole where an attacker could try
>>>> to force the recipient of the input stream to deserialize to a gadget class
>>>> or the like for a RCE.
>>>>
>>>> The question is - does the deserialized type have to correspond exactly
>>>> to the supplied Class? Or is it supported that it's a base type / abstract
>>>> class? If the latter then my idea won't really work. But if the type
>>>> corresponds exactly then it should work OK.
>>>>
>>>> Thanks,
>>>>
>>>> Colm.
>>>>
>>>

Reply via email to