The link to the previous covnersation (discussion happened in private@
and I suppose we can bring some relevant bits here if needed)
https://lists.apache.org/thread.html/2e1c00999e992e15b08938866bfe7bd3c3d3b3d4d7aa2f8f6eb4600d%40%3Cprivate.beam.apache.org%3E

I remember Robert had some points there, but I am not sure we
found/agreed on a solution that was relevant and did not break current
users and their user experience (like the case of blacklists).

On Mon, Mar 23, 2020 at 9:22 PM Luke Cwik <[email protected]> wrote:
>
> Being able to have something that can encode any object (or at least a large 
> class of objects) is extremely powerful so requiring SerializableCoder<T> to 
> only encode T.class would hurt our users.
>
> I believe someone looked at this kind of problem before and we came to an 
> agreement of usng an explicit approve/deny list on the class names which 
> would address the security concern. I don't remember the thread though and 
> couldn't find the thread after a few minutes of searching.
>
>
>
> On Mon, Mar 23, 2020 at 1:07 PM Kenneth Knowles <[email protected]> wrote:
>>
>> So you think the spec for SerializableCoder<T> (currently doesn't really 
>> have one) should be that it dynamically dispatches what it deserializes? I 
>> had imagined we would treat it more as a statically determined coder, so 
>> because it is invariant in T we would not allow up or down casts (they are 
>> unsafe). But we probably don't actually have the static information to do 
>> that anyhow so you are probably right.
>>
>> I wonder about the threat model here. Is this the event that the runner 
>> (managed service or bespoke cluster) is compromised and is attempting RCE on 
>> the Java SDK harness or runner-specific Java-based worker?
>>
>> Kenn
>>
>> On Mon, Mar 23, 2020 at 8:09 AM Luke Cwik <[email protected]> wrote:
>>>
>>> I don't think this is going to work since SerializableCoder<T> should be 
>>> able to decode T and all objects that implement/extend T. I'm pretty sure 
>>> SerializableCoder<Set/List/...> is common enough while the concrete type is 
>>> HashSet/ArrayList/...
>>> I'm pretty sure there is some way you could come up with some way for 
>>> making this optin though.
>>>
>>> On Mon, Mar 23, 2020 at 12:19 AM Colm O hEigeartaigh <[email protected]> 
>>> wrote:
>>>>
>>>> Thanks Kenn. I submitted a PR here: 
>>>> https://github.com/apache/beam/pull/11191
>>>>
>>>> Colm.
>>>>
>>>> On Thu, Mar 19, 2020 at 8:13 PM Kenneth Knowles <[email protected]> wrote:
>>>>>
>>>>> I think this is fine. The same coder is used for encode and decode, so 
>>>>> the Class object should be the same as well. Inheritance is not part of 
>>>>> the Beam model (thank goodness) so this is a language-specific concern. 
>>>>> As far as the model is concerned, the full URN and the payload of the 
>>>>> coder is its identity and coders with different identities have no 
>>>>> inheritance or compatibility relationship. Pipeline snapshot/update is an 
>>>>> edge case, but changing coder is not supported by any runner I know of, 
>>>>> and probably won't be until we have some rather large new ideas.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Thu, Mar 19, 2020 at 4:50 AM Colm O hEigeartaigh <[email protected]> 
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a question on SerializableCoder. I'm looking at hardening the 
>>>>>> Java Object deserialization that is taking place. We have a "Class<T> 
>>>>>> type" that is used to decode the input stream:
>>>>>>
>>>>>> ObjectInputStream ois = new ObjectInputStream(inStream);
>>>>>> return type.cast(ois.readObject());
>>>>>>
>>>>>> What I would like to do would be something like:
>>>>>>
>>>>>> ObjectInputStream ois = new ObjectInputStream(inStream) {
>>>>>>     @Override
>>>>>>     protected Class<?> resolveClass(ObjectStreamClass desc) throws 
>>>>>> IOException, ClassNotFoundException {
>>>>>>         if (!desc.getName().equals(type.getName())) {
>>>>>>             throw new InvalidClassException("Unauthorized 
>>>>>> deserialization attempt", desc.getName());
>>>>>>         }
>>>>>>         return super.resolveClass(desc);
>>>>>>     }
>>>>>> };
>>>>>> return type.cast(ois.readObject());
>>>>>>
>>>>>> This would prevent a possible security hole where an attacker could try 
>>>>>> to force the recipient of the input stream to deserialize to a gadget 
>>>>>> class or the like for a RCE.
>>>>>>
>>>>>> The question is - does the deserialized type have to correspond exactly 
>>>>>> to the supplied Class? Or is it supported that it's a base type / 
>>>>>> abstract class? If the latter then my idea won't really work. But if the 
>>>>>> type corresponds exactly then it should work OK.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Colm.

Reply via email to