Re: Question on SerializableCoder

Ismaël Mejía Thu, 26 Mar 2020 08:11:32 -0700

Luke, can a similar approach be used in classic runners? What would we
need to change to achieve it?


On Thu, Mar 26, 2020 at 4:06 PM Luke Cwik <[email protected]> wrote:
>
> From the private@ thread, I suggested:
> "With the JvmInitializer[1] being supported by Dataflow and the portable Java 
> container, users would be able to write code which sets the system property 
> jdk.serialFilter or by configuring 
> ObjectInputFilter.Config.setSerialFilter(filter)[2]"
>
> This could become a documentation change to SerializableCoder.
>
> 1: 
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/harness/JvmInitializer.java
> 2: 
> https://docs.oracle.com/javase/10/core/serialization-filtering1.htm#JSCOR-GUID-952E2328-AB66-4412-8B6B-3BCCB3195C25
>
> On Thu, Mar 26, 2020 at 4:51 AM Colm O hEigeartaigh <[email protected]> 
> wrote:
>>
>> Thanks for all the feedback. Two possible ideas that occur to me:
>>
>>  - Create TypeSafeSerializableCoder or somesuch which extends 
>> SerializableCoder and enforces the check as in the PR. Update the 
>> documentation to suggest using the new coder if you don't need to support 
>> collections as the raw type or subclasses.
>>  - Add a new "of" method to SerializableCoder which takes a boolean 
>> parameter to control whether we perform this check or not (defaults to 
>> true?).
>>
>> Colm.
>>
>> On Tue, Mar 24, 2020 at 2:11 AM Luke Cwik <[email protected]> wrote:
>>>
>>> I have seen people ingest data using SerializableCoder.
>>>
>>> On Mon, Mar 23, 2020 at 2:51 PM Kenneth Knowles <[email protected]> wrote:
>>>>
>>>> I won't bring other people's words from private@, but can share mine. I 
>>>> don't believe it exposes anything new.
>>>>
>>>> > If it is SerializableCoder - attacker controls the other end of e.g. 
>>>> > Kafka or Pubsub that is decoding w/ ObjectInputStream - [then we could 
>>>> > have an allowlist or try to automatically construct an allowlist] and 
>>>> > otherwise there is no vulnerability for internal coders.
>>>>
>>>> I have never seen or heard of a user doing dynamic deserialization 
>>>> dispatch on ingestion, but that doesn't mean it doesn't happen. If it is 
>>>> important to someone then they would need a more secure solution than 
>>>> SerializableCoder.
>>>>
>>>> Side note: it would be great to provide an efficient and usable solution 
>>>> for the problem of wanting to dynamically dispatch serde in the middle of 
>>>> a pipeline. It is actually independent from being able to provide coders 
>>>> for a wide variety of types, which we can do a bunch of different (mostly 
>>>> better) ways. (has a better solution been built since the last time I 
>>>> thought about this?)
>>>>
>>>> Kenn
>>>>
>>>> On Mon, Mar 23, 2020 at 1:36 PM Ismaël Mejía <[email protected]> wrote:
>>>>>
>>>>> The link to the previous covnersation (discussion happened in private@
>>>>> and I suppose we can bring some relevant bits here if needed)
>>>>> https://lists.apache.org/thread.html/2e1c00999e992e15b08938866bfe7bd3c3d3b3d4d7aa2f8f6eb4600d%40%3Cprivate.beam.apache.org%3E
>>>>>
>>>>> I remember Robert had some points there, but I am not sure we
>>>>> found/agreed on a solution that was relevant and did not break current
>>>>> users and their user experience (like the case of blacklists).
>>>>>
>>>>> On Mon, Mar 23, 2020 at 9:22 PM Luke Cwik <[email protected]> wrote:
>>>>> >
>>>>> > Being able to have something that can encode any object (or at least a 
>>>>> > large class of objects) is extremely powerful so requiring 
>>>>> > SerializableCoder<T> to only encode T.class would hurt our users.
>>>>> >
>>>>> > I believe someone looked at this kind of problem before and we came to 
>>>>> > an agreement of usng an explicit approve/deny list on the class names 
>>>>> > which would address the security concern. I don't remember the thread 
>>>>> > though and couldn't find the thread after a few minutes of searching.
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Mon, Mar 23, 2020 at 1:07 PM Kenneth Knowles <[email protected]> wrote:
>>>>> >>
>>>>> >> So you think the spec for SerializableCoder<T> (currently doesn't 
>>>>> >> really have one) should be that it dynamically dispatches what it 
>>>>> >> deserializes? I had imagined we would treat it more as a statically 
>>>>> >> determined coder, so because it is invariant in T we would not allow 
>>>>> >> up or down casts (they are unsafe). But we probably don't actually 
>>>>> >> have the static information to do that anyhow so you are probably 
>>>>> >> right.
>>>>> >>
>>>>> >> I wonder about the threat model here. Is this the event that the 
>>>>> >> runner (managed service or bespoke cluster) is compromised and is 
>>>>> >> attempting RCE on the Java SDK harness or runner-specific Java-based 
>>>>> >> worker?
>>>>> >>
>>>>> >> Kenn
>>>>> >>
>>>>> >> On Mon, Mar 23, 2020 at 8:09 AM Luke Cwik <[email protected]> wrote:
>>>>> >>>
>>>>> >>> I don't think this is going to work since SerializableCoder<T> should 
>>>>> >>> be able to decode T and all objects that implement/extend T. I'm 
>>>>> >>> pretty sure SerializableCoder<Set/List/...> is common enough while 
>>>>> >>> the concrete type is HashSet/ArrayList/...
>>>>> >>> I'm pretty sure there is some way you could come up with some way for 
>>>>> >>> making this optin though.
>>>>> >>>
>>>>> >>> On Mon, Mar 23, 2020 at 12:19 AM Colm O hEigeartaigh 
>>>>> >>> <[email protected]> wrote:
>>>>> >>>>
>>>>> >>>> Thanks Kenn. I submitted a PR here: 
>>>>> >>>> https://github.com/apache/beam/pull/11191
>>>>> >>>>
>>>>> >>>> Colm.
>>>>> >>>>
>>>>> >>>> On Thu, Mar 19, 2020 at 8:13 PM Kenneth Knowles <[email protected]> 
>>>>> >>>> wrote:
>>>>> >>>>>
>>>>> >>>>> I think this is fine. The same coder is used for encode and decode, 
>>>>> >>>>> so the Class object should be the same as well. Inheritance is not 
>>>>> >>>>> part of the Beam model (thank goodness) so this is a 
>>>>> >>>>> language-specific concern. As far as the model is concerned, the 
>>>>> >>>>> full URN and the payload of the coder is its identity and coders 
>>>>> >>>>> with different identities have no inheritance or compatibility 
>>>>> >>>>> relationship. Pipeline snapshot/update is an edge case, but 
>>>>> >>>>> changing coder is not supported by any runner I know of, and 
>>>>> >>>>> probably won't be until we have some rather large new ideas.
>>>>> >>>>>
>>>>> >>>>> Kenn
>>>>> >>>>>
>>>>> >>>>> On Thu, Mar 19, 2020 at 4:50 AM Colm O hEigeartaigh 
>>>>> >>>>> <[email protected]> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Hi,
>>>>> >>>>>>
>>>>> >>>>>> I have a question on SerializableCoder. I'm looking at hardening 
>>>>> >>>>>> the Java Object deserialization that is taking place. We have a 
>>>>> >>>>>> "Class<T> type" that is used to decode the input stream:
>>>>> >>>>>>
>>>>> >>>>>> ObjectInputStream ois = new ObjectInputStream(inStream);
>>>>> >>>>>> return type.cast(ois.readObject());
>>>>> >>>>>>
>>>>> >>>>>> What I would like to do would be something like:
>>>>> >>>>>>
>>>>> >>>>>> ObjectInputStream ois = new ObjectInputStream(inStream) {
>>>>> >>>>>>     @Override
>>>>> >>>>>>     protected Class<?> resolveClass(ObjectStreamClass desc) throws 
>>>>> >>>>>> IOException, ClassNotFoundException {
>>>>> >>>>>>         if (!desc.getName().equals(type.getName())) {
>>>>> >>>>>>             throw new InvalidClassException("Unauthorized 
>>>>> >>>>>> deserialization attempt", desc.getName());
>>>>> >>>>>>         }
>>>>> >>>>>>         return super.resolveClass(desc);
>>>>> >>>>>>     }
>>>>> >>>>>> };
>>>>> >>>>>> return type.cast(ois.readObject());
>>>>> >>>>>>
>>>>> >>>>>> This would prevent a possible security hole where an attacker 
>>>>> >>>>>> could try to force the recipient of the input stream to 
>>>>> >>>>>> deserialize to a gadget class or the like for a RCE.
>>>>> >>>>>>
>>>>> >>>>>> The question is - does the deserialized type have to correspond 
>>>>> >>>>>> exactly to the supplied Class? Or is it supported that it's a base 
>>>>> >>>>>> type / abstract class? If the latter then my idea won't really 
>>>>> >>>>>> work. But if the type corresponds exactly then it should work OK.
>>>>> >>>>>>
>>>>> >>>>>> Thanks,
>>>>> >>>>>>
>>>>> >>>>>> Colm.

Re: Question on SerializableCoder

Reply via email to