I have seen people ingest data using SerializableCoder.

On Mon, Mar 23, 2020 at 2:51 PM Kenneth Knowles <[email protected]> wrote:

> I won't bring other people's words from private@, but can share mine. I
> don't believe it exposes anything new.
>
> > If it is SerializableCoder - attacker controls the other end of e.g.
> Kafka or Pubsub that is decoding w/ ObjectInputStream - [then we could have
> an allowlist or try to automatically construct an allowlist] and otherwise
> there is no vulnerability for internal coders.
>
> I have never seen or heard of a user doing dynamic deserialization
> dispatch on ingestion, but that doesn't mean it doesn't happen. If it is
> important to someone then they would need a more secure solution than
> SerializableCoder.
>
> Side note: it would be great to provide an efficient and usable solution
> for the problem of wanting to dynamically dispatch serde in the middle of a
> pipeline. It is actually independent from being able to provide coders for
> a wide variety of types, which we can do a bunch of different (mostly
> better) ways. (has a better solution been built since the last time I
> thought about this?)
>
> Kenn
>
> On Mon, Mar 23, 2020 at 1:36 PM Ismaël Mejía <[email protected]> wrote:
>
>> The link to the previous covnersation (discussion happened in private@
>> and I suppose we can bring some relevant bits here if needed)
>>
>> https://lists.apache.org/thread.html/2e1c00999e992e15b08938866bfe7bd3c3d3b3d4d7aa2f8f6eb4600d%40%3Cprivate.beam.apache.org%3E
>>
>> I remember Robert had some points there, but I am not sure we
>> found/agreed on a solution that was relevant and did not break current
>> users and their user experience (like the case of blacklists).
>>
>> On Mon, Mar 23, 2020 at 9:22 PM Luke Cwik <[email protected]> wrote:
>> >
>> > Being able to have something that can encode any object (or at least a
>> large class of objects) is extremely powerful so requiring
>> SerializableCoder<T> to only encode T.class would hurt our users.
>> >
>> > I believe someone looked at this kind of problem before and we came to
>> an agreement of usng an explicit approve/deny list on the class names which
>> would address the security concern. I don't remember the thread though and
>> couldn't find the thread after a few minutes of searching.
>> >
>> >
>> >
>> > On Mon, Mar 23, 2020 at 1:07 PM Kenneth Knowles <[email protected]>
>> wrote:
>> >>
>> >> So you think the spec for SerializableCoder<T> (currently doesn't
>> really have one) should be that it dynamically dispatches what it
>> deserializes? I had imagined we would treat it more as a statically
>> determined coder, so because it is invariant in T we would not allow up or
>> down casts (they are unsafe). But we probably don't actually have the
>> static information to do that anyhow so you are probably right.
>> >>
>> >> I wonder about the threat model here. Is this the event that the
>> runner (managed service or bespoke cluster) is compromised and is
>> attempting RCE on the Java SDK harness or runner-specific Java-based worker?
>> >>
>> >> Kenn
>> >>
>> >> On Mon, Mar 23, 2020 at 8:09 AM Luke Cwik <[email protected]> wrote:
>> >>>
>> >>> I don't think this is going to work since SerializableCoder<T> should
>> be able to decode T and all objects that implement/extend T. I'm pretty
>> sure SerializableCoder<Set/List/...> is common enough while the concrete
>> type is HashSet/ArrayList/...
>> >>> I'm pretty sure there is some way you could come up with some way for
>> making this optin though.
>> >>>
>> >>> On Mon, Mar 23, 2020 at 12:19 AM Colm O hEigeartaigh <
>> [email protected]> wrote:
>> >>>>
>> >>>> Thanks Kenn. I submitted a PR here:
>> https://github.com/apache/beam/pull/11191
>> >>>>
>> >>>> Colm.
>> >>>>
>> >>>> On Thu, Mar 19, 2020 at 8:13 PM Kenneth Knowles <[email protected]>
>> wrote:
>> >>>>>
>> >>>>> I think this is fine. The same coder is used for encode and decode,
>> so the Class object should be the same as well. Inheritance is not part of
>> the Beam model (thank goodness) so this is a language-specific concern. As
>> far as the model is concerned, the full URN and the payload of the coder is
>> its identity and coders with different identities have no inheritance or
>> compatibility relationship. Pipeline snapshot/update is an edge case, but
>> changing coder is not supported by any runner I know of, and probably won't
>> be until we have some rather large new ideas.
>> >>>>>
>> >>>>> Kenn
>> >>>>>
>> >>>>> On Thu, Mar 19, 2020 at 4:50 AM Colm O hEigeartaigh <
>> [email protected]> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> I have a question on SerializableCoder. I'm looking at hardening
>> the Java Object deserialization that is taking place. We have a "Class<T>
>> type" that is used to decode the input stream:
>> >>>>>>
>> >>>>>> ObjectInputStream ois = new ObjectInputStream(inStream);
>> >>>>>> return type.cast(ois.readObject());
>> >>>>>>
>> >>>>>> What I would like to do would be something like:
>> >>>>>>
>> >>>>>> ObjectInputStream ois = new ObjectInputStream(inStream) {
>> >>>>>>     @Override
>> >>>>>>     protected Class<?> resolveClass(ObjectStreamClass desc) throws
>> IOException, ClassNotFoundException {
>> >>>>>>         if (!desc.getName().equals(type.getName())) {
>> >>>>>>             throw new InvalidClassException("Unauthorized
>> deserialization attempt", desc.getName());
>> >>>>>>         }
>> >>>>>>         return super.resolveClass(desc);
>> >>>>>>     }
>> >>>>>> };
>> >>>>>> return type.cast(ois.readObject());
>> >>>>>>
>> >>>>>> This would prevent a possible security hole where an attacker
>> could try to force the recipient of the input stream to deserialize to a
>> gadget class or the like for a RCE.
>> >>>>>>
>> >>>>>> The question is - does the deserialized type have to correspond
>> exactly to the supplied Class? Or is it supported that it's a base type /
>> abstract class? If the latter then my idea won't really work. But if the
>> type corresponds exactly then it should work OK.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>>
>> >>>>>> Colm.
>>
>

Reply via email to