Hello,

Thanks, David, for proposing a new PR [1] for KIP-498 [2] to address KAFKA-4090 [3].

I find it interesting; I wonder how it stands with respect to avoiding
inter-layer violations.

[1] https://github.com/apache/kafka/pull/8066
[2] 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-498%3A+Add+client-side+configuration+for+maximum+response+size+to+protect+against+OOM
[3] https://issues.apache.org/jira/browse/KAFKA-4090

On Tue, Aug 6, 2019 at 15:37, Gokul Ramanan Subramanian
<gokul24...@gmail.com> wrote:
>
> Hi Alexandre.
>
> Thanks for this analysis.
>
> IMHO, there are four ways to go about this:
>
> 1. We don't fix the bug directly but instead update the Kafka documentation
> telling clients to configure themselves correctly - Silly but easy to
> achieve.
> 2. We adopt Stanislav's solution that fixes the problem - Easy to achieve,
> potentially adds inflexibility in the future if we want to change handshake
> protocol. However, changing the handshake protocol is going to be a
> backwards incompatible change anyway with or without Stanislav's solution.
> 3. We adopt Alexandre's solution - Easy to achieve, but has shortcomings
> Alexandre has highlighted.
> 4. We pivot KIP-498 to focus on standardizing the handshake protocol - Not
> easy to achieve, since this will be a backwards incompatible change and
> will require client and server redeployments in correct order. Further,
> this can be a hard problem to solve given that various transport layer
> protocols have different headers. In order for the "new handshake" protocol
> to work, it would have to work in the same namespace as those headers,
> which will require careful tuning of handshake constants.
>
> Any thoughts from the community on how we can proceed?
>
> Thanks.
>
> On Tue, Aug 6, 2019 at 12:41 PM Alexandre Dupriez <
> alexandre.dupr...@gmail.com> wrote:
>
> > Hello,
> >
> > I wrote a small change [1] to make clients validate, at the protocol level,
> > the size of messages received from a broker.
> > However, I don't like the change. It does not really solve the problem which
> > originally motivated the KIP - that is, protocol mismatch (a plaintext client
> > talking to an SSL endpoint). More specifically, a few of the problems I can
> > see are listed below. This is a non-exhaustive list. They have also been
> > added to KIP-498 [2].
> >
> > 1) Incorrect failure mode
> > As experimented, and as can be seen in the integration tests, when an
> > invalid size is detected and the exception InvalidReceiveException
> > is thrown, consumers and producers keep retrying until the poll timeout
> > expires or the maximum number of retries is reached. This is incorrect for
> > the use case of protocol mismatch which motivated this change.
> > Indeed, producers and consumers would need to fail fast, as retries only
> > prolong the time to remediation for users/administrators.
> >
> > 2) Incomplete remediation
> > The proposed change cannot provide a definite guarantee against OOM in all
> > scenarios - for instance, an OOM can still occur if the maximum size is set
> > to 100 MB while the JVM is under memory pressure and has less than 100 MB of
> > allocatable memory.
> >
> > 3) Illegitimate message rejection
> > Even worse: what if the property is incorrectly configured and prevents
> > legitimate messages from reaching the client?
> >
> > 4) Unclear configuration parameter
> > 4.a) The name max.response.size is intended to mirror the existing
> > max.request.size from the producer's configuration properties. However,
> > max.request.size is intended to check the size of producer records as
> > provided by a client, while max.response.size checks the size directly
> > decoded from the network according to Kafka's binary protocol.
> > 4.b) On the broker, the property socket.request.max.bytes is used to
> > validate the size of messages received by the server. The new property
> > serves the same purpose, which introduces duplicated semantics, even if one
> > property is characterised by the keyword "request" and the other by
> > "response", in both cases reflecting the perspective of either a
> > client or a server.
> >
> > Please let me know what you think. An alternative mitigation may be worth
> > investigating for the detection of protocol mismatch in the client.
> >
> > [1] https://github.com/apache/kafka/pull/7160
> > [2]
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-498%3A+Add+client-side+configuration+for+maximum+response+size+to+protect+against+OOM
> >
> > On Thu, Aug 1, 2019 at 09:42, Alexandre Dupriez <alexandre.dupr...@gmail.com>
> > wrote:
> >
> > > Thanks Gokul and Stanislav for your answers.
> > >
> > > I went through PR 5940 [1]. Indeed Gokul, your reasoning echoes the
> > > observations made by Ismael about a potential inter-protocol layering
> > > violation.
> > >
> > > As you said Stanislav, the server-side SSL engine responds with an alert
> > > with code 80 (internal_error), from what I saw when reproducing the OOM.
> > > Since the alert is generated below the application layer, I am not sure
> > > what could be done on the broker to handle the scenario more gracefully.
> > > Interestingly, the SSL engine mentions the possibility of receiving a
> > > plaintext message in its debug logs [2].
> > >
> > > The idea was indeed to perform a simple check on the message size decoded
> > > in NetworkReceive against a configurable value, and throw an
> > > InvalidReceiveException in a similar fashion to what happens on the
> > > broker, where a non-unlimited maxSize can be provided. Basically, mimic
> > > the behaviour of the broker. The default value for the maximal request
> > > size on the broker is 100 MB, which you are suggesting to use client-side.
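The check described in this message can be sketched as follows. This is a Python illustration of the idea only, not the actual Kafka client code; the names `MAX_RESPONSE_SIZE`, `InvalidReceiveException`, and `read_size_prefix` merely mirror the Kafka terminology:

```python
import struct

MAX_RESPONSE_SIZE = 100 * 1024 * 1024  # 100 MB, mirroring the broker default


class InvalidReceiveException(Exception):
    """Raised when the decoded size prefix exceeds the configured maximum."""


def read_size_prefix(prefix_bytes, max_size=MAX_RESPONSE_SIZE):
    """Decode the 4-byte big-endian size prefix and validate it
    before any buffer is allocated for the payload."""
    (size,) = struct.unpack(">i", prefix_bytes[:4])
    if size < 0:
        raise InvalidReceiveException("Invalid negative size: %d" % size)
    if size > max_size:
        raise InvalidReceiveException(
            "Invalid receive (size = %d larger than %d)" % (size, max_size))
    return size
```

A normal response prefix passes (for example, `b"\x00\x00\x01\x00"` decodes to 256), while the first bytes of a TLS alert misread as a size prefix are rejected before allocation.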
> > >
> > > Adding a client configuration property may be overkill here. What I am
> > > going to ask is naive, but - is it theoretically possible for the broker
> > > to legitimately send responses over 100 MB in size?
> > >
> > > Thanks,
> > > Alexandre
> > >
> > > [1] https://github.com/apache/kafka/pull/5940
> > > [2]
> > >
> > > kafka-network-thread-0-ListenerName(SSL)-SSL-4, fatal error: 80: problem unwrapping net record
> > >
> > > javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
> > >
> > > kafka-network-thread-0-ListenerName(SSL)-SSL-4, SEND TLSv1.2 ALERT: fatal, description = internal_error
> > >
> > > kafka-network-thread-0-ListenerName(SSL)-SSL-4, WRITE: TLSv1.2 Alert, length = 2
> > >
> > > kafka-network-thread-0-ListenerName(SSL)-SSL-4, called closeOutbound()
> > >
> > > kafka-network-thread-0-ListenerName(SSL)-SSL-4, closeOutboundInternal()
> > >
> > > kafka-network-thread-0-ListenerName(SSL)-SSL-4, called closeInbound()
> > >
> > > kafka-network-thread-0-ListenerName(SSL)-SSL-4, fatal: engine already
> > > closed.  Rethrowing javax.net.ssl.SSLException: Inbound closed before
> > > receiving peer's close_notify: possible truncation attack?
> > >
> > > [Raw write]: length = 7
> > >
> > > 0000: 15 03 03 00 02 02 50                               ......P
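The raw write quoted above is precisely what triggers the OOM: a plaintext client reads the first four bytes of this TLS alert record as a big-endian int32 size prefix. A quick decode, in Python, just to illustrate the arithmetic:

```python
import struct

# The 7-byte TLS alert record the broker sends back:
# content type 0x15 (alert), version 0x0303 (TLS 1.2),
# length 0x0002, payload 0x02 0x50 (fatal, internal_error = 80)
alert = bytes([0x15, 0x03, 0x03, 0x00, 0x02, 0x02, 0x50])

# A plaintext Kafka client interprets the first 4 bytes as a
# big-endian int32 size prefix and tries to allocate that many bytes.
(size,) = struct.unpack(">i", alert[:4])
print(size)  # 352518912, i.e. ~336 MB allocated for a 7-byte alert
```

So the client attempts a ~336 MB allocation before a single byte of the supposed payload has arrived, which is why a modest heap fails with an OOM.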
> > >
> > >
> > > On Thu, Aug 1, 2019 at 08:50, Stanislav Kozlovski <stanis...@confluent.io>
> > > wrote:
> > >
> > >> Hey Alexandre, thanks for the KIP!
> > >>
> > >> I had personally hit the same problem and found it very annoying.
> > >> Have you considered detecting such a scenario in the client and simply
> > >> throwing an exception, rather than allocating any memory?
> > >> I have an open PR that does just that -
> > >> https://github.com/apache/kafka/pull/5940
> > >> <https://github.com/apache/kafka/pull/5940/files>
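The detection approach can be sketched roughly as follows. This is a hedged illustration of the general idea, not the code in PR 5940: if the bytes read as a size prefix happen to match the fixed header of a TLS record, the client can fail fast with a descriptive error instead of allocating a huge buffer.

```python
# TLS record content types a plaintext client could plausibly see as the
# first byte of a broker's reply on an SSL port:
# change_cipher_spec, alert, handshake, application_data
TLS_CONTENT_TYPES = {0x14, 0x15, 0x16, 0x17}


def looks_like_tls_record(prefix: bytes) -> bool:
    """Heuristic: first byte is a TLS content type and the next two
    bytes form a plausible SSL/TLS protocol version (0x03, 0x00-0x04)."""
    return (
        len(prefix) >= 3
        and prefix[0] in TLS_CONTENT_TYPES
        and prefix[1] == 0x03
        and prefix[2] <= 0x04
    )
```

For instance, the TLS alert bytes `15 03 03 ...` match this signature, while a legitimate Kafka size prefix such as `00 00 01 00` does not, so the client can raise a clear "SSL endpoint?" error rather than an OOM.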
> > >>
> > >> Best,
> > >> Stanislav
> > >>
> > >> On Wed, Jul 31, 2019 at 10:35 AM Gokul Ramanan Subramanian <
> > >> gokul24...@gmail.com> wrote:
> > >>
> > >> > Hi Alex.
> > >> >
> > >> > A rejected alternative is to attempt an SSL handshake from the
> > >> > plaintext transport layer. This couples the various transport layers
> > >> > and is inflexible to adding new transport layers in the future.
> > >> > Further, if the plaintext transport layer does an SSL handshake, it
> > >> > will always get an error, irrespective of whether or not the peer
> > >> > supports SSL. Therefore, the plaintext transport layer would have to
> > >> > determine whether the peer uses SSL based on the type of error it gets
> > >> > back from the SSL handshake. This fits right into the general
> > >> > anti-pattern of using exceptions as a control flow mechanism.
> > >> >
> > >> > Another rejected alternative would be to have a single byte at the
> > >> > transport layer represent the security protocol in use. This byte
> > >> > would have to be present in every single message between clients and
> > >> > brokers and between brokers. This change is backwards-incompatible and
> > >> > adds overhead to Kafka. It is likely never going to be accepted by the
> > >> > community.
> > >> >
> > >> > In the absence of a fool-proof way to do a handshake across all
> > >> > security protocols (plaintext, SSL, other future ones), I think that
> > >> > the proposed KIP provides a good solution. Therefore, you have my +1.
> > >> > (Not sure if my +1 counts, since I am a Kafka noob.)
> > >> >
> > >> > Thanks.
> > >> > Gokul Ramanan Subramanian
> > >> > Senior SDE, Amazon AWS
> > >> >
> > >> > On 28/Jul/19 05:43:19PM +0100, Alexandre Dupriez wrote:
> > >> >  Hello,
> > >> >
> > >> >  I have created KIP-498
> > >> >  <https://cwiki.apache.org/confluence/display/KAFKA/KIP-498%3A+Add+client-side+configuration+for+maximum+response+size+to+protect+against+OOM>
> > >> >  to propose a minor addition to the producer configuration properties,
> > >> >  in order to protect against OOM when the response message decoded by a
> > >> >  client indicates an "abnormally high" size to be allocated.
> > >> >
> > >> >  This follows this discussion thread
> > >> >  <https://www.mail-archive.com/dev@kafka.apache.org/msg55658.html>
> > >> >  and is tracked by KAFKA-4090
> > >> >  <https://issues.apache.org/jira/browse/KAFKA-4090>.
> > >> >
> > >> >  Please let me know what you think. I created this KIP even though the
> > >> >  change seems minimal because it affects client configuration, which is
> > >> >  mentioned as a type of change eligible for a KIP on the main wiki page
> > >> >  <https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals>.
> > >> >
> > >> >  Thanks,
> > >> >  Alexandre
> > >> >
> > >>
> > >
> >
>
>
> --
> GOKUL R. SUBRAMANIAN,
> Software Engineer
