Thanks for the update, Gabor. I'll take a look and respond in the document.

Cheers,
Till

On Wed, Jun 9, 2021 at 12:59 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
wrote:

> Hi Till,
>
> We have considered your proxy suggestion in depth and updated the FLIP
> accordingly.
> We evaluated two proxy implementations (Nginx and Squid), but according to
> our analysis and testing they are not suitable for the mentioned use cases.
> Please take a look at the rejected alternatives for a detailed explanation.
>
> Thanks for your time in advance!
>
> BR,
> G
>
>
> On Fri, Jun 4, 2021 at 3:31 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
>> As I've said I am not a security expert and that's why I have to ask for
>> clarification, Gabor. You are saying that if we configure a truststore for
>> the REST endpoint with a single trusted certificate which has been
>> generated by the operator of the Flink cluster, then the attacker can
>> generate a new certificate, sign it and then talk to the Flink cluster if
>> he has access to the node on which the REST endpoint runs? My understanding
>> was that you need the corresponding private key which in my proposed setup
>> would be under the control of the operator as well (e.g. stored in a
>> keystore on the same machine but guarded by some secret). That way (if I am
>> not mistaken), only the entity which has access to the keystore is able to
>> talk to the Flink cluster.
>>
>> Maybe we are also getting our wires crossed here and are talking about
>> different things.
>>
>> Thanks for listing the pros and cons of Kerberos. Concerning what other
>> authentication mechanisms are used in the industry, I am not 100% sure.
>>
>> Cheers,
>> Till
>>
>> On Fri, Jun 4, 2021 at 11:09 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
>> wrote:
>>
>>> > I did not mean for the user to sign its own certificates but for the
>>> > operator of the cluster. Once the user request hits the proxy, it should
>>> > no longer be under his control. I think I do not fully understand yet why
>>> > this would not work.
>>> I said that this does not solve the authentication problem with any proxy.
>>> Even if the operator signs the certificate, someone can still gain access
>>> to an internal node.
>>> In that case anybody can craft certificates which are accepted by the
>>> server. Once they are accepted, a bad actor can cancel jobs, causing huge
>>> impact.
>>>
>>> > Also, I am missing a bit the comparison of Kerberos to other
>>> > authentication mechanisms and why they were rejected in favour of
>>> > Kerberos.
>>> PROS:
>>> * It does not depend on the cloud provider or on the deployment type
>>> (k8s, bare metal, etc.), which is the biggest plus
>>> * It is centralized and comes with tooling, so there is no need to write
>>> lots of tools around it
>>> * There are clients/tools for almost all OSes and in several languages
>>> * Very large users have been running it in production for years without
>>> major issues
>>> * It provides cross-realm trust, among other features
>>> * Several open source components use it, which could increase
>>> compatibility
>>>
>>> CONS:
>>> * Not everybody uses Kerberos
>>> * It would increase the code footprint, but this is true for many
>>> features (as a side note, I'm here to maintain it)
>>>
>>> Feel free to add your points, because this only represents a single
>>> viewpoint.
>>> Also, if you have a better option for strong authentication, please
>>> share it and we can consider the pros/cons here.
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann <trohrm...@apache.org>
>>> wrote:
>>>
>>>> I did not mean for the user to sign its own certificates but for the
>>>> operator of the cluster. Once the user request hits the proxy, it should no
>>>> longer be under his control. I think I do not fully understand yet why this
>>>> would not work.
>>>>
>>>> What I would like to avoid is adding more complexity to Flink if
>>>> there is an easy solution which fulfills the requirements. That's why I
>>>> would like to go through the different alternatives thoroughly. Also,
>>>> I am missing a bit the comparison of Kerberos to other authentication
>>>> mechanisms and why they were rejected in favour of Kerberos.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra <gyf...@apache.org> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I think there might be possible alternatives, but it seems Kerberos on
>>>>> the REST endpoint ticks all the right boxes and provides a super clean and
>>>>> simple solution for strong authentication.
>>>>>
>>>>> I wouldn’t even consider sidecar proxies etc. if we can solve it in
>>>>> such a simple way as proposed by G.
>>>>>
>>>>> Cheers
>>>>> Gyula
>>>>>
>>>>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann <trohrm...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I am not saying that we shouldn't add a strong authentication
>>>>>> mechanism if there are good reasons for it. I primarily would like to
>>>>>> understand the context a bit better in order to give qualified feedback
>>>>>> and come to a good decision. In order to do this, I have the feeling that
>>>>>> we haven't fully considered all available options which are on the table,
>>>>>> tbh.
>>>>>>
>>>>>> Does the problem of certificate expiry also apply to self-signed
>>>>>> certificates? If yes, then this should also be a problem for the
>>>>>> internal encryption of Flink's communication. If not, then one could use
>>>>>> self-signed certificates with a longer validity to solve the mentioned
>>>>>> issue.
>>>>>>
>>>>>> I think you can set up Flink in such a way that you don't have to
>>>>>> handle all the different certificates. For example, you could deploy
>>>>>> Flink with a "sidecar proxy" which is responsible for the authentication
>>>>>> using an arbitrary method (e.g. Kerberos) and then bind the REST endpoint
>>>>>> to a local network interface. That way, the REST endpoint would only be
>>>>>> available through the sidecar proxy. Additionally, one could enable SSL
>>>>>> for this communication. Would this be a solution for the problem?
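>>>>>>
>>>>>> As a rough illustration only, the Flink side of such a setup might look
>>>>>> like the flink-conf.yaml fragment below. It assumes the sidecar proxy
>>>>>> runs on the same host (or in the same pod) as the REST endpoint. The
>>>>>> loopback address is an assumption made for this sketch, and the
>>>>>> keystore/truststore settings that SSL additionally requires are left out.
>>>>>>
>>>>>> ```yaml
>>>>>> # Sketch: make the REST endpoint reachable only via the local sidecar proxy.
>>>>>> # Bind the REST server to the loopback interface (assumed proxy location).
>>>>>> rest.bind-address: 127.0.0.1
>>>>>> # Optionally encrypt the proxy -> REST endpoint hop as well.
>>>>>> security.ssl.rest.enabled: true
>>>>>> ```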
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi <
>>>>>> balassi.mar...@gmail.com> wrote:
>>>>>>
>>>>>>> That is an interesting idea, Till.
>>>>>>>
>>>>>>> The main issue with it is that TLS certificates have an expiration
>>>>>>> time; usually they get approved for a couple of years. Forcing our users
>>>>>>> to restart jobs to reprovision TLS certificates would be weird when we
>>>>>>> could just implement a single proper strong authentication mechanism
>>>>>>> instead in a couple hundred lines of code. :-)
>>>>>>>
>>>>>>> In many cases it is also impractical to go the mutual TLS route,
>>>>>>> because the Flink Dashboard can end up on any node in the k8s/Yarn
>>>>>>> cluster, which means that we need a certificate per node (due to the
>>>>>>> mutual auth). If we also want to protect the private keys of these from
>>>>>>> users accidentally or intentionally leaking them, then we need this per
>>>>>>> user as well. That is, we end up managing (number of users) * (number of
>>>>>>> machines) certificates and having to renew them periodically, which,
>>>>>>> albeit automatable, is unfortunately not yet automated in all large
>>>>>>> organizations.
>>>>>>>
>>>>>>> I fully agree that TLS certificate mutual authentication has its
>>>>>>> nice properties, especially at very large (multiple thousand node)
>>>>>>> clusters - but it has its own challenges too. Thanks for bringing it up.
>>>>>>>
>>>>>>> Happy to have this added to the rejected alternative list so that we
>>>>>>> have the full picture documented.
>>>>>>>
>>>>>>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann <trohrm...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I guess the idea would then be to let the proxy do the
>>>>>>>> authentication job and only forward the request via a mutually
>>>>>>>> authenticated SSL connection to the Flink cluster. Would this be
>>>>>>>> possible? The beauty of this setup is, in my opinion, that it should
>>>>>>>> work with all kinds of authentication mechanisms.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Till
>>>>>>>>
>>>>>>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi <
>>>>>>>> gabor.g.somo...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks for giving options to fulfil the need.
>>>>>>>>>
>>>>>>>>> Users are looking for a solution where users can be identified on
>>>>>>>>> the whole cluster and access to resources/actions can be restricted.
>>>>>>>>> A good example of such an action is cancelling other users'
>>>>>>>>> running jobs.
>>>>>>>>>
>>>>>>>>> * SSL does provide mutual authentication, but once authentication
>>>>>>>>> has passed there is no user identity on which restrictions can be based.
>>>>>>>>> * The less problematic part is that generating/maintaining
>>>>>>>>> short-lived certificates would be hard (that's the reason KDC-like
>>>>>>>>> servers exist).
>>>>>>>>> Long-lived certificates would widen the attack surface, but since
>>>>>>>>> the first concern is there, this is just a cosmetic issue.
>>>>>>>>>
>>>>>>>>> All in all, using TLS certificates is unfortunately not sufficient in
>>>>>>>>> these environments.
>>>>>>>>>
>>>>>>>>> BR,
>>>>>>>>> G
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann <
>>>>>>>>> trohrm...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the information Gabor. If it is about securing the
>>>>>>>>>> communication between the REST client and the REST server, then Flink
>>>>>>>>>> already supports enabling mutual SSL authentication [1]. Would this
>>>>>>>>>> be enough to secure the communication and to pass an audit?
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
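>>>>>>>>>>
>>>>>>>>>> For illustration, a minimal sketch of what enabling mutual
>>>>>>>>>> authentication on the REST endpoint could look like in
>>>>>>>>>> flink-conf.yaml, using the options described in [1]. The paths and
>>>>>>>>>> passwords are placeholders chosen for this sketch, not values from
>>>>>>>>>> any real setup.
>>>>>>>>>>
>>>>>>>>>> ```yaml
>>>>>>>>>> # Enable SSL on the REST endpoint.
>>>>>>>>>> security.ssl.rest.enabled: true
>>>>>>>>>> # Require clients to present a certificate trusted by the truststore.
>>>>>>>>>> security.ssl.rest.authentication-enabled: true
>>>>>>>>>> # Placeholder paths and secrets (assumptions for this sketch).
>>>>>>>>>> security.ssl.rest.keystore: /path/to/rest.keystore
>>>>>>>>>> security.ssl.rest.keystore-password: keystore_password
>>>>>>>>>> security.ssl.rest.key-password: key_password
>>>>>>>>>> security.ssl.rest.truststore: /path/to/rest.truststore
>>>>>>>>>> security.ssl.rest.truststore-password: truststore_password
>>>>>>>>>> ```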
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Till
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <
>>>>>>>>>> gabor.g.somo...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Till,
>>>>>>>>>>>
>>>>>>>>>>> Since I've been working in the security area for 10+ years, let me
>>>>>>>>>>> share my thoughts.
>>>>>>>>>>> I would like to emphasise that there are better experts than me, but
>>>>>>>>>>> I know the basics.
>>>>>>>>>>> The discussion is open and I'm not trying to decide things alone...
>>>>>>>>>>>
>>>>>>>>>>> > I mean if an attacker can get access to one of the machines, then
>>>>>>>>>>> > it should also be possible to obtain the right Kerberos token.
>>>>>>>>>>> Not necessarily. For example, if one gets access to a specific user's
>>>>>>>>>>> credentials, then it's still not possible to compromise other users'
>>>>>>>>>>> jobs, data, etc.
>>>>>>>>>>> Security is like an onion: the more layers are added, the more time
>>>>>>>>>>> an attacker needs to proceed.
>>>>>>>>>>> At the end of the day, if one is in, then one can most probably find a
>>>>>>>>>>> way, but this time is normally enough for sysadmins or security
>>>>>>>>>>> experts to close down the system and minimize the damage.
>>>>>>>>>>>
>>>>>>>>>>> The other thing is that all tokens have a timeout, and once a token
>>>>>>>>>>> is no longer valid the attacker can't proceed further.
>>>>>>>>>>>
>>>>>>>>>>> > Is Kerberos also the standard authentication protocol for
>>>>>>>>>>> > Kubernetes deployments?
>>>>>>>>>>> Kerberos is an industry standard which is cloud/deployment agnostic,
>>>>>>>>>>> and it can be used in any deployment, including k8s.
>>>>>>>>>>> The main intention is to use Kerberos in k8s deployments too, since
>>>>>>>>>>> we're going in this direction as well.
>>>>>>>>>>> Please see how Spark does this:
>>>>>>>>>>>
>>>>>>>>>>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>>>>>>>>>>>
>>>>>>>>>>> Last but not least, the most important reason to add at least one
>>>>>>>>>>> strong authentication mechanism is that we have users who have hard
>>>>>>>>>>> requirements on this. They're doing security audits, and if they
>>>>>>>>>>> fail then it's a deal breaker.
>>>>>>>>>>> That is why we added Kerberos in the first place. Unfortunately we
>>>>>>>>>>> can't name them on this public list; however, the customers who
>>>>>>>>>>> specifically asked for this were mainly in the banking and telco
>>>>>>>>>>> sectors.
>>>>>>>>>>>
>>>>>>>>>>> BR,
>>>>>>>>>>> G
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann <
>>>>>>>>>>> trohrm...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> > Thanks for updating the document Márton. Why is it that banks will
>>>>>>>>>>> > consider it more secure if Flink comes with Kerberos authentication
>>>>>>>>>>> > (assuming a properly secured setup)? I mean if an attacker can get
>>>>>>>>>>> > access to one of the machines, then it should also be possible to
>>>>>>>>>>> > obtain the right Kerberos token.
>>>>>>>>>>> >
>>>>>>>>>>> > I am not an authentication expert and that's why I wanted to ask
>>>>>>>>>>> > what authentication protocols there are other than Kerberos. Why
>>>>>>>>>>> > did we select Kerberos and not any other authentication protocol?
>>>>>>>>>>> > Maybe you can list the pros and cons for the different protocols.
>>>>>>>>>>> > Is Kerberos also the standard authentication protocol for
>>>>>>>>>>> > Kubernetes deployments? If not, what would be the answer when
>>>>>>>>>>> > deploying on K8s?
>>>>>>>>>>> >
>>>>>>>>>>> > Cheers,
>>>>>>>>>>> > Till
>>>>>>>>>>> >
>>>>>>>>>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
>>>>>>>>>>> > wrote:
>>>>>>>>>>> >
>>>>>>>>>>> >> Hi team,
>>>>>>>>>>> >>
>>>>>>>>>>> >> Happy to be here and hope I can provide quality additions in the
>>>>>>>>>>> >> future.
>>>>>>>>>>> >>
>>>>>>>>>>> >> Thank you all for the helpful suggestions!
>>>>>>>>>>> >> Considering them, the FLIP has been modified and the work continues
>>>>>>>>>>> >> on the already existing Jira.
>>>>>>>>>>> >>
>>>>>>>>>>> >> BR,
>>>>>>>>>>> >> G
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <balassi.mar...@gmail.com>
>>>>>>>>>>> >> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >>> Thanks, Chesnay - I totally missed that. Answered on the ticket
>>>>>>>>>>> >>> too, let us continue there then.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Till, I agree that we should keep this codepath as slim as
>>>>>>>>>>> >>> possible. It is an important design decision that we aim to keep
>>>>>>>>>>> >>> the list of authentication protocols to a minimum. We believe that
>>>>>>>>>>> >>> this should not be a primary concern of Flink and a trusted proxy
>>>>>>>>>>> >>> service (for example Apache Knox) should be used to enable a
>>>>>>>>>>> >>> multitude of end-user authentication mechanisms. The bare minimum
>>>>>>>>>>> >>> of authentication mechanisms to support consequently consists of a
>>>>>>>>>>> >>> single strong authentication protocol, for which Kerberos is the
>>>>>>>>>>> >>> enterprise solution, and HTTP Basic, primarily for development and
>>>>>>>>>>> >>> light-weight scenarios.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Added the above wording to G's doc.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler <ches...@apache.org>
>>>>>>>>>>> >>> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>> There's a related effort:
>>>>>>>>>>> >>>> https://issues.apache.org/jira/browse/FLINK-21108
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>>>>>>>>>>> >>>> > Hi Gabor, welcome to the Flink community!
>>>>>>>>>>> >>>> >
>>>>>>>>>>> >>>> > Thanks for sharing this proposal with the community Márton. In
>>>>>>>>>>> >>>> > general, I agree that authentication is missing and that this is
>>>>>>>>>>> >>>> > required for using Flink within an enterprise. The thing I am
>>>>>>>>>>> >>>> > wondering is whether this feature strictly needs to be
>>>>>>>>>>> >>>> > implemented inside of Flink or whether a proxy setup could do
>>>>>>>>>>> >>>> > the job? Have you considered this option? If yes, then it would
>>>>>>>>>>> >>>> > be good to list it under the point of rejected alternatives.
>>>>>>>>>>> >>>> >
>>>>>>>>>>> >>>> > I do see the benefit of implementing this feature inside of
>>>>>>>>>>> >>>> > Flink if many users need it. If not, then it might be easier for
>>>>>>>>>>> >>>> > the project to not increase the surface area since it makes the
>>>>>>>>>>> >>>> > overall maintenance harder.
>>>>>>>>>>> >>>> >
>>>>>>>>>>> >>>> > Cheers,
>>>>>>>>>>> >>>> > Till
>>>>>>>>>>> >>>> >
>>>>>>>>>>> >>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi <mbala...@apache.org>
>>>>>>>>>>> >>>> > wrote:
>>>>>>>>>>> >>>> >
>>>>>>>>>>> >>>> >> Hi team,
>>>>>>>>>>> >>>> >>
>>>>>>>>>>> >>>> >> Firstly I would like to introduce Gabor, or G [1] for short, to
>>>>>>>>>>> >>>> >> the community. He is a Spark committer who has recently
>>>>>>>>>>> >>>> >> transitioned to the Flink Engineering team at Cloudera and is
>>>>>>>>>>> >>>> >> looking forward to contributing to Apache Flink. Previously G
>>>>>>>>>>> >>>> >> primarily focused on Spark Streaming and security.
>>>>>>>>>>> >>>> >>
>>>>>>>>>>> >>>> >> Based on requests from our customers, G has implemented Kerberos
>>>>>>>>>>> >>>> >> and HTTP Basic Authentication for the Flink Dashboard and
>>>>>>>>>>> >>>> >> HistoryServer. Previously these lacked an authentication story.
>>>>>>>>>>> >>>> >>
>>>>>>>>>>> >>>> >> We are looking to contribute this functionality back to the
>>>>>>>>>>> >>>> >> community. We believe that given Flink's maturity there should
>>>>>>>>>>> >>>> >> be a common code solution for this general pattern.
>>>>>>>>>>> >>>> >>
>>>>>>>>>>> >>>> >> We are looking forward to your feedback on G's design. [2]
>>>>>>>>>>> >>>> >>
>>>>>>>>>>> >>>> >> [1] http://gaborsomogyi.com/
>>>>>>>>>>> >>>> >> [2] https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>>>>>>>>> >>>> >>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>>
>>>>>>>>>>
