Hi Danny,

Thanks for taking the time to make the design better. My comments are added
inline.
Please share your opinion, or just correct me if I've misunderstood your
suggestions.

G


On Mon, Nov 21, 2022 at 5:10 PM Danny Cranmer <dannycran...@apache.org>
wrote:

> Thanks for the FLIP Gabor.
>
> Generally looks good, have a couple of comments.
>
> If I understand correctly we plan on pulling the current implementation
> apart and adding an extra layer of abstraction, to allow arbitrary
> implementations. I notice that most of the delegation token classes [1] are
> missing compatibility annotations, although a couple are @Experimental. Can
> we update these classes to add the compatibility annotations ASAP?
> The DelegationTokenManager class [2] you talk about adding already exists,
> without any compatibility guarantees. The recent FLIPs [3][4] do not cover
> what support un-annotated classes default to, unless I missed it.
>

You've touched on an important aspect that every component must define and
obey sooner or later, once it's mature enough.
The feature has only just been finished, so it's marked as fully
experimental as a whole.
The intended developer-facing API is "DelegationTokenProvider" (at least
for now).
Everything else that is public can be considered an accidental leak (for
example "DelegationTokenManager", since Java doesn't support internal
interfaces).

The "DelegationTokenManager" interface came into the picture in a comment
to make it maybe pluggable but now I think that we must delete/change it.
The reason behind is that "obtainDelegationTokens" API has a "Credentials"
class as parameter which binding the whole stuff to Hadoop.
I personally don't insist to any of the choices just saying that an API
shouldn't contain implementation details.
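
Just to make the direction concrete, a Hadoop-free variant could return the
tokens in serialized form instead of mutating a Hadoop "Credentials"
instance. A rough sketch (all names here are hypothetical, not a proposal):

import java.util.Optional;

public interface DelegationTokenProvider {
    /** Unique name of the provider; also the key in the token container. */
    String serviceName();

    /** Obtains tokens in serialized form plus an optional end timestamp. */
    ObtainedTokens obtainDelegationTokens() throws Exception;

    /** Plain holder instead of Hadoop's Credentials (hypothetical). */
    class ObtainedTokens {
        public final byte[] tokens;
        public final Optional<Long> validUntilTimestamp;

        public ObtainedTokens(byte[] tokens, Optional<Long> validUntilTimestamp) {
            this.tokens = tokens;
            this.validUntilTimestamp = validUntilTimestamp;
        }
    }
}

Such a shape wouldn't leak any Hadoop class through the interface.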

What we can do in the short term: if you think there are places where
additional annotations are needed, please share your explicit suggestions.
Once that's clear I can create a Jira/PR and we can get rid of those gaps.

In the mid/long term, once we see that a fully custom provider can be added
to the framework without Hadoop, API guarantees can come into the picture.


>
> > External connectors need to look into the TokenContainer and when token
> is found with a specific key then it must be used
> Will we implement some form of hook that connectors can subscribe to?
> Otherwise I feel we will duplicate a bunch of logic across connectors
> periodically polling this TokenContainer for updates.
>

Good point; in short, this wasn't considered until you mentioned it.
For context, Hadoop components check the content of the UGI whenever
authentication is needed.
Of course this doesn't mean we must do the same here. We're going to do
whatever is more convenient from the API user's perspective.
My original idea is super simple, only 2 lines of implementation:
1. Get a byte array from the TokenContainer map with a specific key (the
provider puts the serialized token byte array into the map under this key).
2. The byte array is deserialized by the actual connector (the reverse of
what the provider did).
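
In code the connector side could look roughly like this (container type,
key and wire format are illustrative assumptions, not final API):

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Map;

final class TokenContainerExample {
    // 1. Look up the serialized token under the provider's well-known key.
    // 2. Deserialize it, reversing what the provider did on its side.
    static String[] readKafkaToken(Map<String, byte[]> tokenContainer)
            throws IOException {
        byte[] serialized = tokenContainer.get("kafka");
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(serialized))) {
            // Assumed format: token id + password written with writeUTF.
            return new String[] {in.readUTF(), in.readUTF()};
        }
    }
}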

Deserializing the tokens all the time can be time-consuming, so I tend to
agree that we could add listener logic. That way a connector can subscribe
to specific tokens; when a token arrives, the connector receives a callback
where deserialization happens. The deserialized tokens can then be stored,
which would definitely be a speed-up.
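
As a sketch (names are hypothetical, just to make the idea concrete):

public interface DelegationTokenListener {
    /**
     * Called when new tokens arrive for the subscribed service. The
     * connector deserializes here and can cache the result, so no
     * polling and repeated deserialization is needed.
     */
    void onNewTokens(byte[] serializedTokens);
}

// Hypothetical registration on the connector side:
// tokenContainer.subscribe("kafka", listener);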

If we intend to add a listener mechanism in this area, my suggestions
would be:
* Define some sort of timeout for the listener execution (a sketch follows
after this list). I know, normally it's just storing the value somewhere,
but if that is somehow broken it can kill whole clusters.
* Some sort of retry logic would be good, but implementing it is not
trivial.
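
For the timeout part, a minimal sketch (executor setup, timeout value and
the listener type from above are all assumptions):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ListenerNotifier {
    // Bounds listener execution so a broken listener can't hang the
    // token framework or, in the worst case, whole clusters.
    static void notifyWithTimeout(ExecutorService executor,
                                  DelegationTokenListener listener,
                                  byte[] tokens) throws Exception {
        Future<?> callback = executor.submit(() -> listener.onNewTokens(tokens));
        try {
            callback.get(10, TimeUnit.SECONDS); // hypothetical timeout value
        } catch (TimeoutException e) {
            callback.cancel(true); // interrupt the stuck listener
        }
    }
}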

All in all, the listener mechanism would be more brittle, but I see your
point and it makes sense.
Please share your thoughts.


> Thanks,
>
> [1]
>
> https://github.com/apache/flink/tree/0634d0cc0a401d268cc1b3788c8ee63a91ff33a6/flink-runtime/src/main/java/org/apache/flink/runtime/security/token
> [2]
>
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/security/token/DelegationTokenManager.java
> [3]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-196%3A+Source+API+stability+guarantees
> [4]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process
>
> On Wed, Nov 9, 2022 at 4:46 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
> wrote:
>
> > OK, removed the fallback part.
> >
> > Thanks for the help!
> >
> > G
> >
> >
> > On Wed, Nov 9, 2022 at 5:03 PM Őrhidi Mátyás <matyas.orh...@gmail.com>
> > wrote:
> >
> > > looks better! no further concerns
> > >
> > > Cheers,
> > > Matyas
> > >
> > > On Mon, Nov 7, 2022 at 9:21 AM Gabor Somogyi <
> gabor.g.somo...@gmail.com>
> > > wrote:
> > >
> > > > Oh gosh, copied wrong config keys so fixed my last mail with green.
> > > >
> > > > On Mon, Nov 7, 2022 at 6:07 PM Gabor Somogyi <
> > gabor.g.somo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Matyas,
> > > > >
> > > > > In the meantime I was thinking about the per provider re-obtain
> > feature
> > > > > and here are my thoughts related that:
> > > > > * I think it's a good feature in general but as mentioned I would
> add
> > > it
> > > > > in a separate FLIP
> > > > > * In case of Hadoop providers it just wouldn't work (HBase doesn't
> > have
> > > > > end timestamp so actually HDFS is triggering the re-obtain) but in
> > all
> > > > > non-hadoop providers it's a good idea
> > > > > * Adding "security.delegation.tokens.renewal.retry.backoff" and
> > > > > "security.delegation.tokens.renewal.time-ratio" is needed but as
> you
> > > > > mentioned fallback to kerberos configs just doesn't make sense
> > > > > * In a later FLIP we can add per provider
> > > >
> > "security.delegation.token.provider.{providerName}.renewal.retry.backoff"
> > > > > and/or
> > > >
> "security.delegation.token.provider.{providerName}.renewal.time-ratio"
> > > > > for non-hadoop providers
> > > > > * This is an additional feature which justifies to separate Hadoop
> > and
> > > > > non-hadoop providers on the API level
> > > > >
> > > > > Waiting on your opinion.
> > > > >
> > > > > G
> > > > >
> > > > >
> > > > > On Mon, Nov 7, 2022 at 4:17 PM Gabor Somogyi <
> > > gabor.g.somo...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Hi Matyas,
> > > > >>
> > > > >> Thanks for your comments, answered inline.
> > > > >>
> > > > >> G
> > > > >>
> > > > >>
> > > > >> On Mon, Nov 7, 2022 at 2:58 PM Őrhidi Mátyás <
> > matyas.orh...@gmail.com
> > > >
> > > > >> wrote:
> > > > >>
> > > > >>> Hi Gabor,
> > > > >>>
> > > > >>> Thanks for driving this effort! A few thoughts on the topic:
> > > > >>> - Could you please add a few examples of the delegation token
> > > providers
> > > > >>> we
> > > > >>> expected to be added in the near future? Ideally these providers
> > > could
> > > > be
> > > > >>> configured independently from each other.  However the
> > configuration
> > > > >>> defaults mentioned in the FLIP are derived from hadoop
> > > configuration. I
> > > > >>> don't see the point here.
> > > > >>>
> > > > >> A clear plan is to add S3 now and Kafka possibly later on.
> > > > >>
> > > > >> S3 looks straightforward but that doesn't fit into the existing
> > > > framework.
> > > > >> On Kafka side I've added the Kafka provider to Spark so I can
> > imagine
> > > > >> similar solution w/ minor differences.
> > > > >> Please see the Spark solution:
> > > > >>
> > > >
> > >
> >
> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#security
> > > > >> Here the minor planned difference is that Spark handles Kafka
> token
> > > > >> inside UGI which is not nice but works.
> > > > >> I've just seen similar generalization effort on the Spark side too
> > so
> > > > the
> > > > >> Kafka part may or may not change there.
> > > > >>
> > > > >> Not sure what you mean configs are derived from Hadoop.
> > > > >>
> > > > >> What I can think of is that
> > > > >> "security.delegation.tokens.renewal.retry.backoff" is defaulting
> to
> > > > >> "security.kerberos.tokens.renewal.retry.backoff".
> > > > >> This is basically happening for backward compatibility purposes.
> > > > >>
> > > > >> The other thing what I can think of is that you miss the
> independent
> > > > >> provider token obtain functionality.
> > > > >> When I mean independent configuration I mean each provider has
> it's
> > > own
> > > > >> set of keys which doesn't collide
> > > > >> but the voting system when token obtain must happen remains and
> not
> > > > >> touched in this FLIP.
> > > > >> Under voting system I mean each provider may send back its end
> > > timestamp
> > > > >> and the lowest wins
> > > > >> (at that timestamp all tokens are going to be re-obtained).
> > > > >> If that's what you mean we can think about solutions but it has
> > > nothing
> > > > >> to do with framework generalization.
> > > > >>
> > > > >>
> > > > >>> - Are we planning to support such scenarios where we need to
> > > read/write
> > > > >>> from different authentication realms from the same application.
> Two
> > > > >>> Hadoop
> > > > >>> clusters, Kafka clusters etc? This would need an authentication
> > > > provider
> > > > >>> per source/sink.
> > > > >>>
> > > > >>>
> > > > >> It doesn't need 2 different provider per source/sink to do Kafka
> to
> > > > >> Kafka. Such cases cross-realm trust can be configured w/ principal
> > > > mapping.
> > > > >> Please see the details here:
> > > > >>
> > > https://gist.github.com/gaborgsomogyi/c636f352ccec7730ff41ac1d524cb87d
> > > > >> Even if the gist was originally created for Spark I tend to do the
> > > same
> > > > >> here in the future.
> > > > >>
> > > > >> Just to make an extract here Kafka principal mapping looks like
> > this:
> > > > >> sasl.kerberos.principal.to.local.rules=RULE:[1:$1@$0](.*@
> > > > >> DT2HOST.COMPANY.COM)s/@.*//,DEFAULT
> > > > >>
> > > > >> Providing 2 users in Hadoop world is just impossible because UGI
> is
> > > > >> basically a singleton.
> > > > >> Of course everything can be hacked around (for ex. changing
> current
> > > user
> > > > >> on-the-fly) but that would be such a
> > > > >> headache what I think we must avoid. It would end-up in
> > > synchronization
> > > > >> and performance hell.
> > > > >> I've made some experiments in my Spark era and this would be the
> > same
> > > > >> here :)
> > > > >>
> > > > >>
> > > > >>> Thanks,
> > > > >>> Matyas
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> On Mon, Nov 7, 2022 at 5:10 AM Gabor Somogyi <
> > > > gabor.g.somo...@gmail.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>> > Hi team,
> > > > >>> >
> > > > >>> > Delegation token framework is going to be finished soon (added
> in
> > > > >>> FLIP-211
> > > > >>> > <
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-211%3A+Kerberos+delegation+token+framework?src=contextnavpagetreemode
> > > > >>> > >
> > > > >>> > ).
> > > > >>> > Previously there were concerns that the current implementation
> is
> > > > >>> bound to
> > > > >>> > Hadoop and Kerberos authentication. This is fair concern and
> as a
> > > > >>> result
> > > > >>> > we've created a proposal to generalize the delegation token
> > > framework
> > > > >>> > (practically making it authentication agnostic).
> > > > >>> >
> > > > >>> > This can open the path to add further non-hadoop and
> non-Kerberos
> > > > based
> > > > >>> > providers like S3 or many others.
> > > > >>> >
> > > > >>> > One can find the FLIP in:
> > > > >>> > - Wiki:
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-272%3A+Generalized+delegation+token+support
> > > > >>> > - document:
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> https://docs.google.com/document/d/12tFdx1AZVuW9BjwBht_pMNELgrqro8Z5-hzWeaRY4pc/edit?usp=sharing
> > > > >>> >
> > > > >>> > I would like to start a discussion to make the framework
> better.
> > > > >>> >
> > > > >>> > BR,
> > > > >>> > G
> > > > >>> >
> > > > >>>
> > > > >>
> > > >
> > >
> >
>
