Hi Team!

Let's all calm down a little and not let our emotions affect the discussion too much. All involved parties have put a lot of effort into this, so this is quite understandable :)
Even though not everyone has said so explicitly, it seems that everyone more or less agrees that a feature implementing token renewal is necessary and valuable.

The main point of contention is where the token renewal logic should run and how to get the tokens to wherever they are needed.

From my perspective the current design looks very reasonable at first sight because:

1. It runs the token renewal in a single place, avoiding extra KDC workload.
2. It does not introduce new processes, extra communication channels etc., but piggybacks on existing robust mechanisms.

I understand the concerns about adding new things to the resource manager, but I think that really depends on how we look at it. We cannot reasonably expect a custom token renewal process to have its own secure distribution logic like Flink has now; that would be complete overkill. In practice it would mean that we do not get a slim, efficient implementation but something unnecessarily complex, and the only thing we get in return is a bit less code in the resource manager.

From a logical standpoint the delegation framework needs to run in a centralized place and needs to be able to reach new task manager processes to achieve all its design goals. We could drop the single renewer as a design goal, but that decision might affect large-scale production runs.

Cheers,
Gyula

On Thu, Feb 3, 2022 at 7:32 PM Chesnay Schepler <ches...@apache.org> wrote:

> First off, at no point have we questioned the use-case and importance of this feature, and the fact that David, Till and I spent time looking at the FLIP, asking questions, and discussing different aspects of it should make this obvious.
>
> I'd appreciate it if you didn't dismiss our replies that quickly.
>
> > Ok, so we declare that users who try to use delegation tokens in Flink is dead end code and not supported, right?
>
> No one has said that.
> Are you claiming that your design is the /only possible implementation/ that is capable of achieving the stated goals, that there are 0 alternatives? One of the *main points* of these discussion threads is to discover alternative implementations that maybe weren't thought of. Yes, that may imply that we amend your design, or reject it completely and come up with a new one.
>
> Let's clarify what (I think) Till proposed to get the imagination juices flowing.
>
> At the end of the day, all we need is a way to provide Flink processes with a token that can be periodically updated. _Who_ issues that token is irrelevant for the functionality to work. You are proposing a new component in the Flink RM to do that; Till is proposing to have some external process do it. *That's it*.
>
> What this could look like in practice is fairly straightforward: add a pluggable interface (aka, your TokenProvider thing) that is loaded in each process, which can _somehow_ provide tokens that are then set in the UserGroupInformation.
> _How_ the provider receives tokens is up to the provider. It _may_ just talk directly to Kerberos, or it could use some communication channel to accept tokens from the outside.
> This would for example make it a lot easier to properly integrate this into the lifecycle of the process, as we'd sidestep the whole "TM is running but still needs a Token" issue; it could become a proper setup step of the process that is independent from other Flink processes.
>
> /Discuss/.
>
> On 03/02/2022 18:57, Gabor Somogyi wrote:
> >> And even if we do it like this, there is no guarantee that it works because there can be other applications bombing the KDC with requests.
> >
> > 1. The main issue to solve here is that workloads using delegation tokens are stopping after 7 days with default configuration.
> > 2. This is not a new design; it has been rock stable and performing well in Spark for years.
> >
> >> From a maintainability and separation of concerns perspective I'd rather have this as some kind of external tool/service that makes KDC scale better and that Flink processes can talk to to obtain the tokens.
> >
> > Ok, so we declare that users who try to use delegation tokens in Flink is dead end code and not supported, right? Then it must be explicitly written in the security documentation that users of that feature are left behind.
> >
> > As I see it, the discussion has turned away from facts and started to be about feelings. If you have strategic problems with the feature, please put your -1 on the vote and we can save quite some time.
> >
> > G
> >
> > On Thu, 3 Feb 2022, 18:34 Till Rohrmann <trohrm...@apache.org> wrote:
> >
> >> I don't have a good alternative solution but it sounds to me a bit as if we are trying to solve Kerberos' scalability problems within Flink. And even if we do it like this, there is no guarantee that it works because there can be other applications bombing the KDC with requests. From a maintainability and separation of concerns perspective I'd rather have this as some kind of external tool/service that makes KDC scale better and that Flink processes can talk to to obtain the tokens.
> >>
> >> Cheers,
> >> Till
> >>
> >> On Thu, Feb 3, 2022 at 6:01 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> >>
> >>> Oh, and the most important reason, which I had forgotten:
> >>> Without the feature in the FLIP all secure workloads with delegation tokens are going to stop when the tokens reach their max lifetime 🙂
> >>> This is around 7 days with default config...
> >>>
> >>> On Thu, Feb 3, 2022 at 5:30 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> >>>
> >>>> That's not the single purpose of the feature but in some environments it caused problems.
> >>>> The main intention is not to deploy the keytab to all the nodes, because the attack surface is bigger, and to reduce the KDC load.
> >>>> I've already described the situation previously in this thread so I'm copying it here.
> >>>>
> >>>> --------COPY--------
> >>>> "KDC *may* collapse under some circumstances" is the proper wording.
> >>>>
> >>>> We have several customers who are executing workloads on Spark/Flink. Most of the time I'm facing their daily issues, which are heavily environment and use-case dependent. I've seen various cases:
> >>>> * where the mentioned ~1k nodes were working fine
> >>>> * where KDC thought the number of requests was coming from a DDoS attack, so it discontinued authentication
> >>>> * where KDC was simply not responding because of the load
> >>>> * where KDC intermittently had some outage (this was the nastiest one)
> >>>>
> >>>> Since you're managing a relatively big cluster, you know that KDC is not only used by Spark/Flink workloads; the whole company IT infrastructure is bombing it, so whether KDC reaches its limit really depends on other factors too. Not sure what kind of evidence you are looking for, but I'm not authorized to share any information about our clients' data.
> >>>>
> >>>> One thing is for sure: the more external system types that authenticate through KDC are used in workloads (e.g. HDFS, HBase, Hive, Kafka), the more likely it is to reach this threshold when the cluster is big enough.
> >>>> --------COPY--------
> >>>>
> >>>>> The FLIP mentions scaling issues with 200 nodes; it's really surprising to me that such a small number of requests can already cause issues.
> >>>>
> >>>> One node/task doesn't mean 1 request.
> >>>> The following Kerberos auth types, which can all run at the same time, are ones I have seen: HDFS, HBase, Hive, Kafka, all DBs (Oracle, MariaDB, etc.). Additionally, one task does not necessarily open only 1 connection.
> >>>>
> >>>> All in all I don't have steps to reproduce but we've faced this already...
> >>>>
> >>>> G
> >>>>
> >>>> On Thu, Feb 3, 2022 at 5:15 PM Chesnay Schepler <ches...@apache.org> wrote:
> >>>>
> >>>>> What I don't understand is how this could overload the KDC. Aren't tokens valid for a relatively long time period?
> >>>>>
> >>>>> For new deployments where many TMs are started at once I could imagine it temporarily, but shouldn't the accesses to the KDC eventually naturally spread out?
> >>>>>
> >>>>> The FLIP mentions scaling issues with 200 nodes; it's really surprising to me that such a small number of requests can already cause issues.
> >>>>>
> >>>>> On 03/02/2022 16:14, Gabor Somogyi wrote:
> >>>>>>> I would prefer not choosing the first option
> >>>>>>
> >>>>>> Then only the second option remains.
> >>>>>>
> >>>>>>> I am not a Kerberos expert but is it really so that every application that wants to use Kerberos needs to implement the token propagation itself? This somehow feels as if there is something missing.
> >>>>>>
> >>>>>> OK, so first some Kerberos + token intro.
> >>>>>>
> >>>>>> Some basics:
> >>>>>> * A TGT can be created from a keytab
> >>>>>> * A TGT is needed to obtain a TGS (called token)
> >>>>>> * Authentication only works with a TGS -> everywhere an external system is accessed, either a TGT or a TGS is needed
> >>>>>>
> >>>>>> There are basically 2 ways to authenticate to a Kerberos-secured external system:
> >>>>>> 1. One needs a Kerberos TGT which MUST be propagated to all JVMs. Here each and every JVM obtains a TGS by itself, which bombs the KDC, which may then collapse.
> >>>>>> 2. One needs a Kerberos TGT which exists only in a single place (in this case the JM). The JM gets a TGS which MUST be propagated to all TMs because otherwise authentication fails.
> >>>>>>
> >>>>>> Right now the whole system works in a way that the keytab file (we can think of it as a plaintext password) is reachable on all nodes. This is a relatively huge attack surface. The main intention is:
> >>>>>> * Instead of propagating the keytab file to all nodes, propagate a TGS which has a limited lifetime (more secure)
> >>>>>> * Do the TGS generation in a single place so the KDC does not collapse; additionally, a keytab that lives on a single node can be better protected
> >>>>>>
> >>>>>> As a final conclusion: any place that is expected to do Kerberos authentication MUST have either a TGT or a TGS. Right now this is done in a pretty insecure way. The questions are the following:
> >>>>>> * Do we want to leave this insecure keytab propagation as it is and keep bombing the KDC?
> >>>>>> * If not, how do we propagate the more secure token to the TMs?
> >>>>>>
> >>>>>> If the answer to the first question is yes, then the FLIP can be abandoned and isn't worth further effort.
> >>>>>> If the answer is no, then we can talk about the how part.
> >>>>>>
> >>>>>> G
> >>>>>>
> >>>>>> On Thu, Feb 3, 2022 at 3:42 PM Till Rohrmann <trohrm...@apache.org> wrote:
> >>>>>>> I would prefer not choosing the first option
> >>>>>>>
> >>>>>>>> Make the TM accept tasks only after registration (not sure if it's possible or makes sense at all)
> >>>>>>>
> >>>>>>> because it effectively means that we change how Flink's component lifecycle works for distributing Kerberos tokens.
> >>>>>>> It also effectively means that a TM cannot make progress until connected to a RM.
> >>>>>>>
> >>>>>>> I am not a Kerberos expert but is it really so that every application that wants to use Kerberos needs to implement the token propagation itself? This somehow feels as if there is something missing.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Till
> >>>>>>>
> >>>>>>> On Thu, Feb 3, 2022 at 3:29 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>>> Isn't this something the underlying resource management system could do or which every process could do on its own?
> >>>>>>>>
> >>>>>>>> I was looking for such a feature but did not find one.
> >>>>>>>> Maybe we can solve the propagation more easily, but then I'm waiting for a better suggestion.
> >>>>>>>> If anybody has a better/simpler idea then please point to a specific feature which works on all resource management systems.
> >>>>>>>>
> >>>>>>>>> Here's an example for the TM to run workloads without being connected to the RM, without ever having a valid token
> >>>>>>>>
> >>>>>>>> All in all I see the main problem. Not sure what the reason is for a TM accepting tasks w/o registration, but it is clearly not helping here.
> >>>>>>>> I basically see 2 possible solutions:
> >>>>>>>> * Make the TM accept tasks only after registration (not sure if it's possible or makes sense at all)
> >>>>>>>> * We send tokens right after container creation with "updateDelegationTokens"
> >>>>>>>> Not sure which one is more realistic to do since I'm not involved in the new feature.
> >>>>>>>> WDYT?
> >>>>>>>>
> >>>>>>>> On Thu, Feb 3, 2022 at 3:09 PM Till Rohrmann <trohrm...@apache.org> wrote:
> >>>>>>>>> Hi everyone,
> >>>>>>>>>
> >>>>>>>>> Sorry for joining this discussion late. I also did not read all responses in this thread so my question might already be answered: Why does Flink need to be involved in the propagation of the tokens? Why do we need explicit RPC calls in the Flink domain? Isn't this something the underlying resource management system could do or which every process could do on its own? I am a bit worried that we are making Flink responsible for something that it is not really designed to do.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> Till
> >>>>>>>>>
> >>>>>>>>> On Thu, Feb 3, 2022 at 2:54 PM Chesnay Schepler <ches...@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>>> Here's an example for the TM to run workloads without being connected to the RM, while potentially having a valid token:
> >>>>>>>>>>
> >>>>>>>>>> 1. TM registers at RM
> >>>>>>>>>> 2. JobMaster requests slot from RM -> TM gets notified
> >>>>>>>>>> 3. JM fails over
> >>>>>>>>>> 4. TM re-offers the slot to the failed over JobMaster
> >>>>>>>>>> 5. TM reconnects to RM at some point
> >>>>>>>>>>
> >>>>>>>>>> Here's an example for the TM to run workloads without being connected to the RM, without ever having a valid token:
> >>>>>>>>>>
> >>>>>>>>>> 1. TM1 has a valid token and is running some tasks.
> >>>>>>>>>> 2. TM1 crashes
> >>>>>>>>>> 3. TM2 is started to take over, and re-uses the working directory of TM1 (new feature in 1.15!)
> >>>>>>>>>> 4. TM2 recovers the previous slot allocations
> >>>>>>>>>> 5. TM2 is informed about leading JM
> >>>>>>>>>> 6. TM2 starts registration with RM
> >>>>>>>>>> 7. TM2 offers slots to JobMaster
> >>>>>>>>>> 8. TM2 accepts task submission from JobMaster
> >>>>>>>>>> 9. ...some time later the registration completes...
> >>>>>>>>>>
> >>>>>>>>>> On 03/02/2022 14:24, Gabor Somogyi wrote:
> >>>>>>>>>>>> but it can happen that the JobMaster+TM collaborate to run stuff without the TM being registered at the RM
> >>>>>>>>>>>
> >>>>>>>>>>> Honestly I don't know Flink well enough to give an example of such a scenario.
> >>>>>>>>>>> Until now I thought the JM defines the tasks to be done and the TM just blindly connects to external systems and does the processing.
> >>>>>>>>>>> All in all, if external systems can be touched when the JM + TM collaboration happens then we need to consider that in the design.
> >>>>>>>>>>> Since I don't have an example scenario I don't know what exactly needs to be solved.
> >>>>>>>>>>> I think we need an example case to decide whether we face a real issue or whether the design is not leaking.
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Feb 3, 2022 at 2:12 PM Chesnay Schepler <ches...@apache.org> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> > Just to learn something new. I think local recovery is clear to me which is not touching external systems like Kafka or so (correct me if I'm wrong). Is it possible that in such a case the user code just starts to run blindly w/o JM coordination and connects to external systems to do data processing?
> >>>>>>>>>>>
> >>>>>>>>>>> Local recovery itself shouldn't touch external systems; the TM cannot just run user-code without the JobMaster being involved, but it can happen that the JobMaster+TM collaborate to run stuff without the TM being registered at the RM.
> >>>>>>>>>>>
> >>>>>>>>>>> On 03/02/2022 13:48, Gabor Somogyi wrote:
> >>>>>>>>>>>> > Any error in loading the provider (be it by accident or explicit checks) then is a setup error and we can fail the cluster.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Fail fast is a good direction in my view. In Spark I wanted to go in this direction but there were other opinions, so there, if a provider is not loaded, the workload carries on.
> >>>>>>>>>>>> Of course the processing will fail if the token is missing...
> >>>>>>>>>>>>
> >>>>>>>>>>>> > Requiring HBase (and Hadoop for that matter) to be on the JM system classpath would be a bit unfortunate. Have you considered loading the providers as plugins?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Even if it's unfortunate, the current implementation already depends on that. Moving HBase and/or all token providers into plugins is a possibility.
> >>>>>>>>>>>> That way, if one wants to use a specific provider then a plugin needs to be added. If we would like to go in this direction I would do that in a separate FLIP so as not to have feature creep here. The current FLIP already covers several thousand lines of code changes.
> >>>>>>>>>>>>
> >>>>>>>>>>>> > This is missing from the FLIP. From my experience with the metric reporters, having the implementation rely on the configuration is really annoying for testing purposes. That's why I suggested factories; they can take care of extracting all parameters that the implementation needs, and then pass them nicely via the constructor.
> >>>>>>>>>>>>
> >>>>>>>>>>>> ServiceLoader-provided services must have a no-arg constructor, so no parameters can be passed.
> >>>>>>>>>>>> As a side note, testing delegation token providers is a pain in the ass and not possible with automated tests without creating a fully featured Kerberos cluster with KDC, HDFS, HBase, Kafka, etc.
> >>>>>>>>>>>> We've had several tries in Spark but then gave up because of the complexity and the flakiness of it, so I wouldn't care much about unit testing.
> >>>>>>>>>>>> The sad truth is that most of the token providers can only be tested manually on a cluster.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Of course this doesn't mean that the code as a whole is not intended to be covered with tests. A couple of parts can be tested automatically, but the providers are not among them.
> >>>>>>>>>>>>
> >>>>>>>>>>>> > This also implies that any fields of the provider wouldn't inherently have to be mutable.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think this is not an issue. A provider connects to a service, obtains token(s) and then closes the connection; I've never seen the need for an intermediate state.
> >>>>>>>>>>>> I've just mentioned the singleton behavior to be clear.
> >>>>>>>>>>>>
> >>>>>>>>>>>> > One example is a TM restart + local recovery, where the TM eagerly offers the previous set of slots to the leading JM.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Just to learn something new. I think local recovery is clear to me which is not touching external systems like Kafka or so (correct me if I'm wrong).
> >>>>>>>>>>>> Is it possible that in such a case the user code just starts to run blindly w/o JM coordination and connects to external systems to do data processing?
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Feb 3, 2022 at 1:09 PM Chesnay Schepler <ches...@apache.org> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1)
> >>>>>>>>>>>> The manager certainly shouldn't check for specific implementations.
> >>>>>>>>>>>> The problem with classpath-based checks is that it can easily happen that the provider can't be loaded in the first place (e.g., if you don't use reflection, which you currently kinda force), and in that case Flink can't tell whether the token is not required or the cluster isn't set up correctly.
> >>>>>>>>>>>> As I see it we shouldn't try to be clever; if the user wants Kerberos, then have them enable the providers. Any error in loading the provider (be it by accident or explicit checks) then is a setup error and we can fail the cluster.
> >>>>>>>>>>>> If we still want to auto-detect whether the provider should be used, note that using factories would make this easier; the factory can check the classpath (not having any direct dependencies on HBase avoids the case above), and the provider no longer needs reflection because it will only be used iff HBase is on the CP.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Requiring HBase (and Hadoop for that matter) to be on the JM system classpath would be a bit unfortunate. Have you considered loading the providers as plugins?
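The factory idea in point 1) above can be sketched roughly as follows. This is only an illustration under stated assumptions: `DelegationTokenProviderFactory`, its `create` method, the `principal` field and the plain `Map` standing in for the Flink configuration are hypothetical names and are not taken from the FLIP or the PoC. The factory itself has no compile-time dependency on HBase; it only probes for a marker class by name, and the provider receives its parameters through the constructor so its fields can stay immutable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class ProviderFactorySketch {

    /** Simplified stand-in for the provider interface discussed in this thread. */
    interface DelegationTokenProvider {
        String serviceName();
    }

    /** Hypothetical factory interface; not part of the FLIP or the PoC. */
    interface DelegationTokenProviderFactory {
        /** Returns a provider, or empty if the provider cannot be used in this process. */
        Optional<DelegationTokenProvider> create(Map<String, String> config);
    }

    /** Provider whose parameters arrive via the constructor, so no field has to be mutable. */
    static final class HBaseProvider implements DelegationTokenProvider {
        private final String principal;

        HBaseProvider(String principal) {
            this.principal = principal;
        }

        @Override
        public String serviceName() {
            return "hbase";
        }

        String principal() {
            return principal;
        }
    }

    /** The factory probes the classpath by class name only, without linking against HBase. */
    static final class HBaseProviderFactory implements DelegationTokenProviderFactory {
        private final String markerClass;

        HBaseProviderFactory(String markerClass) {
            this.markerClass = markerClass;
        }

        @Override
        public Optional<DelegationTokenProvider> create(Map<String, String> config) {
            if (!isOnClasspath(markerClass)) {
                return Optional.empty();
            }
            // The config key is used for illustration only.
            String principal = config.getOrDefault("security.kerberos.login.principal", "");
            return Optional.of(new HBaseProvider(principal));
        }
    }

    /** Checks for a class by name without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ProviderFactorySketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("security.kerberos.login.principal", "flink@EXAMPLE.COM");
        DelegationTokenProviderFactory factory =
                new HBaseProviderFactory("org.apache.hadoop.hbase.HBaseConfiguration");
        System.out.println(
                factory.create(config)
                        .map(DelegationTokenProvider::serviceName)
                        .orElse("hbase provider skipped"));
    }
}
```

Passing the marker class name into the factory is purely a convenience of this sketch so that the classpath probe can be exercised without HBase being present.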
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2) > DelegationTokenProvider#init method
> >>>>>>>>>>>>
> >>>>>>>>>>>> This is missing from the FLIP. From my experience with the metric reporters, having the implementation rely on the configuration is really annoying for testing purposes. That's why I suggested factories; they can take care of extracting all parameters that the implementation needs, and then pass them nicely via the constructor. This also implies that any fields of the provider wouldn't inherently have to be mutable.
> >>>>>>>>>>>>
> >>>>>>>>>>>> > workloads are not yet running until the initial token set is not propagated.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This isn't necessarily true. It can happen that tasks are being deployed to the TM without it having registered with the RM; there is currently no requirement that a TM must be registered before it may offer slots / accept task submissions.
> >>>>>>>>>>>> One example is a TM restart + local recovery, where the TM eagerly offers the previous set of slots to the leading JM.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 03/02/2022 12:39, Gabor Somogyi wrote:
> >>>>>>>>>>>>> Thanks for the quick response!
> >>>>>>>>>>>>> Appreciate your invested time...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> G
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Feb 3, 2022 at 11:12 AM Chesnay Schepler <ches...@apache.org> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     Thanks for answering the questions!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     1) Does the HBase provider require HBase to be on the classpath?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> To be instantiated no, to obtain a token yes.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     If so, then could it even be loaded if HBase is on the classpath?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The provider can be loaded, but inside the provider it would detect whether HBase is on the classpath.
> >>>>>>>>>>>>> Just to be crystal clear, this is the current implementation that I would like to move into the Provider.
> >>>>>>>>>>>>> Please see: https://github.com/apache/flink/blob/e6210d40491ff28c779b8604e425f01983f8a3d7/flink-yarn/src/main/java/org/apache/flink/yarn/Utils.java#L243-L254
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I've considered loading only the necessary Providers, but that would mean the generic Manager needs to know that if the newly loaded Provider is an instanceof HBaseDelegationTokenProvider, then it needs to be skipped.
> >>>>>>>>>>>>> I think it would add unnecessary complexity to the Manager, and it would contain ugly code parts (at least ugly in my view), like this:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     if (provider instanceof HBaseDelegationTokenProvider && hbaseIsNotOnClasspath()) {
> >>>>>>>>>>>>>         // Skip intentionally
> >>>>>>>>>>>>>     } else if (provider instanceof SomethingElseDelegationTokenProvider && somethingElseIsNotOnClasspath()) {
> >>>>>>>>>>>>>         // Skip intentionally
> >>>>>>>>>>>>>     } else {
> >>>>>>>>>>>>>         providers.put(provider.serviceName(), provider);
> >>>>>>>>>>>>>     }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think the approach with the least code and the most clarity is to load the providers and let each decide internally whether everything needed to obtain a token is in place.
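For comparison, the "decide inside the provider" approach argued for above might look roughly like this. It is a simplified sketch, not the PoC code: the interface is reduced to the two methods mentioned in this thread (`serviceName()` and `delegationTokensRequired()`), and `ClasspathGatedProvider`, `selectActive` and the marker-class parameter are hypothetical names introduced only for illustration. The point is that the manager loop contains no `instanceof` checks at all.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ProviderSelfCheckSketch {

    /** Simplified stand-in for the provider interface discussed in this thread. */
    interface DelegationTokenProvider {
        String serviceName();

        /** Each provider decides on its own whether it can and should obtain tokens. */
        boolean delegationTokensRequired();
    }

    /** Hypothetical provider that is only active when a marker class can be found by name. */
    static final class ClasspathGatedProvider implements DelegationTokenProvider {
        private final String serviceName;
        private final String requiredClass;

        ClasspathGatedProvider(String serviceName, String requiredClass) {
            this.serviceName = serviceName;
            this.requiredClass = requiredClass;
        }

        @Override
        public String serviceName() {
            return serviceName;
        }

        @Override
        public boolean delegationTokensRequired() {
            try {
                // Look the class up by name so this provider never links against it directly.
                Class.forName(requiredClass, false, ProviderSelfCheckSketch.class.getClassLoader());
                return true;
            } catch (ClassNotFoundException | LinkageError e) {
                return false;
            }
        }
    }

    /** Generic manager logic: no knowledge of concrete provider types. */
    static Map<String, DelegationTokenProvider> selectActive(
            List<? extends DelegationTokenProvider> loaded) {
        Map<String, DelegationTokenProvider> active = new LinkedHashMap<>();
        for (DelegationTokenProvider provider : loaded) {
            if (provider.delegationTokensRequired()) {
                active.put(provider.serviceName(), provider);
            }
        }
        return active;
    }

    public static void main(String[] args) {
        Map<String, DelegationTokenProvider> active = selectActive(Arrays.asList(
                new ClasspathGatedProvider("hbase", "org.apache.hadoop.hbase.HBaseConfiguration"),
                new ClasspathGatedProvider("always-on", "java.lang.String")));
        System.out.println("Active providers: " + active.keySet());
    }
}
```

In a real setup the list of loaded providers would presumably come from `java.util.ServiceLoader`, as described elsewhere in this thread; a plain list is used here to keep the sketch self-contained.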
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     If not, then you're assuming the classpath of the JM/TM to be the same, which isn't necessarily true (in general; and also if HBase is loaded from the user-jar).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm not assuming that the classpath of the JM/TM must be the same. If the HBase jar is coming from the user-jar then the HBase code is going to use UGI within the JVM when authentication is required.
> >>>>>>>>>>>>> Of course I've not yet tested this within Flink, but in Spark it is working fine.
> >>>>>>>>>>>>> All in all the JM/TM classpaths may be different, but the HBase jar must exist on both sides somehow.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     2) None of the /Providers/ in your PoC get access to the configuration. Only the /Manager/ does. Note that I do not know whether there is a need for the providers to have access to the config, as that's very implementation specific I suppose.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> You're right. Since this is just a POC and I don't have a green light, I've not put too much effort into a proper self-review. The DelegationTokenProvider#init method must get the Flink configuration.
> >>>>>>>>>>>>> The reason is that several further configurations can be derived from it. A good example is getting the Hadoop conf.
> >>>>>>>>>>>>> The rationale is the same as before: it would be good to keep the Manager as generic as possible.
> >>>>>>>>>>>>> To be more specific, some code must load the Hadoop conf, and that could be either the Manager or the Provider.
> >>>>>>>>>>>>> If the Manager does that then the generic Manager must be modified every time a new provider needs something special.
> >>>>>>>>>>>>> This could be super problematic when a custom provider is written.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     10) I'm not sure myself. It could be something as trivial as creating some temporary directory in HDFS I suppose.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I've not found any such task. YARN and K8S are not expecting such things from executors and workloads are not yet running until the initial token set is not propagated.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     On 03/02/2022 10:23, Gabor Somogyi wrote:
> >>>>>>>>>>>>>> Please see my answers inline. I hope I have provided satisfying answers to all questions.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> G
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Feb 3, 2022 at 9:17 AM Chesnay Schepler <ches...@apache.org> wrote:
> >>>>>>>>>>>>>>> I have a few questions that I'd appreciate if you could answer.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 1. How does the Provider know whether it is required or not?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> All providers which are registered properly are going to be loaded and asked to obtain tokens. Worth mentioning: every provider has the right to decide whether it wants to obtain tokens or not (bool delegationTokensRequired()).
> >>>>>>>>>>>>>> For instance, if the provider detects that HBase is not on the classpath or not configured properly then no tokens are obtained from that specific provider.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> You may ask how a provider is registered. Here it is:
> >>>>>>>>>>>>>> The provider is on the classpath + there is a META-INF file which contains the name of the provider, for example:
> >>>>>>>>>>>>>> META-INF/services/org.apache.flink.runtime.security.token.DelegationTokenProvider
> >>>>>>>>>>>>>> <https://github.com/apache/flink/compare/master...gaborgsomogyi:dt?expand=1#diff-b65ee7e64c5d2dfbb683d3569fc3e42f4b5a8052ab83d7ac21de5ab72f428e0b>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 2. How does the configuration of Providers work (how do they get access to a configuration)?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The Flink configuration is going to be passed to all providers. Please see the POC here:
> >>>>>>>>>>>>>> https://github.com/apache/flink/compare/master...gaborgsomogyi:dt?expand=1
> >>>>>>>>>>>>>> Service-specific configurations are loaded on-the-fly. For example, in the HBase case it looks for the HBase configuration class, which will be instantiated within the provider.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 3. How does a user select providers? (Is it purely based on the provider being on the classpath?)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Providers can be explicitly turned off with the following config: "security.kerberos.tokens.${name}.enabled".
> >>>>>>>>>>>>>> I've never seen 2 different implementations exist for a
> >>>>>>>>>>>>>> specific external service, but if this edge case does arise,
> >>>>>>>>>>>>>> then the mentioned config needs to be added, and a new provider
> >>>>>>>>>>>>>> with a different name needs to be implemented and registered.
> >>>>>>>>>>>>>> All in all, we've seen that provider handling is not a
> >>>>>>>>>>>>>> user-specific task but a cluster admin one. If a specific
> >>>>>>>>>>>>>> provider is needed, then it's implemented once per company,
> >>>>>>>>>>>>>> registered once to the clusters, and then all users may or may
> >>>>>>>>>>>>>> not use the obtained tokens.
> >>>>>>>>>>>>>> Worth mentioning that the system will know which token needs to
> >>>>>>>>>>>>>> be used when HDFS is accessed; this part is automatic.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 4. How can a user override an existing provider?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Please see the previous bullet point.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 5. What is DelegationTokenProvider#name() used for?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> All providers which are registered properly (on classpath +
> >>>>>>>>>>>>>> META-INF entry) are on by default. With
> >>>>>>>>>>>>>> "security.kerberos.tokens.${name}.enabled" a specific provider
> >>>>>>>>>>>>>> can be turned off.
> >>>>>>>>>>>>>> Additionally, I intend to use this in log entries later on for
> >>>>>>>>>>>>>> debugging purposes. For example "hadoopfs provider obtained 2
> >>>>>>>>>>>>>> tokens with ID...". This would help to see what is happening
> >>>>>>>>>>>>>> with tokens, and when. The same applies to the TaskManager
> >>>>>>>>>>>>>> side: "2 hadoopfs provider tokens arrived with ID...".
> >>>>>>>>>>>>>> Important to note that the secret part will be hidden in the
> >>>>>>>>>>>>>> mentioned log entries to keep the attack surface low.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 6. What happens if the names of 2 providers are identical?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I presume you mean 2 different classes which are both
> >>>>>>>>>>>>>> registered and have the same logic inside. In this case both
> >>>>>>>>>>>>>> will be loaded and both are going to obtain token(s) for the
> >>>>>>>>>>>>>> same service.
> >>>>>>>>>>>>>> Both obtained token(s) are going to be added to the UGI. As a
> >>>>>>>>>>>>>> result the second will overwrite the first, but the order is
> >>>>>>>>>>>>>> not defined. Since both token(s) are valid, access to the
> >>>>>>>>>>>>>> external system will work no matter which one is used.
> >>>>>>>>>>>>>> When the class names are the same, the service loader only
> >>>>>>>>>>>>>> loads a single entry because services are singletons. That's
> >>>>>>>>>>>>>> the reason why state inside providers is not advised.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 7. Will we directly load the provider, or first load a
> >>>>>>>>>>>>>>>    factory (usually preferable)?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The intention is that the DTM loads a provider directly. We can
> >>>>>>>>>>>>>> add an extra layer to have a factory, but after consideration I
> >>>>>>>>>>>>>> came to the conclusion that it would be overkill in this case.
> >>>>>>>>>>>>>> Please have a look at how it's planned to load providers now:
> >>>>>>>>>>>>>> https://github.com/apache/flink/compare/master...gaborgsomogyi:dt?expand=1#diff-d56a0bc77335ff23c0318f6dec1872e7b19b1a9ef6d10fff8fbaab9aecac94faR54-R81
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 8. What is the Credentials class (it would necessarily have
> >>>>>>>>>>>>>>>    to be a public api as well)?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The Credentials class is coming from Hadoop. My main intention
> >>>>>>>>>>>>>> was not to bind the implementation to Hadoop completely. It is
> >>>>>>>>>>>>>> not possible because of the following reasons:
> >>>>>>>>>>>>>> * Several functionalities are a must because there are no
> >>>>>>>>>>>>>>   alternatives, including but not limited to login from keytab,
> >>>>>>>>>>>>>>   proper TGT cache handling, passing tokens to Hadoop services
> >>>>>>>>>>>>>>   like HDFS, HBase, Hive, etc.
> >>>>>>>>>>>>>> * The partial win is that the whole delegation token framework
> >>>>>>>>>>>>>>   is going to be initiated if hadoop-common is on classpath
> >>>>>>>>>>>>>>   (Hadoop is optional in core libraries)
> >>>>>>>>>>>>>> The possibilities to eliminate Credentials from the API could be:
> >>>>>>>>>>>>>> * to convert Credentials to a byte array and back whenever a
> >>>>>>>>>>>>>>   provider gives back token(s): I think this would be overkill
> >>>>>>>>>>>>>>   and would make it less clear from the API what a provider
> >>>>>>>>>>>>>>   has to give back so that the Manager understands it
> >>>>>>>>>>>>>> * to re-implement the internal structure of Credentials in a
> >>>>>>>>>>>>>>   POJO; here the same conversion back and forth would happen
> >>>>>>>>>>>>>>   between provider and manager. I think this would be the
> >>>>>>>>>>>>>>   re-invent-the-wheel scenario
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 9. What does the TaskManager do with the received token?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Puts the tokens into the UserGroupInformation instance for the
> >>>>>>>>>>>>>> current user.
> >>>>>>>>>>>>>> That way Hadoop-compatible services can pick up the tokens
> >>>>>>>>>>>>>> from there properly.
> >>>>>>>>>>>>>> This is an existing pattern inside Spark.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 10. Is there any functionality in the TaskManager that could
> >>>>>>>>>>>>>>>     require a token on startup (i.e., before registering with
> >>>>>>>>>>>>>>>     the RM)?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I have never seen such functionality in Spark, and after
> >>>>>>>>>>>>>> analysis I have not seen it in Flink either. If you have
> >>>>>>>>>>>>>> something in mind which I've missed, please help me out.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 11/01/2022 14:58, Gabor Somogyi wrote:
> >>>>>>>>>>>>>>> Hi All,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hope all of you have enjoyed the holiday season.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I would like to start the discussion on FLIP-211
> >>>>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-211%3A+Kerberos+delegation+token+framework>
> >>>>>>>>>>>>>>> which aims to provide a Kerberos delegation token framework
> >>>>>>>>>>>>>>> that obtains/renews/distributes tokens out-of-the-box.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Please be aware that the FLIP wiki area is not fully done
> >>>>>>>>>>>>>>> since the discussion may change the feature in major ways.
> >>>>>>>>>>>>>>> The proposal can be found in a google doc here
> >>>>>>>>>>>>>>> <https://docs.google.com/document/d/1JzMbQ1pCJsLVz8yHrCxroYMRP2GwGwvacLrGyaIx5Yc/edit?fbclid=IwAR0vfeJvAbEUSzHQAAJfnWTaX46L6o7LyXhMfBUCcPrNi-uXNgoOaI8PMDQ>.
> >>>>>>>>>>>>>>> As the community agrees on the approach the content will be
> >>>>>>>>>>>>>>> moved to the wiki page.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Feel free to add your thoughts to make this feature better!
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> BR,
> >>>>>>>>>>>>>>> G
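
As a reading aid for the answers quoted above about provider registration and the "security.kerberos.tokens.${name}.enabled" switch, here is a minimal Java sketch of how the selection rules described there could fit together. It is not the code from the linked POC. The nested DelegationTokenProvider interface is a hypothetical stand-in that carries only the two methods named in the thread (name() and delegationTokensRequired()), and the class name, the plain Map used as configuration, and the order of the two checks are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ProviderSelectionSketch {

    /** Hypothetical stand-in for the provider interface discussed in the thread. */
    interface DelegationTokenProvider {
        String name();

        boolean delegationTokensRequired();
    }

    /** Builds the per-provider switch key quoted in the thread. */
    static String enabledKey(String providerName) {
        return "security.kerberos.tokens." + providerName + ".enabled";
    }

    /**
     * Returns the names of providers that would be asked for tokens.
     * Assumption: a provider is on unless its switch is set to "false",
     * and it may still opt out via delegationTokensRequired().
     */
    static List<String> selectProviders(
            List<DelegationTokenProvider> loaded, Map<String, String> config) {
        List<String> selected = new ArrayList<>();
        for (DelegationTokenProvider provider : loaded) {
            String flag = config.getOrDefault(enabledKey(provider.name()), "true");
            if (Boolean.parseBoolean(flag) && provider.delegationTokensRequired()) {
                selected.add(provider.name());
            }
        }
        return selected;
    }

    /** Test helper (hypothetical): a provider with a fixed name and opt-in decision. */
    static DelegationTokenProvider provider(String name, boolean required) {
        return new DelegationTokenProvider() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public boolean delegationTokensRequired() {
                return required;
            }
        };
    }

    public static void main(String[] args) {
        List<DelegationTokenProvider> loaded = List.of(
                provider("hadoopfs", true),   // configured, wants tokens
                provider("hbase", false),     // e.g. HBase not on the classpath
                provider("custom", true));    // switched off by the admin below
        Map<String, String> config =
                Map.of("security.kerberos.tokens.custom.enabled", "false");
        System.out.println(selectProviders(loaded, config)); // prints [hadoopfs]
    }
}
```

In the thread the providers are discovered through a META-INF/services entry; the sketch passes them in as a list instead so that it runs without any service file on the classpath.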