Hi JunFan,

> By the way, maybe this should be added in the migration plan or
> integration section in the FLIP-211.

Going to add this soon.

> Besides, I have a question about the claim in the Google doc that the KDC
> will collapse when the cluster reaches 200 nodes.
> Do you have any attachment or reference to prove it?

"KDC *may* collapse under some circumstances" is the proper wording.

We have several customers who are running workloads on Spark/Flink. Most
of the time I'm dealing with their
daily issues, which are heavily environment and use-case dependent. I've seen
various cases:
* where the mentioned ~1k nodes were working fine
* where KDC took the volume of requests for a DDoS attack and
stopped authenticating
* where KDC was simply not responding because of the load
* where KDC had intermittent outages (this was the nastiest
case)

Since you're managing a relatively big cluster, you know that KDC is not
only used by Spark/Flink workloads;
the whole company IT infrastructure is bombarding it, so whether KDC reaches
its limit really depends on other factors too.
I'm not sure what kind of evidence you are looking for, but
I'm not authorized to share any information about
our clients' data.

One thing is for sure: the more external system types are used in workloads
(e.g. HDFS, HBase, Hive, Kafka) that
authenticate through KDC, the more likely it is to reach this threshold
when the cluster is big enough.

All in all, this feature is here to help all users never reach this
limitation.

BR,
G


On Thu, Jan 13, 2022 at 1:00 PM 张俊帆 <zuston.sha...@gmail.com> wrote:

> Hi G
>
> Thanks for your quick reply. I think keeping the config
> *security.kerberos.fetch.delegation-token*
> and simply disabling the token fetching is a good idea. By the way,
> maybe this should be added
> in the migration plan or integration section in the FLIP-211.
>
> Besides, I have a question about the claim in the Google doc that the KDC
> will collapse when the cluster reaches 200 nodes.
> Do you have any attachment or reference to prove it?
> Because in our internal per-cluster setup,
> the number of nodes exceeds 1000 and KDC looks good. Did I miss or misunderstand
> something? Please correct me.
>
> Best
> JunFan.
> On Jan 13, 2022, 5:26 PM +0800, dev@flink.apache.org, wrote:
> >
> >
> https://docs.google.com/document/d/1JzMbQ1pCJsLVz8yHrCxroYMRP2GwGwvacLrGyaIx5Yc/edit?fbclid=IwAR0vfeJvAbEUSzHQAAJfnWTaX46L6o7LyXhMfBUCcPrNi-uXNgoOaI8PMDQ
>
