Github user EronWright commented on the issue:
https://github.com/apache/flink/pull/3776
@Rucongzhang thanks for the contribution. I think I understand the problem
and your solution, which I will recap. I also found YARN-2704 to be useful
background.
*Problem*:
1. YARN log aggregation depends on an HDFS delegation token, which it
obtains from container token storage, not from the UGI. In keytab mode, the
Flink client doesn't upload any delegation tokens, causing log aggregation to
fail.
2. The Flink cluster doesn't renew delegation tokens. Note: Flink does
renew _Kerberos tickets_ using the keytab.
3. When the UGI contains both a delegation token and a Kerberos ticket, the
delegation token is preferred. After expiration, Flink does not fall back to
using the ticket.
*Solution*:
1. Change Flink client to upload delegation tokens. Addresses problem 1.
2. Change Flink cluster to filter out the HDFS delegation token from the
tokens loaded from storage when populating the UGI. Addresses problem 3.
3. Change JM to propagate its stored tokens to the TM, rather than the
tokens from the UGI (which were filtered in (2)).
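
To check my understanding of steps 2 and 3, here is a rough sketch in plain Java. It is not the code from this PR and it does not use the Hadoop classes: tokens are modeled as a hypothetical alias-to-kind map, and the class and method names (`TokenFilterSketch`, `tokensForUgi`, `tokensForTaskManager`) are made up for illustration. The only detail taken from Hadoop is the kind string `HDFS_DELEGATION_TOKEN`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TokenFilterSketch {
    // Kind string that Hadoop assigns to HDFS delegation tokens.
    static final String HDFS_DELEGATION_KIND = "HDFS_DELEGATION_TOKEN";

    // Step 2: tokens to load into the UGI. The HDFS delegation token is
    // dropped so that the Kerberos ticket from the keytab is used instead.
    // The map (alias -> token kind) is a stand-in for real credentials.
    static Map<String, String> tokensForUgi(Map<String, String> stored) {
        Map<String, String> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : stored.entrySet()) {
            if (!HDFS_DELEGATION_KIND.equals(e.getValue())) {
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }

    // Step 3: tokens the JM hands to TM containers. This is the stored set,
    // unfiltered, so YARN log aggregation still finds the HDFS token.
    static Map<String, String> tokensForTaskManager(Map<String, String> stored) {
        return new LinkedHashMap<>(stored);
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("hdfs", HDFS_DELEGATION_KIND);
        stored.put("amrm", "YARN_AM_RM_TOKEN");

        System.out.println("UGI tokens: " + tokensForUgi(stored).keySet());
        System.out.println("TM tokens:  " + tokensForTaskManager(stored).keySet());
    }
}
```

The point of keeping two views is that the container token storage and the UGI serve different consumers: YARN reads the former for log aggregation, while Flink's own HDFS access goes through the latter.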