cb149 edited a comment on issue #19752: URL: https://github.com/apache/airflow/issues/19752#issuecomment-987055637
@potiuk I'll give it a try, though I am also wondering about other parts of the SparkSubmitHook code, e.g. this part in [def on_kill(self)](https://github.com/apache/airflow/blob/866a601b76e219b3c043e1dbbc8fb22300866351/airflow/providers/apache/spark/hooks/spark_submit.py#L637):

```python
if self._keytab is not None and self._principal is not None:
    # we are ignoring renewal failures from renew_from_kt
    # here as the failure could just be due to a non-renewable ticket,
    # we still attempt to kill the yarn application
    renew_from_kt(self._principal, self._keytab, exit_on_fail=False)
    env = os.environ.copy()
    env["KRB5CCNAME"] = airflow_conf.get('kerberos', 'ccache')
```

- **renew_from_kt** will fail if there is no "airflow kerberos" ticket renewer running, or if the user hasn't otherwise initialized a credentials cache at the ccache path from the config.
- Since this uses the single ccache path from the config, what happens if two or more SparkSubmitOperators with different keytabs time out or get killed at exactly the same time? Would the following be possible, or would the scheduler not run the two kill commands in parallel?
  1. SparkSubmitOperator A with keytab/principal for dev_user and SparkSubmitOperator B with keytab/principal for ops_user are killed/time out at the same time.
  2. renew_from_kt for A is called.
  3. renew_from_kt for B is called (at the same time as, or shortly after, 2., but before 4.).
  4. The subprocess with kill_cmd for A is opened and fails, because ops_user is not allowed to modify YARN jobs belonging to dev_user.

What was the design decision here to use **renew_from_kt** instead of creating/using a temporary ccache location for each YARN kill?
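For illustration, here is a minimal sketch of the temporary-ccache alternative described above. It assumes `kinit` and the `yarn` CLI are on `PATH` and that the keytab matches the principal; the function name and signature are hypothetical, not part of the Airflow hook:

```python
import os
import subprocess
import tempfile


def kill_yarn_app_with_isolated_ccache(principal, keytab, application_id):
    """Sketch: kinit into a per-kill temporary credentials cache so that
    concurrent kills using different keytabs cannot clobber each other's
    tickets in the single shared ccache from the Airflow config."""
    # Each kill gets its own ccache file, so there is no shared state
    # between SparkSubmitOperator A and B.
    with tempfile.NamedTemporaryFile(prefix="airflow_krb5_ccache_") as ccache:
        env = os.environ.copy()
        env["KRB5CCNAME"] = ccache.name
        # Authenticate into the isolated cache (hypothetical call;
        # assumes kinit is available and the keytab is valid).
        subprocess.run(
            ["kinit", "-kt", keytab, "-c", ccache.name, principal],
            check=True,
        )
        # Kill the YARN application using only the isolated credentials.
        subprocess.run(
            ["yarn", "application", "-kill", application_id],
            env=env,
            check=True,
        )
```

With this approach, step 3 of the race above could no longer overwrite A's ticket before step 4 runs, since each kill command sees its own `KRB5CCNAME`.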
