cb149 commented on issue #19752:
URL: https://github.com/apache/airflow/issues/19752#issuecomment-987055637


   @potiuk I'll give it a try, though I am also wondering about other parts of the SparkSubmitHook code, e.g. this part in [def on_kill(self)](https://github.com/apache/airflow/blob/866a601b76e219b3c043e1dbbc8fb22300866351/airflow/providers/apache/spark/hooks/spark_submit.py#L637):
   ```python
   if self._keytab is not None and self._principal is not None:
       # we are ignoring renewal failures from renew_from_kt
       # here as the failure could just be due to a non-renewable ticket,
       # we still attempt to kill the yarn application
       renew_from_kt(self._principal, self._keytab, exit_on_fail=False)
       env = os.environ.copy()
       env["KRB5CCNAME"] = airflow_conf.get('kerberos', 'ccache')
   ```
   
   - **renew_from_kt** will fail if no "airflow kerberos" ticket renewer is running, or if the user hasn't otherwise initialized a credentials cache at the ccache path from the config.
   - Since this uses the single ccache from the config, what happens if two or more SparkSubmitOperators with different keytabs time out or get killed at the exact same time? Would the following be possible, or would the scheduler not run the two kill commands at the same time?
   1. SparkSubmitOperator A (keytab/principal for dev_user) and SparkSubmitOperator B (keytab/principal for ops_user) are killed or time out at the same time.
   2. renew_from_kt for A is called.
   3. renew_from_kt for B is called.
   4. The subprocess with kill_cmd for A is opened (and fails because ops_user is not allowed to modify YARN jobs of dev_user).
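
To make that interleaving concrete, here is a toy model of the four steps. It does not touch Kerberos or YARN at all: `fake_renew` and `fake_kill` are hypothetical stand-ins for `renew_from_kt` and the `kill_cmd` subprocess, and the "ccache" is just a text file holding the last principal written to it. It assumes a renewal replaces whatever the shared cache held before, and it says nothing about whether Airflow can actually run two kills concurrently, which is the open question.

```python
import os
import tempfile


def fake_renew(ccache_path, principal):
    # Hypothetical stand-in for renew_from_kt: the shared cache ends up
    # holding only the most recently written principal.
    with open(ccache_path, "w") as f:
        f.write(principal)


def fake_kill(ccache_path, app_owner):
    # Hypothetical stand-in for the kill_cmd subprocess: it "succeeds" only
    # if the principal currently in the cache owns the application.
    with open(ccache_path) as f:
        return f.read() == app_owner


def simulate_interleaving():
    with tempfile.TemporaryDirectory() as tmp:
        shared_ccache = os.path.join(tmp, "ccache")  # one path for both operators
        fake_renew(shared_ccache, "dev_user")   # step 2: renewal for A
        fake_renew(shared_ccache, "ops_user")   # step 3: renewal for B
        kill_a_ok = fake_kill(shared_ccache, "dev_user")  # step 4: kill for A
        kill_b_ok = fake_kill(shared_ccache, "ops_user")  # kill for B
        return kill_a_ok, kill_b_ok


if __name__ == "__main__":
    print(simulate_interleaving())
```

In this model the kill for A is the one that loses, purely because B's renewal happened to land last.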
   
   What's the design decision behind using **renew_from_kt** here instead of creating/using a temporary ccache location for each YARN kill?
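
A rough sketch of the alternative I have in mind, not a proposed patch: each kill gets its own throwaway credentials cache, so concurrent kills with different principals cannot overwrite each other. The function names and the injectable `run` parameter are my own inventions for illustration; the only external pieces assumed are the `kinit` flags `-k`, `-t` and `-c` and the `KRB5CCNAME` environment variable.

```python
import os
import subprocess
import tempfile


def build_kinit_cmd(principal, keytab, ccache):
    # -k: authenticate from a keytab, -t: keytab file, -c: credentials cache to write
    return ["kinit", "-k", "-t", keytab, "-c", ccache, principal]


def kill_with_private_ccache(principal, keytab, kill_cmd, run=subprocess.run):
    """Run kill_cmd with a ccache that exists only for this one call.

    `run` is injectable so the flow can be exercised without kinit or yarn.
    """
    with tempfile.TemporaryDirectory(prefix="krb5cc_kill_") as tmp:
        ccache = os.path.join(tmp, "ccache")
        env = os.environ.copy()
        env["KRB5CCNAME"] = ccache  # both commands see only this cache
        # As in on_kill, a failed ticket request is not treated as fatal here;
        # the kill is still attempted.
        run(build_kinit_cmd(principal, keytab, ccache), env=env, check=False)
        return run(kill_cmd, env=env, check=False)
    # The temporary directory, and the ticket in it, is removed on exit.
```

The cost is one extra `kinit` per kill and no reuse of an already-renewed ticket, which may well be the reason the shared cache was chosen.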


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.