seyoon-lim opened a new issue, #40749:
URL: https://github.com/apache/airflow/issues/40749

   ### Description
   
   Hello,
   
   I would like to propose a feature that I have implemented in the 
SparkSubmitHook for managing Spark connections.
   
   The feature lets the Kerberos principal and keytab be managed within the 
connection settings.
   
   The basic idea is to store the keytab as a base64-encoded string in the 
connection's extra field. When a Spark job is submitted, the value is decoded 
and written to a file, and that file's path is passed to the submission.
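
To make the storage format concrete, here is a minimal sketch of how a keytab could be encoded for the connection extra and decoded again. The `keytab` key matches the `extra.get("keytab")` lookup in the hook code; the `principal` key, the helper names `build_spark_extra` and `decode_keytab`, and the sample values are assumptions made for illustration only.

```python
import base64
import json


def build_spark_extra(keytab_bytes: bytes, principal: str) -> str:
    """Return a JSON string suitable for a connection's extra field.

    Hypothetical helper: the "principal" key is an assumption; only the
    "keytab" key is taken from the proposal.
    """
    b64_keytab = base64.b64encode(keytab_bytes).decode("ascii")
    return json.dumps({"principal": principal, "keytab": b64_keytab})


def decode_keytab(extra_json: str) -> bytes:
    """Recover the raw keytab bytes from the extra JSON."""
    return base64.b64decode(json.loads(extra_json)["keytab"])


# Placeholder bytes stand in for the contents of a real keytab file.
sample_keytab = b"\x05\x02 not a real keytab"
extra_json = build_spark_extra(sample_keytab, "airflow@EXAMPLE.COM")
restored = decode_keytab(extra_json)
```

Base64 keeps the binary keytab intact inside the JSON text of the extra field, so the decoded bytes are identical to the original file contents.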
   
   
![image](https://github.com/user-attachments/assets/1d242cd5-0484-4057-9159-4212862253ae)
   
   ```python
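   import base64
   import os
   import tempfile
   from pathlib import Path
   
   from airflow.exceptions import AirflowException
   
   
   def _resolve_connection(self) -> None:
       ...
       # Fall back to the connection extra only when no keytab path was given.
       if self._keytab is None:
           if (b64_keytab := extra.get("keytab")) is not None:
               if self._principal is None:
                   raise AirflowException("Principal is not set")
               self._keytab = self._get_keytab_from_conn(b64_keytab)
       ...
   
   def _get_keytab_from_conn(self, b64_keytab: str) -> str:
       temp_dir_path = Path(tempfile.gettempdir()).absolute()
       # A principal may contain "/" (e.g. "user/host@REALM"), which cannot
       # appear in a file name.
       safe_principal = self._principal.replace("/", "_")
       temp_file_name = f"airflow_keytab__{safe_principal}"
       keytab_path = temp_dir_path / temp_file_name
   
       # Decode first so invalid base64 fails before anything is written.
       keytab = base64.b64decode(b64_keytab)
       # mkstemp creates a uniquely named staging file that only the owner
       # can read or write, which matters because a keytab is a secret.
       fd, staging = tempfile.mkstemp(
           dir=temp_dir_path, prefix=f".{temp_file_name}."
       )
       staging_path = Path(staging)
   
       try:
           with os.fdopen(fd, "wb") as f:
               self.log.info("Saving keytab to %s", staging_path)
               f.write(keytab)
   
           self.log.info(
               "Moving keytab from %s to %s", staging_path, keytab_path
           )
           # os.replace is atomic within one filesystem, so a concurrent
           # reader never sees a partially written keytab.
           os.replace(staging_path, keytab_path)
           return str(keytab_path)
       except Exception as err:
           self.log.error("Failed to save keytab: %s", err)
           raise
       finally:
           # Clean up the staging file if the move did not happen.
           if staging_path.exists():
               self.log.info("Removing staging keytab %s", staging_path)
               staging_path.unlink()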
   ```
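
For context on how the resolved path is used at submission time, the sketch below shows how a principal and keytab path would typically reach `spark-submit` as its `--principal` and `--keytab` options. The function name `build_kerberos_args` and the sample values are hypothetical; this is an illustration of the idea, not the hook's actual command-building code.

```python
from __future__ import annotations


def build_kerberos_args(
    principal: str | None, keytab_path: str | None
) -> list[str]:
    """Return the Kerberos-related spark-submit arguments (hypothetical helper)."""
    args: list[str] = []
    if principal:
        args += ["--principal", principal]
    if keytab_path:
        args += ["--keytab", keytab_path]
    return args


# Example values only; the path mirrors the naming scheme used above.
kerberos_args = build_kerberos_args(
    "airflow@EXAMPLE.COM", "/tmp/airflow_keytab__airflow@EXAMPLE.COM"
)
```

Because the decoded keytab ends up as an ordinary file path, the rest of the submission does not need to know whether the keytab came from the connection or was deployed to the worker by hand.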
   
   ### Use case/motivation
   
   Deploying the keytab to every worker is difficult, and redeploying it each 
time it changes is cumbersome. Storing it in the connection means a changed 
keytab no longer has to be redeployed to each worker.
   
   I would appreciate your consideration of this proposal.
   
   Thank you.
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

