potiuk commented on code in PR #44976:
URL: https://github.com/apache/airflow/pull/44976#discussion_r1895370581


##########
airflow/dag_processing/bundles/git.py:
##########
@@ -120,9 +154,16 @@ def _has_version(repo: Repo, version: str) -> bool:
         except BadName:
             return False
 
+    def _refresh(self):
+        self.bare_repo.remotes.origin.fetch("+refs/heads/*:refs/heads/*")
+        self.repo.remotes.origin.pull()
+
     def refresh(self) -> None:
         if self.version:
             raise AirflowException("Refreshing a specific version is not 
supported")
 
-        self.bare_repo.remotes.origin.fetch("+refs/heads/*:refs/heads/*")
-        self.repo.remotes.origin.pull()
+        if self.ssh_hook:
+            with self.ssh_hook.get_conn():

Review Comment:
   I have not paid too much attention to those changes so far but happy to 
help. Not sure exactly what is the problem, but if I understand correctly the 
question is about the `repo_url` and `host` established in SSHHook via the SSH 
conn.
   
   Yeah - generally `git` can use SSH protocol (And I think it should be 
generally default for most "production" setup configurations. 
   
   https://git-scm.com/book/ms/v2/Git-on-the-Server-The-Protocols
   
   And in this case indeed we need to somehow make repo_url and SSHHook host 
match. because if they don't then indeed `SSHHook.get_conn()` uses the host 
from SSH conn to establish connection and they connection is reused later.
   
   I believe (from what I see and earlier comments) we are not trying to use 
existing ssh_conn_id - which I find a bit strange, because Connection (and/or 
secrets) is where we can securely store authentication information, so i think 
we **should** pass conn_id - to retrieve the authentication information from 
the Connection.
   
   I can imagine - for example that:
   
   ```
   Connection(
       login = "git"
       host = "github.com"
       extra = {... private_key etc ..)
   ```
   
   Where repo_url = :
   
   ```
   [email protected]:apache/airflow.git
   ```
   
   There is a bit of a problem here indeed because the question is how to make 
host part of the repo_url and connection host match.  We don't have really a 
place to store the "repo" portion of the github URL in SSH Connection, so in 
the current setuo we have a bit of a strange mixup between connection and 
repo_url where part of the connection definition is in the hook and part in the 
repo_url.
   
   IMHO - the cleanest solution would be to create a "GItHook" and "Git 
connection" extending from SSH Hook. We could add "repository" extra to 
"GitConnection" on top of what "SSH Connection" already has - and use it 
exclusively for git connectivity. In this case "repo_url" and "ssh_conn_id" 
would be mutually exclusive. This would avoid a lot of confusion (we have very 
similar case where we mixed configuration of emails sent by Airflow - where 
part of the email configuration is taken from the email conn id and part from 
the configuraiton and it ain't pretty.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to