potiuk commented on code in PR #44976:
URL: https://github.com/apache/airflow/pull/44976#discussion_r1895370581
##########
airflow/dag_processing/bundles/git.py:
##########
@@ -120,9 +154,16 @@ def _has_version(repo: Repo, version: str) -> bool:
except BadName:
return False
+ def _refresh(self):
+ self.bare_repo.remotes.origin.fetch("+refs/heads/*:refs/heads/*")
+ self.repo.remotes.origin.pull()
+
def refresh(self) -> None:
if self.version:
raise AirflowException("Refreshing a specific version is not
supported")
- self.bare_repo.remotes.origin.fetch("+refs/heads/*:refs/heads/*")
- self.repo.remotes.origin.pull()
+ if self.ssh_hook:
+ with self.ssh_hook.get_conn():
Review Comment:
I have not paid too much attention to those changes so far but happy to
help. Not sure exactly what is the problem, but if I understand correctly the
question is about the `repo_url` and `host` established in SSHHook via the SSH
conn.
Yeah - generally `git` can use SSH protocol (And I think it should be
generally default for most "production" setup configurations.
https://git-scm.com/book/ms/v2/Git-on-the-Server-The-Protocols
And in this case indeed we need to somehow make repo_url and SSHHook host
match. because if they don't then indeed `SSHHook.get_conn()` uses the host
from SSH conn to establish connection and they connection is reused later.
I believe (from what I see and earlier comments) we are not trying to use
existing ssh_conn_id - which I find a bit strange, because Connection (and/or
secrets) is where we can securely store authentication information, so i think
we **should** pass conn_id - to retrieve the authentication information from
the Connection.
I can imagine - for example that:
```
Connection(
login = "git"
host = "github.com"
extra = {... private_key etc ..)
```
Where repo_url = :
```
[email protected]:apache/airflow.git
```
There is a bit of a problem here indeed because the question is how to make
host part of the repo_url and connection host match. We don't have really a
place to store the "repo" portion of the github URL in SSH Connection, so in
the current setuo we have a bit of a strange mixup between connection and
repo_url where part of the connection definition is in the hook and part in the
repo_url.
IMHO - the cleanest solution would be to create a `GItSSHHook` and `Git SSH
Connection` extending from `SSHHook`. We could add "repository" extra to `Git
SSH Connection` on top of what `SSH Connection` already has - and use it
exclusively for git connectivity. In this case `repo_url` and `ssh_conn_id`
would be mutually exclusive. This would avoid a lot of confusion (we have very
similar case where we mixed configuration of emails sent by Airflow - where
part of the email configuration is taken from the email conn id and part from
the configuraiton and it ain't pretty.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]