jedcunningham commented on code in PR #44976:
URL: https://github.com/apache/airflow/pull/44976#discussion_r1907817918
##########
airflow/dag_processing/bundles/git.py:
##########
@@ -40,20 +109,35 @@ class GitDagBundle(BaseDagBundle):
:param repo_url: URL of the git repository
:param tracking_ref: Branch or tag for this DAG bundle
:param subdir: Subdirectory within the repository where the DAGs are
stored (Optional)
+ :param git_conn_id: Connection ID for SSH connection to the repository
(Optional)
Review Comment:
```suggestion
:param git_conn_id: Connection ID for SSH/token based connection to the
repository (Optional)
```
This support auth tokens too right?
##########
airflow/dag_processing/bundles/git.py:
##########
@@ -120,9 +154,16 @@ def _has_version(repo: Repo, version: str) -> bool:
except BadName:
return False
+ def _refresh(self):
+ self.bare_repo.remotes.origin.fetch("+refs/heads/*:refs/heads/*")
+ self.repo.remotes.origin.pull()
+
def refresh(self) -> None:
if self.version:
raise AirflowException("Refreshing a specific version is not
supported")
- self.bare_repo.remotes.origin.fetch("+refs/heads/*:refs/heads/*")
- self.repo.remotes.origin.pull()
+ if self.ssh_hook:
+ with self.ssh_hook.get_conn():
Review Comment:
> This still has to connect to the host machine, not github.com, for
example, but the host where the Airflow is running.
Where/how does this work? I'm not spotting it in the diff.
Are you sure we aren't relying on
[this](https://github.com/apache/airflow/blob/351ac28c7c4dde3a611e4260bbb0f01c7618f261/providers/src/airflow/providers/ssh/hooks/ssh.py#L322-L323)
to populate something that `git` sees when it tries to auth? Paramiko must be
doing some magic here for us, and I still worry part of that magic is an extra
ssh connection.
##########
airflow/providers_manager.py:
##########
@@ -175,6 +175,9 @@ def
_create_customized_form_field_behaviours_schema_validator():
def _check_builtin_provider_prefix(provider_package: str, class_name: str) ->
bool:
+ if "bundles" in provider_package:
+ # TODO: remove this when this package is moved to providers directory
Review Comment:
```suggestion
# TODO: AIP-66: remove this when this package is moved to providers
directory
```
Let's leave this breadcrumb for our future selves
##########
airflow/dag_processing/bundles/git.py:
##########
@@ -40,20 +109,35 @@ class GitDagBundle(BaseDagBundle):
:param repo_url: URL of the git repository
Review Comment:
```suggestion
```
##########
airflow/providers_manager.py:
##########
@@ -676,6 +679,8 @@ def
_discover_all_airflow_builtin_providers_from_local_sources(self) -> None:
self._add_provider_info_from_local_source_files_on_path(path)
except Exception as e:
log.warning("Error when loading 'provider.yaml' files from %s
airflow sources: %s", path, e)
+ # TODO: Remove this when the package is moved to providers
Review Comment:
```suggestion
# TODO: AIP-66: Remove this when the package is moved to providers
```
##########
airflow/dag_processing/bundles/git.py:
##########
@@ -66,20 +150,33 @@ def __init__(self, *, repo_url: str, tracking_ref: str,
subdir: str | None = Non
self.repo.head.set_reference(self.repo.commit(self.version))
self.repo.head.reset(index=True, working_tree=True)
else:
- self.refresh()
+ self._refresh()
+
+ def initialize(self) -> None:
+ if not self.repo_url:
+ raise AirflowException(f"Connection {self.git_conn_id} doesn't
have a git_repo_url")
+ if isinstance(self.repo_url, os.PathLike):
+ self._initialize()
+ elif not self.repo_url.startswith("git@") or not
self.repo_url.endswith(".git"):
+ raise AirflowException(
+ f"Invalid git URL: {self.repo_url}. URL must start with git@
and end with .git"
+ )
+ elif self.repo_url.startswith("git@"):
+ with self.hook.get_conn():
+ self._initialize()
+ else:
+ self._initialize()
Review Comment:
```suggestion
if self.repo_url.startswith("git@") or not
self.repo_url.endswith(".git"):
raise AirflowException(
f"Invalid git URL: {self.repo_url}. URL must start with git@
and end with .git"
)
if self.repo_url.startswith("git@"):
with self.hook.get_conn():
self._initialize()
else:
self._initialize()
```
Don't need a special case for path repo_urls, right? Can it even be a path?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]