jedcunningham commented on code in PR #44976:
URL: https://github.com/apache/airflow/pull/44976#discussion_r1907817918


##########
airflow/dag_processing/bundles/git.py:
##########
@@ -40,20 +109,35 @@ class GitDagBundle(BaseDagBundle):
     :param repo_url: URL of the git repository
     :param tracking_ref: Branch or tag for this DAG bundle
     :param subdir: Subdirectory within the repository where the DAGs are 
stored (Optional)
+    :param git_conn_id: Connection ID for SSH connection to the repository 
(Optional)

Review Comment:
   ```suggestion
       :param git_conn_id: Connection ID for SSH/token based connection to the 
repository (Optional)
   ```
   
   This support auth tokens too right?
   
   



##########
airflow/dag_processing/bundles/git.py:
##########
@@ -120,9 +154,16 @@ def _has_version(repo: Repo, version: str) -> bool:
         except BadName:
             return False
 
+    def _refresh(self):
+        self.bare_repo.remotes.origin.fetch("+refs/heads/*:refs/heads/*")
+        self.repo.remotes.origin.pull()
+
     def refresh(self) -> None:
         if self.version:
             raise AirflowException("Refreshing a specific version is not 
supported")
 
-        self.bare_repo.remotes.origin.fetch("+refs/heads/*:refs/heads/*")
-        self.repo.remotes.origin.pull()
+        if self.ssh_hook:
+            with self.ssh_hook.get_conn():

Review Comment:
   > This still has to connect to the host machine, not github.com, for 
example, but the host where the Airflow is running.
   
   Where/how does this work? I'm not spotting it in the diff.
   
   Are you sure we aren't relying on 
[this](https://github.com/apache/airflow/blob/351ac28c7c4dde3a611e4260bbb0f01c7618f261/providers/src/airflow/providers/ssh/hooks/ssh.py#L322-L323)
 to populate something that `git` sees when it tries to auth? Paramiko must be 
doing some magic here for us, and I still worry part of that magic is an extra 
ssh connection.



##########
airflow/providers_manager.py:
##########
@@ -175,6 +175,9 @@ def 
_create_customized_form_field_behaviours_schema_validator():
 
 
 def _check_builtin_provider_prefix(provider_package: str, class_name: str) -> 
bool:
+    if "bundles" in provider_package:
+        # TODO: remove this when this package is moved to providers directory

Review Comment:
   ```suggestion
           # TODO: AIP-66: remove this when this package is moved to providers 
directory
   ```
   
   Let's leave this breadcrumb for our future selves



##########
airflow/dag_processing/bundles/git.py:
##########
@@ -40,20 +109,35 @@ class GitDagBundle(BaseDagBundle):
     :param repo_url: URL of the git repository

Review Comment:
   ```suggestion
   ```



##########
airflow/providers_manager.py:
##########
@@ -676,6 +679,8 @@ def 
_discover_all_airflow_builtin_providers_from_local_sources(self) -> None:
                     
self._add_provider_info_from_local_source_files_on_path(path)
             except Exception as e:
                 log.warning("Error when loading 'provider.yaml' files from %s 
airflow sources: %s", path, e)
+        # TODO: Remove this when the package is moved to providers

Review Comment:
   ```suggestion
           # TODO: AIP-66: Remove this when the package is moved to providers
   ```



##########
airflow/dag_processing/bundles/git.py:
##########
@@ -66,20 +150,33 @@ def __init__(self, *, repo_url: str, tracking_ref: str, 
subdir: str | None = Non
             self.repo.head.set_reference(self.repo.commit(self.version))
             self.repo.head.reset(index=True, working_tree=True)
         else:
-            self.refresh()
+            self._refresh()
+
+    def initialize(self) -> None:
+        if not self.repo_url:
+            raise AirflowException(f"Connection {self.git_conn_id} doesn't 
have a git_repo_url")
+        if isinstance(self.repo_url, os.PathLike):
+            self._initialize()
+        elif not self.repo_url.startswith("git@") or not 
self.repo_url.endswith(".git"):
+            raise AirflowException(
+                f"Invalid git URL: {self.repo_url}. URL must start with git@ 
and end with .git"
+            )
+        elif self.repo_url.startswith("git@"):
+            with self.hook.get_conn():
+                self._initialize()
+        else:
+            self._initialize()

Review Comment:
   ```suggestion
           if self.repo_url.startswith("git@") or not 
self.repo_url.endswith(".git"):
               raise AirflowException(
                   f"Invalid git URL: {self.repo_url}. URL must start with git@ 
and end with .git"
               )
   
           if self.repo_url.startswith("git@"):
               with self.hook.get_conn():
                   self._initialize()
           else:
               self._initialize()
   ```
   
   Don't need a special case for path repo_urls, right? Can it even be a path?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to