potiuk commented on a change in pull request #16954:
URL: https://github.com/apache/airflow/pull/16954#discussion_r668703521



##########
File path: tests/operators/test_python.py
##########
@@ -817,6 +822,31 @@ def f():
 
         self._run_as_operator(f, requirements=['funcsigs', 'dill'], 
system_site_packages=False)
 
+    def test_private_repo_requirements(self):
+
+        with create_session() as session:
+            self._clean_stop_pypi_server()
+            self._initiate_pypi_server()

Review comment:
       This could be pytest fixture or at the very least it should be either in 
try/finally or in setUp()/tearDown(). otherwise there is a risk the server 
stays running after test throws an exception.

##########
File path: airflow/utils/python_virtualenv.py
##########
@@ -35,12 +37,35 @@ def _generate_virtualenv_cmd(tmp_dir: str, python_bin: str, 
system_site_packages
     return cmd
 
 
-def _generate_pip_install_cmd(tmp_dir: str, requirements: List[str]) -> 
Optional[List[str]]:
+def _generate_pip_install_cmd(tmp_dir: str,
+                              requirements: List[str],
+                              connection_id: Optional[str] = None
+                              ) -> Optional[List[str]]:
     if not requirements:
         return None
-    # direct path alleviates need to activate
-    cmd = [f'{tmp_dir}/bin/pip', 'install']
-    return cmd + requirements
+
+    if connection_id:
+        con: Connection = BaseHook.get_connection(connection_id)
+        user = con.login
+        schema = con.schema or 'http'
+        password = con.get_password()
+        port = con.port
+        host = con.host
+        if user:
+            index_url = 
f"{schema}://{user}:{password}@{host}:{port}/repository/python/simple"

Review comment:
       There are few problems here:
   * port is Optional [int] - it might be None - we need to provide sensible 
default if we want to hard-code : (or  handle None) case.
   * "/repository/python/simple should not be hard-coded. I imagine registries 
can have different path. If anything the "/repository/python/simple" might be a 
default, but it should be overridable via extra in connection
   * `https://` should be default for private registries. There are good 
reasons for private registry to be private and I cannot imagine a case where it 
would be http-based, leaving it essentially unprotected and with user/passwords 
sent in plain text.
   

##########
File path: setup.cfg
##########
@@ -160,6 +160,7 @@ install_requires =
     typing-extensions>=3.7.4;python_version<"3.8"
     unicodecsv>=0.14.1
     werkzeug~=1.0, >=1.0.1
+    pypiserver~=1.4

Review comment:
       We only need pypi server for development. There is no need to make whole 
airflow depend on it in `install_requires`. Please move it to `[devel]` extra 
in `setup.py`.

##########
File path: tests/operators/test_python.py
##########
@@ -817,6 +822,31 @@ def f():
 
         self._run_as_operator(f, requirements=['funcsigs', 'dill'], 
system_site_packages=False)
 
+    def test_private_repo_requirements(self):
+
+        with create_session() as session:
+            self._clean_stop_pypi_server()
+            self._initiate_pypi_server()
+            session.execute("delete from connection where conn_id = 
'private_repo_test_conn';")
+            cn = Connection("private_repo_test_conn",

Review comment:
       There is no need to create the connection. It's enough to patch the 
method with `AIRFLOW__CONN__NN` environment variables 
(https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html#storing-a-connection-in-environment-variables
  - also there is an example of how you can generate the connection url from 
the raw values - incluidng percent-encoding etc. 
https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html#generating-a-connection-uri)
 . Another problem with this implementation is that it creates side-effect - 
the connection remains in the database after the test is run. If you 
termporarily patch env variable for the test, it will work faster and will not 
leave a side effect. 

##########
File path: tests/operators/test_python.py
##########
@@ -856,6 +886,25 @@ def f():
 
         self._run_as_operator(f, python_version=3, use_dill=False, 
requirements=['dill'])
 
+    @staticmethod
+    def _initiate_pypi_server():
+        try:
+            if not os.path.exists('/root/packages'):
+                os.makedirs('/root/packages')
+            subprocess.Popen(['pypi-server'])
+            time.sleep(1.5)
+        except Exception as e:
+            pass
+
+    @staticmethod
+    def _clean_stop_pypi_server(port: bytes = b"8080"):

Review comment:
       Too complex. running system command `killall  -9 pypi-server` does the 
job in one line of code.. And with the proper setup/teardown sequence, you 
should store the pid of the process started and try to kill it with SIGTERM 
rather than SIGKILL first anyway - this method should only be treated as a 
"safety net",

##########
File path: tests/operators/test_python.py
##########
@@ -856,6 +886,25 @@ def f():
 
         self._run_as_operator(f, python_version=3, use_dill=False, 
requirements=['dill'])
 
+    @staticmethod
+    def _initiate_pypi_server():
+        try:
+            if not os.path.exists('/root/packages'):
+                os.makedirs('/root/packages')
+            subprocess.Popen(['pypi-server'])
+            time.sleep(1.5)

Review comment:
       How do we know the magic value of 1.5 is good ? Are you sure it is 
enough for all cases (running in CI etc?). That's a very bad idea to sleep in 
unit tests. It slows down the whole test suite. There should be a better way to 
check if the server is running. One option is to run an http request to it in a 
loop until it succeeds if there is no better way.

##########
File path: airflow/utils/python_virtualenv.py
##########
@@ -35,12 +37,35 @@ def _generate_virtualenv_cmd(tmp_dir: str, python_bin: str, 
system_site_packages
     return cmd
 
 
-def _generate_pip_install_cmd(tmp_dir: str, requirements: List[str]) -> 
Optional[List[str]]:
+def _generate_pip_install_cmd(tmp_dir: str,
+                              requirements: List[str],
+                              connection_id: Optional[str] = None
+                              ) -> Optional[List[str]]:
     if not requirements:
         return None
-    # direct path alleviates need to activate
-    cmd = [f'{tmp_dir}/bin/pip', 'install']
-    return cmd + requirements
+
+    if connection_id:
+        con: Connection = BaseHook.get_connection(connection_id)
+        user = con.login
+        schema = con.schema or 'http'
+        password = con.get_password()
+        port = con.port
+        host = con.host
+        if user:
+            index_url = 
f"{schema}://{user}:{password}@{host}:{port}/repository/python/simple"
+        else:
+            index_url = f"{schema}://{host}:{port}/repository/python/simple"
+        private_cmd = [f'{tmp_dir}/bin/pip',
+                       'install',
+                       f'--trusted-host', host,

Review comment:
       I will revert to @uranusjr  here. He is our local PyPI expert (and pypi 
committer) so I trust 100% in his statements here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to