softyoungha commented on code in PR #40942:
URL: https://github.com/apache/airflow/pull/40942#discussion_r1688248481


##########
airflow/utils/process_utils.py:
##########
@@ -162,14 +162,17 @@ def signal_procs(sig):
     return returncodes
 
 
-def execute_in_subprocess(cmd: list[str], cwd: str | None = None) -> None:
+def execute_in_subprocess(cmd: list[str], cwd: str | None = None, env_vars: 
dict | None = None) -> None:
     """
     Execute a process and stream output to logger.
 
     :param cmd: command and arguments to run
     :param cwd: Current working directory passed to the Popen constructor
+    :param env_vars: Additional environment variables to set for the 
subprocess. If None,
+        the subprocess will inherit the current environment variables of the 
parent process
+        (``os.environ``)
     """
-    execute_in_subprocess_with_kwargs(cmd, cwd=cwd)
+    execute_in_subprocess_with_kwargs(cmd, cwd=cwd, env=dict(os.environ, 
**env_vars) if env_vars else None)

Review Comment:
   As you suggested, it seems better to add it. 
   Simply assigning `ArgNotSet` to `env_vars` value may not be sufficient for 
handling all cases. 
   I will modify it by adding flags such as `inherit_env_vars` or 
`override_env_vars` with default True
   (until now, the code always fetched from `os.environ`)
   
   > I excluded cases you mentioned because I thought the 
`--system-site-package` option worked based on `PATH`. Ah, I just found out 
that it is controlled by site now :)
   
   <details> 
   
   <summary>Used sample codes</summary>
   
   ```python
   import os
   import sys
   
   print(sys.executable)
   ##### output
   # 
$HOME/.local/share/hatch/env/virtual/apache-airflow/4VQ7OEnp/airflow-310/bin/python
   
   from airflow.utils.process_utils import execute_in_subprocess_with_kwargs
   
   execute_in_subprocess_with_kwargs(
       ["bash", "-c", "env"], env={"MY_ENV": "VALUE"}
   )
   
   execute_in_subprocess_with_kwargs(
       ["bash", "-c", "echo $PATH"],
       env={}        # empty environment
   )
   # [2024-07-23T23:58:07.097+0900] {process_utils.py:190} INFO - Output:
   # [2024-07-23T23:58:07.098+0900] {process_utils.py:194} INFO - 
PWD=$HOME/projects/airflow-main
   # [2024-07-23T23:58:07.098+0900] {process_utils.py:194} INFO - MY_ENV=VALUE
   # [2024-07-23T23:58:07.098+0900] {process_utils.py:194} INFO - SHLVL=0
   
   # create venv
   execute_in_subprocess_with_kwargs(
       [sys.executable, "-m", "venv", "myenv", "--system-site-packages"]
   )
   
   # check site packages
   execute_in_subprocess_with_kwargs(
       [f"{os.getcwd()}/myenv/bin/python", "-c", "import site; 
print(site.getsitepackages())"],
       env={}      # empty environment
   )
   # [2024-07-23T23:58:08.020+0900] {process_utils.py:190} INFO - Output:
   # [2024-07-23T23:58:08.030+0900] {process_utils.py:194} INFO - [
   #   '$HOME/projects/airflow-main/myenv/lib/python3.10/site-packages',
   #   '$HOME/projects/airflow-main/myenv/local/lib/python3.10/dist-packages',
   #   '$HOME/projects/airflow-main/myenv/lib/python3/dist-packages',
   #   '$HOME/projects/airflow-main/myenv/lib/python3.10/dist-packages',
   #   '/usr/lib/python3.10/site-packages',
   #   '/usr/local/lib/python3.10/dist-packages',
   #   '/usr/lib/python3/dist-packages',
   #   '/usr/lib/python3.10/dist-packages'
   # ]
   
   # pip show apache-airflow
   execute_in_subprocess_with_kwargs(
       [f"{os.getcwd()}/myenv/bin/python", "-m", "pip", "show", 
"apache-airflow"],
       env={}      # empty environment
   )
   # [2024-07-24T00:08:34.668+0900] {process_utils.py:190} INFO - Output:
   # [2024-07-24T00:08:36.171+0900] {process_utils.py:194} INFO - Name: 
apache-airflow
   # [2024-07-24T00:08:36.171+0900] {process_utils.py:194} INFO - Version: 
2.10.0.dev0
   # [2024-07-24T00:08:36.171+0900] {process_utils.py:194} INFO - Summary: 
Programmatically author, schedule and monitor data pipelines
   # [2024-07-24T00:08:36.171+0900] {process_utils.py:194} INFO - Home-page:
   # [2024-07-24T00:08:36.172+0900] {process_utils.py:194} INFO - Author:
   # [2024-07-24T00:08:36.172+0900] {process_utils.py:194} INFO - Author-email: 
Apache Software Foundation <[email protected]>
   # [2024-07-24T00:08:36.172+0900] {process_utils.py:194} INFO - License:
   # [2024-07-24T00:08:36.172+0900] {process_utils.py:194} INFO - Location: 
$HOME/.local/lib/python3.10/site-packages
   # [2024-07-24T00:08:36.172+0900] {process_utils.py:194} INFO - Requires: ....
   
   ```
   
   </details>



##########
airflow/utils/process_utils.py:
##########
@@ -162,14 +162,17 @@ def signal_procs(sig):
     return returncodes
 
 
-def execute_in_subprocess(cmd: list[str], cwd: str | None = None) -> None:
+def execute_in_subprocess(cmd: list[str], cwd: str | None = None, env_vars: 
dict | None = None) -> None:
     """
     Execute a process and stream output to logger.
 
     :param cmd: command and arguments to run
     :param cwd: Current working directory passed to the Popen constructor
+    :param env_vars: Additional environment variables to set for the 
subprocess. If None,
+        the subprocess will inherit the current environment variables of the 
parent process
+        (``os.environ``)
     """
-    execute_in_subprocess_with_kwargs(cmd, cwd=cwd)
+    execute_in_subprocess_with_kwargs(cmd, cwd=cwd, env=dict(os.environ, 
**env_vars) if env_vars else None)

Review Comment:
   @uranusjr 
   As you suggested, it seems better to add it. 
   Simply assigning `ArgNotSet` to `env_vars` value may not be sufficient for 
handling all cases. 
   I will modify it by adding flags such as `inherit_env_vars` or 
`override_env_vars` with default True
   (until now, the code always fetched from `os.environ`)
   
   > I excluded cases you mentioned because I thought the 
`--system-site-package` option worked based on `PATH`. Ah, I just found out 
that it is controlled by site now :)
   
   <details> 
   
   <summary>Used sample codes</summary>
   
   ```python
   import os
   import sys
   
   print(sys.executable)
   ##### output
   # 
$HOME/.local/share/hatch/env/virtual/apache-airflow/4VQ7OEnp/airflow-310/bin/python
   
   from airflow.utils.process_utils import execute_in_subprocess_with_kwargs
   
   execute_in_subprocess_with_kwargs(
       ["bash", "-c", "env"], env={"MY_ENV": "VALUE"}
   )
   
   execute_in_subprocess_with_kwargs(
       ["bash", "-c", "echo $PATH"],
       env={}        # empty environment
   )
   # [2024-07-23T23:58:07.097+0900] {process_utils.py:190} INFO - Output:
   # [2024-07-23T23:58:07.098+0900] {process_utils.py:194} INFO - 
PWD=$HOME/projects/airflow-main
   # [2024-07-23T23:58:07.098+0900] {process_utils.py:194} INFO - MY_ENV=VALUE
   # [2024-07-23T23:58:07.098+0900] {process_utils.py:194} INFO - SHLVL=0
   
   # create venv
   execute_in_subprocess_with_kwargs(
       [sys.executable, "-m", "venv", "myenv", "--system-site-packages"]
   )
   
   # check site packages
   execute_in_subprocess_with_kwargs(
       [f"{os.getcwd()}/myenv/bin/python", "-c", "import site; 
print(site.getsitepackages())"],
       env={}      # empty environment
   )
   # [2024-07-23T23:58:08.020+0900] {process_utils.py:190} INFO - Output:
   # [2024-07-23T23:58:08.030+0900] {process_utils.py:194} INFO - [
   #   '$HOME/projects/airflow-main/myenv/lib/python3.10/site-packages',
   #   '$HOME/projects/airflow-main/myenv/local/lib/python3.10/dist-packages',
   #   '$HOME/projects/airflow-main/myenv/lib/python3/dist-packages',
   #   '$HOME/projects/airflow-main/myenv/lib/python3.10/dist-packages',
   #   '/usr/lib/python3.10/site-packages',
   #   '/usr/local/lib/python3.10/dist-packages',
   #   '/usr/lib/python3/dist-packages',
   #   '/usr/lib/python3.10/dist-packages'
   # ]
   
   # pip show apache-airflow
   execute_in_subprocess_with_kwargs(
       [f"{os.getcwd()}/myenv/bin/python", "-m", "pip", "show", 
"apache-airflow"],
       env={}      # empty environment
   )
   # [2024-07-24T00:08:34.668+0900] {process_utils.py:190} INFO - Output:
   # [2024-07-24T00:08:36.171+0900] {process_utils.py:194} INFO - Name: 
apache-airflow
   # [2024-07-24T00:08:36.171+0900] {process_utils.py:194} INFO - Version: 
2.10.0.dev0
   # [2024-07-24T00:08:36.171+0900] {process_utils.py:194} INFO - Summary: 
Programmatically author, schedule and monitor data pipelines
   # [2024-07-24T00:08:36.171+0900] {process_utils.py:194} INFO - Home-page:
   # [2024-07-24T00:08:36.172+0900] {process_utils.py:194} INFO - Author:
   # [2024-07-24T00:08:36.172+0900] {process_utils.py:194} INFO - Author-email: 
Apache Software Foundation <[email protected]>
   # [2024-07-24T00:08:36.172+0900] {process_utils.py:194} INFO - License:
   # [2024-07-24T00:08:36.172+0900] {process_utils.py:194} INFO - Location: 
$HOME/.local/lib/python3.10/site-packages
   # [2024-07-24T00:08:36.172+0900] {process_utils.py:194} INFO - Requires: ....
   
   ```
   
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to