scripts: fix posargs for `run_pytest.sh` [beam]

via GitHub Tue, 29 Jul 2025 20:16:02 -0700


gemini-code-assist[bot] commented on code in PR #35733:
URL: https://github.com/apache/beam/pull/35733#discussion_r2241438532



##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -41,17 +48,29 @@
 # Initialize the user_marker variable.
 user_marker=""
 
-# Isolate the user-provided -m argument (matching only).
-if [[ $pytest_args =~ "-m" ]]; then
-  # Extract the marker value using the defined regex.
-  user_marker=$(echo "$pytest_args" | sed -nE "s/.*$marker_regex.*/\1/p")
-fi
+# Define regex pattern for quoted strings
+quotes_regex="^[\"\'](.*)[\"\']$"
 
-# Remove the -m argument from pytest_args (substitution only).
-if [[ -n $user_marker ]]; then
-  pytest_args=$(echo "$pytest_args" | sed -E "s/$marker_regex//")
+# Extract the user markers.
+if [[ $posargs =~ $marker_regex ]]; then
+  # Get the full match including -m and the marker.
+  full_match="${BASH_REMATCH[0]}"
+
+  # Get the marker with quotes (this is the first capture group).
+  quoted_marker="${BASH_REMATCH[1]}"
+
+  # Remove any quotes around the marker.
+  if [[ $quoted_marker =~ $quotes_regex ]]; then
+    user_marker="${BASH_REMATCH[1]}"
+  else
+    user_marker="$quoted_marker"
+  fi

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   The `quotes_regex` defined on line 52 and used here is not strict enough. It 
will incorrectly handle strings with mismatched quotes. For example, it would 
treat a string like `"'foo\""` as being quoted and extract `foo`, which is 
likely not the intended behavior.
   
   This same logic is used on line 106 when parsing options and should also be 
corrected there.
   
   A more robust approach is to check for matching pairs of quotes explicitly.
   
   ```suggestion
     if [[ $quoted_marker =~ ^\"(.*)\"$ ]]; then
       user_marker="${BASH_REMATCH[1]}"
     elif [[ $quoted_marker =~ ^\'(.*)\'$ ]]; then
       user_marker="${BASH_REMATCH[1]}"
     else
       user_marker="$quoted_marker"
     fi
   ```



##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -62,16 +81,60 @@
   marker_for_sequential_tests="($user_marker) and 
($marker_for_sequential_tests)"
 fi
 
+# Parse posargs to separate pytest options from test paths.
+options=""
+test_paths=""
+
+# Safely split the posargs string into individual arguments.
+eval "set -- $posargs"
+
+# Iterate through arguments.
+while [[ $# -gt 0 ]]; do
+  arg="$1"
+  shift
+
+  # If argument starts with dash, it's an option.
+  if [[ "$arg" == -* ]]; then
+    options+=" $arg"
+
+    # Handle options that take a value (like -k).
+    if [[ "$arg" == "-k" && $# -gt 0 ]]; then
+      # Get the next argument.
+      next_arg="$1"
+
+      # Check if it's quoted and remove quotes if needed.
+      if [[ $next_arg =~ $quotes_regex ]]; then
+        # Extract the content inside quotes.
+        next_arg="${BASH_REMATCH[1]}"
+      fi
+
+      # Add the unquoted value to options.
+      options+=" $next_arg"
+      shift
+    fi
+  else
+    # Otherwise it's a test path.
+    test_paths+=" $arg"
+  fi
+done
+
+# Construct the final pytest command arguments.
+pyargs_section=""
+if [[ -n "$test_paths" ]]; then
+  pyargs_section="--pyargs $test_paths"
+fi
+pytest_command_args="$options $pyargs_section"
+
 # Run tests in parallel.
-echo "Running parallel tests with: pytest -m \"$marker_for_parallel_tests\" 
$pytest_args"
+echo "Running parallel tests with: pytest -m \"$marker_for_parallel_tests\" 
$posargs"

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   This `echo` statement prints the original `$posargs` string. However, the 
actual `pytest` command on the next line uses the processed 
`${pytest_command_args}`. This can be misleading when debugging, as the logged 
command does not accurately reflect what is being executed.
   
   To improve clarity, the `echo` statement should print the arguments that are 
actually passed to `pytest`.
   
   A similar issue exists for the sequential tests' `echo` statement on line 
135.
   
   ```suggestion
   echo "Running parallel tests with: pytest -m \"$marker_for_parallel_tests\" 
$pytest_args $pytest_command_args"
   ```



##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -62,16 +81,60 @@
   marker_for_sequential_tests="($user_marker) and 
($marker_for_sequential_tests)"
 fi
 
+# Parse posargs to separate pytest options from test paths.
+options=""
+test_paths=""
+
+# Safely split the posargs string into individual arguments.
+eval "set -- $posargs"
+
+# Iterate through arguments.
+while [[ $# -gt 0 ]]; do
+  arg="$1"
+  shift
+
+  # If argument starts with dash, it's an option.
+  if [[ "$arg" == -* ]]; then
+    options+=" $arg"
+
+    # Handle options that take a value (like -k).
+    if [[ "$arg" == "-k" && $# -gt 0 ]]; then
+      # Get the next argument.
+      next_arg="$1"
+
+      # Check if it's quoted and remove quotes if needed.
+      if [[ $next_arg =~ $quotes_regex ]]; then
+        # Extract the content inside quotes.
+        next_arg="${BASH_REMATCH[1]}"
+      fi
+
+      # Add the unquoted value to options.
+      options+=" $next_arg"
+      shift
+    fi
+  else
+    # Otherwise it's a test path.
+    test_paths+=" $arg"
+  fi
+done

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   The argument parsing logic here is brittle because it only has special 
handling for the `-k` option. It will incorrectly parse other common `pytest` 
options that take a value.
   
   For example, if `posargs` contains `-n 4` (to specify the number of parallel 
workers for `pytest-xdist`), this loop will treat `-n` as an option but `4` as 
a test path. This will result in an incorrect final command like `... -n 
--pyargs 4`.
   
   To make this more robust, the logic should be generalized to handle any 
option that takes an argument, not just `-k`. A common way to do this is to 
check if the argument following an option starts with a `-`. If it does not, it 
can be treated as a value for that option.



##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -62,16 +81,60 @@ if [[ -n $user_marker ]]; then
   marker_for_sequential_tests="($user_marker) and 
($marker_for_sequential_tests)"
 fi
 
+# Parse posargs to separate pytest options from test paths.
+options=""
+test_paths=""
+
+# Safely split the posargs string into individual arguments.
+eval "set -- $posargs"

Review Comment:
   ![critical](https://www.gstatic.com/codereviewagent/critical.svg)
   
   The use of `eval` on `$posargs` introduces a critical command injection 
vulnerability. Since `$posargs` can be controlled by the user invoking `tox`, a 
malicious string could lead to arbitrary command execution on the system 
running the script.
   
   For example, a user could pass `'; rm -rf ~'` as part of the arguments, and 
`eval` would execute the `rm` command.
   
   Please replace `eval` with a safer method for parsing the arguments. While 
parsing shell arguments with quotes is a complex problem, using `eval` on 
untrusted input is not secure. Consider if the arguments can be passed from 
`tox.ini` in a way that they are already available as a shell array, which 
would avoid this parsing issue entirely.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] sdks/python/scripts: fix posargs for `run_pytest.sh` [beam]

Reply via email to