scripts: fix posargs for `run_pytest.sh` [beam]

via GitHub Tue, 29 Jul 2025 20:19:16 -0700


gemini-code-assist[bot] commented on code in PR #35733:
URL: https://github.com/apache/beam/pull/35733#discussion_r2241441058



##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -41,17 +48,29 @@
 # Initialize the user_marker variable.
 user_marker=""
 
-# Isolate the user-provided -m argument (matching only).
-if [[ $pytest_args =~ "-m" ]]; then
-  # Extract the marker value using the defined regex.
-  user_marker=$(echo "$pytest_args" | sed -nE "s/.*$marker_regex.*/\1/p")
-fi
+# Define regex pattern for quoted strings
+quotes_regex="^[\"\'](.*)[\"\']$"

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   This regex for matching quoted strings is buggy because it can incorrectly 
match strings with mixed quotes (e.g., a string starting with a single quote 
and ending with a double quote).
   
   To fix this, you can use a backreference (`\1`) to ensure the closing quote 
is the same as the opening quote. Note that this changes the capture group for 
the content. I've added suggestions on other lines where this regex is used to 
adjust for this change.
   
   ```suggestion
   quotes_regex="^([\"'])(.*)\\1$"
   ```



##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -30,8 +30,15 @@ envname=${1?First argument required: suite base name}
 posargs=$2
 pytest_args=$3
 
-# strip leading/trailing quotes from posargs because it can get double quoted 
as its passed through.
-posargs=$(sed -e 's/^"//' -e 's/"$//' -e "s/'$//" -e "s/^'//" <<<$posargs)
+# strip leading/trailing quotes from posargs because it can get double quoted 
as
+# its passed through.
+if [[ $posargs == \"*\" ]]; then
+  # If wrapped in double quotes, remove them
+  posargs="${posargs:1:${#posargs}-2}"
+elif [[ $posargs == \'*\' ]]; then
+  # If wrapped in single quotes, remove them.
+  posargs="${posargs:1:${#posargs}-2}"
+fi

Review Comment:
   ![critical](https://www.gstatic.com/codereviewagent/critical.svg)
   
   The conditions to check for surrounding quotes are incorrect. The patterns 
`\"*\"` and `\'*\'` will be interpreted by `[[` as literal string comparisons 
against `*`, not as glob patterns that check for surrounding quotes.
   
   To correctly check if a string starts and ends with a specific character 
using globbing, the pattern should be unquoted inside `[[ ... ]]`. For literal 
quotes, you need to quote them appropriately for the shell.
   
   The corrected version will properly detect and strip surrounding quotes.
   
   ```suggestion
   if [[ $posargs == '"'*'"' ]]; then
     # If wrapped in double quotes, remove them
     posargs="${posargs:1:${#posargs}-2}"
   elif [[ $posargs == "'"*"'" ]]; then
     # If wrapped in single quotes, remove them.
     posargs="${posargs:1:${#posargs}-2}"
   fi
   ```



##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -41,17 +48,29 @@
 # Initialize the user_marker variable.
 user_marker=""
 
-# Isolate the user-provided -m argument (matching only).
-if [[ $pytest_args =~ "-m" ]]; then
-  # Extract the marker value using the defined regex.
-  user_marker=$(echo "$pytest_args" | sed -nE "s/.*$marker_regex.*/\1/p")
-fi
+# Define regex pattern for quoted strings
+quotes_regex="^[\"\'](.*)[\"\']$"
 
-# Remove the -m argument from pytest_args (substitution only).
-if [[ -n $user_marker ]]; then
-  pytest_args=$(echo "$pytest_args" | sed -E "s/$marker_regex//")
+# Extract the user markers.
+if [[ $posargs =~ $marker_regex ]]; then
+  # Get the full match including -m and the marker.
+  full_match="${BASH_REMATCH[0]}"
+
+  # Get the marker with quotes (this is the first capture group).
+  quoted_marker="${BASH_REMATCH[1]}"
+
+  # Remove any quotes around the marker.
+  if [[ $quoted_marker =~ $quotes_regex ]]; then
+    user_marker="${BASH_REMATCH[1]}"
+  else
+    user_marker="$quoted_marker"
+  fi

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   Following the suggested change to `quotes_regex`, the content of the quoted 
string is now in the second capture group. You should use `BASH_REMATCH[2]` to 
extract it.
   
   ```suggestion
     if [[ $quoted_marker =~ $quotes_regex ]]; then
       user_marker="${BASH_REMATCH[2]}"
     else
       user_marker="$quoted_marker"
     fi
   ```



##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -62,16 +81,60 @@
   marker_for_sequential_tests="($user_marker) and 
($marker_for_sequential_tests)"
 fi
 
+# Parse posargs to separate pytest options from test paths.
+options=""
+test_paths=""
+
+# Safely split the posargs string into individual arguments.
+eval "set -- $posargs"
+
+# Iterate through arguments.
+while [[ $# -gt 0 ]]; do
+  arg="$1"
+  shift
+
+  # If argument starts with dash, it's an option.
+  if [[ "$arg" == -* ]]; then
+    options+=" $arg"
+
+    # Handle options that take a value (like -k).
+    if [[ "$arg" == "-k" && $# -gt 0 ]]; then
+      # Get the next argument.
+      next_arg="$1"
+
+      # Check if it's quoted and remove quotes if needed.
+      if [[ $next_arg =~ $quotes_regex ]]; then
+        # Extract the content inside quotes.
+        next_arg="${BASH_REMATCH[1]}"
+      fi

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   Following the suggested change to `quotes_regex`, the content of the quoted 
string is now in the second capture group. You should use `BASH_REMATCH[2]` to 
extract it.
   
   ```suggestion
         if [[ $next_arg =~ $quotes_regex ]]; then
           # Extract the content inside quotes.
           next_arg="${BASH_REMATCH[2]}"
         fi
   ```



##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -62,16 +81,60 @@
   marker_for_sequential_tests="($user_marker) and 
($marker_for_sequential_tests)"
 fi
 
+# Parse posargs to separate pytest options from test paths.
+options=""
+test_paths=""
+
+# Safely split the posargs string into individual arguments.
+eval "set -- $posargs"

Review Comment:
   ![critical](https://www.gstatic.com/codereviewagent/critical.svg)
   
   Using `eval` with user-controllable data (`$posargs`) is extremely dangerous 
and can lead to arbitrary command execution. For example, if a user passes 
`foo; rm -rf /` as part of the arguments, `eval` will execute the `rm` command. 
This is a critical security vulnerability.
   
   A safer alternative must be used to parse the arguments. The ideal solution 
would be to change the `tox.ini` file to pass arguments without quoting 
`"{posargs}"`, so they arrive as separate arguments in the script. If changing 
`tox.ini` is not an option, a safer parsing mechanism that does not use `eval` 
must be implemented here.



##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -62,16 +81,60 @@
   marker_for_sequential_tests="($user_marker) and 
($marker_for_sequential_tests)"
 fi
 
+# Parse posargs to separate pytest options from test paths.
+options=""
+test_paths=""
+
+# Safely split the posargs string into individual arguments.
+eval "set -- $posargs"
+
+# Iterate through arguments.
+while [[ $# -gt 0 ]]; do
+  arg="$1"
+  shift
+
+  # If argument starts with dash, it's an option.
+  if [[ "$arg" == -* ]]; then
+    options+=" $arg"
+
+    # Handle options that take a value (like -k).
+    if [[ "$arg" == "-k" && $# -gt 0 ]]; then
+      # Get the next argument.
+      next_arg="$1"
+
+      # Check if it's quoted and remove quotes if needed.
+      if [[ $next_arg =~ $quotes_regex ]]; then
+        # Extract the content inside quotes.
+        next_arg="${BASH_REMATCH[1]}"
+      fi
+
+      # Add the unquoted value to options.
+      options+=" $next_arg"
+      shift
+    fi

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The argument parsing logic here specifically handles `-k` to unquote its 
value. If there are other pytest options that take a value and might be quoted 
(e.g., `--junit-prefix`), they will not be handled correctly. This makes the 
parsing logic brittle.
   
   Consider making this more generic to handle any option that takes a 
following argument, or at least add a comment to clarify that only `-k` is 
specially handled.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] sdks/python/scripts: fix posargs for `run_pytest.sh` [beam]

Reply via email to