gemini-code-assist[bot] commented on code in PR #35733:
URL: https://github.com/apache/beam/pull/35733#discussion_r2241438532
##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -41,17 +48,29 @@
# Initialize the user_marker variable.
user_marker=""
-# Isolate the user-provided -m argument (matching only).
-if [[ $pytest_args =~ "-m" ]]; then
- # Extract the marker value using the defined regex.
- user_marker=$(echo "$pytest_args" | sed -nE "s/.*$marker_regex.*/\1/p")
-fi
+# Define regex pattern for quoted strings
+quotes_regex="^[\"\'](.*)[\"\']$"
-# Remove the -m argument from pytest_args (substitution only).
-if [[ -n $user_marker ]]; then
- pytest_args=$(echo "$pytest_args" | sed -E "s/$marker_regex//")
+# Extract the user markers.
+if [[ $posargs =~ $marker_regex ]]; then
+ # Get the full match including -m and the marker.
+ full_match="${BASH_REMATCH[0]}"
+
+ # Get the marker with quotes (this is the first capture group).
+ quoted_marker="${BASH_REMATCH[1]}"
+
+ # Remove any quotes around the marker.
+ if [[ $quoted_marker =~ $quotes_regex ]]; then
+ user_marker="${BASH_REMATCH[1]}"
+ else
+ user_marker="$quoted_marker"
+ fi
Review Comment:

The `quotes_regex` defined on line 52 and used here is not strict enough. It
will incorrectly handle strings with mismatched quotes. For example, it would
treat a string like `"'foo\""` as being quoted and extract `foo`, which is
likely not the intended behavior.
This same logic is used on line 106 when parsing options and should also be
corrected there.
A more robust approach is to check for matching pairs of quotes explicitly.
```suggestion
if [[ $quoted_marker =~ ^\"(.*)\"$ ]]; then
user_marker="${BASH_REMATCH[1]}"
elif [[ $quoted_marker =~ ^\'(.*)\'$ ]]; then
user_marker="${BASH_REMATCH[1]}"
else
user_marker="$quoted_marker"
fi
```
##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -62,16 +81,60 @@
marker_for_sequential_tests="($user_marker) and
($marker_for_sequential_tests)"
fi
+# Parse posargs to separate pytest options from test paths.
+options=""
+test_paths=""
+
+# Safely split the posargs string into individual arguments.
+eval "set -- $posargs"
+
+# Iterate through arguments.
+while [[ $# -gt 0 ]]; do
+ arg="$1"
+ shift
+
+ # If argument starts with dash, it's an option.
+ if [[ "$arg" == -* ]]; then
+ options+=" $arg"
+
+ # Handle options that take a value (like -k).
+ if [[ "$arg" == "-k" && $# -gt 0 ]]; then
+ # Get the next argument.
+ next_arg="$1"
+
+ # Check if it's quoted and remove quotes if needed.
+ if [[ $next_arg =~ $quotes_regex ]]; then
+ # Extract the content inside quotes.
+ next_arg="${BASH_REMATCH[1]}"
+ fi
+
+ # Add the unquoted value to options.
+ options+=" $next_arg"
+ shift
+ fi
+ else
+ # Otherwise it's a test path.
+ test_paths+=" $arg"
+ fi
+done
+
+# Construct the final pytest command arguments.
+pyargs_section=""
+if [[ -n "$test_paths" ]]; then
+ pyargs_section="--pyargs $test_paths"
+fi
+pytest_command_args="$options $pyargs_section"
+
# Run tests in parallel.
-echo "Running parallel tests with: pytest -m \"$marker_for_parallel_tests\"
$pytest_args"
+echo "Running parallel tests with: pytest -m \"$marker_for_parallel_tests\"
$posargs"
Review Comment:

This `echo` statement prints the original `$posargs` string. However, the
actual `pytest` command on the next line uses the processed
`${pytest_command_args}`. This can be misleading when debugging, as the logged
command does not accurately reflect what is being executed.
To improve clarity, the `echo` statement should print the arguments that are
actually passed to `pytest`.
A similar issue exists for the sequential tests' `echo` statement on line
135.
```suggestion
echo "Running parallel tests with: pytest -m \"$marker_for_parallel_tests\"
$pytest_args $pytest_command_args"
```
##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -62,16 +81,60 @@
marker_for_sequential_tests="($user_marker) and
($marker_for_sequential_tests)"
fi
+# Parse posargs to separate pytest options from test paths.
+options=""
+test_paths=""
+
+# Safely split the posargs string into individual arguments.
+eval "set -- $posargs"
+
+# Iterate through arguments.
+while [[ $# -gt 0 ]]; do
+ arg="$1"
+ shift
+
+ # If argument starts with dash, it's an option.
+ if [[ "$arg" == -* ]]; then
+ options+=" $arg"
+
+ # Handle options that take a value (like -k).
+ if [[ "$arg" == "-k" && $# -gt 0 ]]; then
+ # Get the next argument.
+ next_arg="$1"
+
+ # Check if it's quoted and remove quotes if needed.
+ if [[ $next_arg =~ $quotes_regex ]]; then
+ # Extract the content inside quotes.
+ next_arg="${BASH_REMATCH[1]}"
+ fi
+
+ # Add the unquoted value to options.
+ options+=" $next_arg"
+ shift
+ fi
+ else
+ # Otherwise it's a test path.
+ test_paths+=" $arg"
+ fi
+done
Review Comment:

The argument parsing logic here is brittle because it only has special
handling for the `-k` option. It will incorrectly parse other common `pytest`
options that take a value.
For example, if `posargs` contains `-n 4` (to specify the number of parallel
workers for `pytest-xdist`), this loop will treat `-n` as an option but `4` as
a test path. This will result in an incorrect final command like `... -n
--pyargs 4`.
To make this more robust, the logic should be generalized to handle any
option that takes an argument, not just `-k`. A common way to do this is to
check if the argument following an option starts with a `-`. If it does not, it
can be treated as a value for that option.
##########
sdks/python/scripts/run_pytest.sh:
##########
@@ -62,16 +81,60 @@ if [[ -n $user_marker ]]; then
marker_for_sequential_tests="($user_marker) and
($marker_for_sequential_tests)"
fi
+# Parse posargs to separate pytest options from test paths.
+options=""
+test_paths=""
+
+# Safely split the posargs string into individual arguments.
+eval "set -- $posargs"
Review Comment:

The use of `eval` on `$posargs` introduces a critical command injection
vulnerability. Since `$posargs` can be controlled by the user invoking `tox`, a
malicious string could lead to arbitrary command execution on the system
running the script.
For example, a user could pass `'; rm -rf ~'` as part of the arguments, and
`eval` would execute the `rm` command.
Please replace `eval` with a safer method for parsing the arguments. While
parsing shell arguments with quotes is a complex problem, using `eval` on
untrusted input is not secure. Consider if the arguments can be passed from
`tox.ini` in a way that they are already available as a shell array, which
would avoid this parsing issue entirely.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]