gemini-code-assist[bot] commented on code in PR #35733: URL: https://github.com/apache/beam/pull/35733#discussion_r2241441058
########## sdks/python/scripts/run_pytest.sh: ########## @@ -41,17 +48,29 @@ # Initialize the user_marker variable. user_marker="" -# Isolate the user-provided -m argument (matching only). -if [[ $pytest_args =~ "-m" ]]; then - # Extract the marker value using the defined regex. - user_marker=$(echo "$pytest_args" | sed -nE "s/.*$marker_regex.*/\1/p") -fi +# Define regex pattern for quoted strings +quotes_regex="^[\"\'](.*)[\"\']$" Review Comment:  This regex for matching quoted strings is buggy because it can incorrectly match strings with mixed quotes (e.g., a string starting with a single quote and ending with a double quote). To fix this, you can use a backreference (`\1`) to ensure the closing quote is the same as the opening quote. Note that this changes the capture group for the content. I've added suggestions on other lines where this regex is used to adjust for this change. ```suggestion quotes_regex="^([\"'])(.*)\\1$" ``` ########## sdks/python/scripts/run_pytest.sh: ########## @@ -30,8 +30,15 @@ envname=${1?First argument required: suite base name} posargs=$2 pytest_args=$3 -# strip leading/trailing quotes from posargs because it can get double quoted as its passed through. -posargs=$(sed -e 's/^"//' -e 's/"$//' -e "s/'$//" -e "s/^'//" <<<$posargs) +# strip leading/trailing quotes from posargs because it can get double quoted as +# its passed through. +if [[ $posargs == \"*\" ]]; then + # If wrapped in double quotes, remove them + posargs="${posargs:1:${#posargs}-2}" +elif [[ $posargs == \'*\' ]]; then + # If wrapped in single quotes, remove them. + posargs="${posargs:1:${#posargs}-2}" +fi Review Comment:  The conditions to check for surrounding quotes are incorrect. The patterns `\"*\"` and `\'*\'` will be interpreted by `[[` as literal string comparisons against `*`, not as glob patterns that check for surrounding quotes. To correctly check if a string starts and ends with a specific character using globbing, the pattern should be unquoted inside `[[ ... ]]`. For literal quotes, you need to quote them appropriately for the shell. The corrected version will properly detect and strip surrounding quotes. ```suggestion if [[ $posargs == '"'*'"' ]]; then # If wrapped in double quotes, remove them posargs="${posargs:1:${#posargs}-2}" elif [[ $posargs == "'"*"'" ]]; then # If wrapped in single quotes, remove them. posargs="${posargs:1:${#posargs}-2}" fi ``` ########## sdks/python/scripts/run_pytest.sh: ########## @@ -41,17 +48,29 @@ # Initialize the user_marker variable. user_marker="" -# Isolate the user-provided -m argument (matching only). -if [[ $pytest_args =~ "-m" ]]; then - # Extract the marker value using the defined regex. - user_marker=$(echo "$pytest_args" | sed -nE "s/.*$marker_regex.*/\1/p") -fi +# Define regex pattern for quoted strings +quotes_regex="^[\"\'](.*)[\"\']$" -# Remove the -m argument from pytest_args (substitution only). -if [[ -n $user_marker ]]; then - pytest_args=$(echo "$pytest_args" | sed -E "s/$marker_regex//") +# Extract the user markers. +if [[ $posargs =~ $marker_regex ]]; then + # Get the full match including -m and the marker. + full_match="${BASH_REMATCH[0]}" + + # Get the marker with quotes (this is the first capture group). + quoted_marker="${BASH_REMATCH[1]}" + + # Remove any quotes around the marker. + if [[ $quoted_marker =~ $quotes_regex ]]; then + user_marker="${BASH_REMATCH[1]}" + else + user_marker="$quoted_marker" + fi Review Comment:  Following the suggested change to `quotes_regex`, the content of the quoted string is now in the second capture group. You should use `BASH_REMATCH[2]` to extract it. ```suggestion if [[ $quoted_marker =~ $quotes_regex ]]; then user_marker="${BASH_REMATCH[2]}" else user_marker="$quoted_marker" fi ``` ########## sdks/python/scripts/run_pytest.sh: ########## @@ -62,16 +81,60 @@ marker_for_sequential_tests="($user_marker) and ($marker_for_sequential_tests)" fi +# Parse posargs to separate pytest options from test paths. +options="" +test_paths="" + +# Safely split the posargs string into individual arguments. +eval "set -- $posargs" + +# Iterate through arguments. +while [[ $# -gt 0 ]]; do + arg="$1" + shift + + # If argument starts with dash, it's an option. + if [[ "$arg" == -* ]]; then + options+=" $arg" + + # Handle options that take a value (like -k). + if [[ "$arg" == "-k" && $# -gt 0 ]]; then + # Get the next argument. + next_arg="$1" + + # Check if it's quoted and remove quotes if needed. + if [[ $next_arg =~ $quotes_regex ]]; then + # Extract the content inside quotes. + next_arg="${BASH_REMATCH[1]}" + fi Review Comment:  Following the suggested change to `quotes_regex`, the content of the quoted string is now in the second capture group. You should use `BASH_REMATCH[2]` to extract it. ```suggestion if [[ $next_arg =~ $quotes_regex ]]; then # Extract the content inside quotes. next_arg="${BASH_REMATCH[2]}" fi ``` ########## sdks/python/scripts/run_pytest.sh: ########## @@ -62,16 +81,60 @@ marker_for_sequential_tests="($user_marker) and ($marker_for_sequential_tests)" fi +# Parse posargs to separate pytest options from test paths. +options="" +test_paths="" + +# Safely split the posargs string into individual arguments. +eval "set -- $posargs" Review Comment:  Using `eval` with user-controllable data (`$posargs`) is extremely dangerous and can lead to arbitrary command execution. For example, if a user passes `foo; rm -rf /` as part of the arguments, `eval` will execute the `rm` command. This is a critical security vulnerability. A safer alternative must be used to parse the arguments. The ideal solution would be to change the `tox.ini` file to pass arguments without quoting `"{posargs}"`, so they arrive as separate arguments in the script. If changing `tox.ini` is not an option, a safer parsing mechanism that does not use `eval` must be implemented here. ########## sdks/python/scripts/run_pytest.sh: ########## @@ -62,16 +81,60 @@ marker_for_sequential_tests="($user_marker) and ($marker_for_sequential_tests)" fi +# Parse posargs to separate pytest options from test paths. +options="" +test_paths="" + +# Safely split the posargs string into individual arguments. +eval "set -- $posargs" + +# Iterate through arguments. +while [[ $# -gt 0 ]]; do + arg="$1" + shift + + # If argument starts with dash, it's an option. + if [[ "$arg" == -* ]]; then + options+=" $arg" + + # Handle options that take a value (like -k). + if [[ "$arg" == "-k" && $# -gt 0 ]]; then + # Get the next argument. + next_arg="$1" + + # Check if it's quoted and remove quotes if needed. + if [[ $next_arg =~ $quotes_regex ]]; then + # Extract the content inside quotes. + next_arg="${BASH_REMATCH[1]}" + fi + + # Add the unquoted value to options. + options+=" $next_arg" + shift + fi Review Comment:  The argument parsing logic here specifically handles `-k` to unquote its value. If there are other pytest options that take a value and might be quoted (e.g., `--junit-prefix`), they will not be handled correctly. This makes the parsing logic brittle. Consider making this more generic to handle any option that takes a following argument, or at least add a comment to clarify that only `-k` is specially handled. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
