codeant-ai-for-open-source[bot] commented on code in PR #36983:
URL: https://github.com/apache/superset/pull/36983#discussion_r2672543444
##########
tests/unit_tests/utils/date_parser_tests.py:
##########
@@ -611,3 +611,48 @@ def test_date_range_migration() -> None:
field = "10 years ago"
assert not re.search(DateRangeMigration.x_dateunit, field)
+
+
+# Tests for bounded whitespace regex patterns in time_range_lookup
[email protected](
+ "time_range,should_parse",
+ [
+ ("last 7 days : ", True),
+ ("this week : ", True),
+ ("start of next month : ", True),
+ ("prior quarter : ", True),
+ ("last 7 days : ", True),
+ ("last 7 days : ", True),
+ ("last 7 days : ", True),
+ ("last 7 days : ", True),
+ ("last 7 days : ", False), # 6 spaces - exceeds bound
+ ("start of next month : ", True), # 5 spaces - valid
+ ("last week : ", True),
+ ("last week : ", True),
+ ("last week : ", True),
+ ("next 12 months : ", True),
+ ("next 12 months : ", True),
+ ("next 12 months : ", True),
+ ("last7days : ", False), # no space after scope - invalid
+ ("last 7days : ", True), # \s{0,5} allows 0 spaces after number -
valid
+ ("lastweek : ", False), # no space after scope - invalid
+ ("last" + " " * 100 + "7 days : ", False),
Review Comment:
**Suggestion:** Unclear/overly-large test input: the parametrized case using
string concatenation to build a 100-space string is noisy and hides the intent
(the test aims to check >5 spaces). Replace the 100-space expression with an
explicit 6-space example to be precise and maintainable. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
("last 7 days : ", False), # 6 spaces - exceeds bound
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Replacing the noisy 100-space concatenation with a concise explicit example
(e.g. 6 spaces) improves readability and maintainability. There's already an
explicit 6-space test in the list, so collapsing the redundant long-case to a
clear representative is sensible and safe.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/unit_tests/utils/date_parser_tests.py
**Line:** 639:639
**Comment:**
*Possible Bug: Unclear/overly-large test input: the parametrized case
using string concatenation to build a 100-space string is noisy and hides the
intent (the test aims to check >5 spaces). Replace the 100-space expression
with an explicit 6-space example to be precise and maintainable.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/utils/date_parser.py:
##########
@@ -418,9 +418,9 @@ def get_since_until( # pylint:
disable=too-many-arguments,too-many-locals,too-m
if time_range and separator in time_range:
time_range_lookup = [
(
- r"^(start of|beginning of|end of)\s+"
- r"(this|last|next|prior)\s+"
- r"([0-9]+)?\s*"
+ r"^(start of|beginning of|end of)\s{1,5}"
+ r"(this|last|next|prior)\s{1,5}"
+ r"([0-9]+)?\s{0,5}"
Review Comment:
**Suggestion:** Regression in whitespace handling: the first regex bounds
whitespace to at most 5 characters (using `\s{1,5}` and `\s{0,5}`), which is
more restrictive than the prior behavior (`\s+`/`\s*`) and will fail to match
valid inputs containing larger or variable whitespace sequences; restore the
more permissive quantifiers to preserve expected matching. [logic error]
**Severity Level:** Minor ⚠️
```suggestion
r"^(start of|beginning of|end of)\s+"
r"(this|last|next|prior)\s+"
r"([0-9]+)?\s*"
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Changing \s+ and \s* to bounded \s{1,5}/\s{0,5} is a regression in
permissiveness:
inputs with unusual spacing (or tabs) could fail to match. Restoring \s+/\s*
preserves previous behavior and is a safe correction.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/utils/date_parser.py
**Line:** 421:423
**Comment:**
*Logic Error: Regression in whitespace handling: the first regex bounds
whitespace to at most 5 characters (using `\s{1,5}` and `\s{0,5}`), which is
more restrictive than the prior behavior (`\s+`/`\s*`) and will fail to match
valid inputs containing larger or variable whitespace sequences; restore the
more permissive quantifiers to preserve expected matching.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/utils/date_parser.py:
##########
@@ -431,8 +431,8 @@ def get_since_until( # pylint:
disable=too-many-arguments,too-many-locals,too-m
),
),
(
- r"^(this|last|next|prior)\s+"
- r"([0-9]+)?\s*"
+ r"^(this|last|next|prior)\s{1,5}"
+ r"([0-9]+)?\s{0,5}"
r"(second|minute|day|week|month|quarter|year)s?$", # Matches
"next 5 days" or "last 2 weeks" # noqa: E501
Review Comment:
**Suggestion:** Logic bug: the second regex pattern that matches scopes like
"next 5 hours" omits the `hour` unit, so expressions containing "hour"/"hours"
will not match and will fall back to the default parsing path, causing
incorrect parsing or failure for time ranges expressed in hours. [logic error]
**Severity Level:** Minor ⚠️
```suggestion
r"(second|minute|hour|day|week|month|quarter|year)s?$", #
Matches "next 5 days" or "last 2 weeks" # noqa: E501
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
This is a correct and actionable bug report: the current regex omits
"hour"/"hours"
so inputs like "last 3 hours" won't match this branch and will fall back to
less
appropriate parsing. Adding "hour" to the alternation fixes the logic without
side effects.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/utils/date_parser.py
**Line:** 436:436
**Comment:**
*Logic Error: Logic bug: the second regex pattern that matches scopes
like "next 5 hours" omits the `hour` unit, so expressions containing
"hour"/"hours" will not match and will fall back to the default parsing path,
causing incorrect parsing or failure for time ranges expressed in hours.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]