codeant-ai-for-open-source[bot] commented on code in PR #40222:
URL: https://github.com/apache/superset/pull/40222#discussion_r3261183218
##########
superset/semantic_layers/mapper.py:
##########
@@ -588,6 +590,125 @@ def _convert_query_object_filter(
}
+def _coerce_filter_value(
+ value: FilterValues | frozenset[FilterValues],
+ dimension: Dimension,
+) -> FilterValues | frozenset[FilterValues]:
+ if isinstance(value, frozenset):
+ return frozenset(_coerce_scalar_filter_value(v, dimension) for v in
value)
+ return _coerce_scalar_filter_value(value, dimension)
+
+
+def _coerce_scalar_filter_value( # noqa: C901
+ value: FilterValues, dimension: Dimension
+) -> FilterValues:
+ if value is None:
+ return None
+
+ dtype = dimension.type
+
+ if pa.types.is_boolean(dtype):
+ if isinstance(value, bool):
+ return value
+ if isinstance(value, str):
+ parsed = value.strip().lower()
+ if parsed in {"true", "t", "1", "yes", "y", "on"}:
+ return True
+ if parsed in {"false", "f", "0", "no", "n", "off"}:
+ return False
+ raise ValueError(
+ f"Invalid boolean value {value!r} for filter column
{dimension.name}"
+ )
Review Comment:
**Suggestion:** Boolean coercion currently rejects numeric values like `1`
and `0`, which are valid serialized boolean inputs in Superset query payloads.
This will raise `ValueError` for otherwise valid filters instead of coercing
them. Accept numeric booleans (at least `0`/`1`) to avoid breaking existing
filter contracts. [api mismatch]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Semantic boolean filters error on numeric 0/1 inputs.
- ⚠️ Explore semantic views may reject existing boolean filters.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. A simple filter dict with a numeric value is built in the Explore/Jinja
pipeline as
`{"op": op, "col": column, "val": val}` (see
`superset/jinja_context.py:447-482` where
`val` is not coerced and can be any JSON value, including numeric 0/1).
2. When a semantic view is used, this filter is carried into a
`ValidatedQueryObject`
(subclass of `QueryObject` at `superset/semantic_layers/mapper.py:81-93`),
where
`filter_["val"]` is read in `_convert_query_object_filter()`
(`mapper.py:531-538`) and
stored in `value: FilterValues | frozenset[FilterValues]`.
3. For a boolean dimension (Arrow type `pa.bool_()`),
`_get_filters_from_query_object()`
(`mapper.py:340-388`) calls `_convert_query_object_filter()`
(`mapper.py:515-590`), which
then calls `_coerce_filter_value(value, dimension)` at `mapper.py:593-600`,
passing the
numeric `value` (e.g., `1`).
4. `_coerce_filter_value()` delegates to `_coerce_scalar_filter_value()`
(`mapper.py:602-709`); in the boolean branch at `mapper.py:610-621`, the
code accepts only
`bool` or string representations. A numeric `1`/`0` is neither `bool` nor
`str`, so it
falls through and raises `ValueError("Invalid boolean value ...")`, causing
semantic-layer
queries that send 0/1 for boolean filters to fail instead of being coerced
like the string
`"1"`/`"0"`. Current unit tests explicitly exercise this function for
booleans
(`test_coerce_boolean` and `test_coerce_boolean_invalid_raises` in
`superset/tests/unit_tests/semantic_layers/mapper_test.py:14-40`), but they
treat plain
integer `1` as invalid, so supporting numeric booleans would require
changing both
implementation and tests.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fsemantic_layers%2Fmapper.py%0A%2A%2ALine%3A%2A%2A%20610%3A621%0A%2A%2AComment%3A%2A%2A%0A%09%2AApi%20Mismatch%3A%20Boolean%20coercion%20currently%20rejects%20numeric%20values%20like%20%601%60%20and%20%600%60%2C%20which%20are%20valid%20serialized%20boolean%20inputs%20in%20Superset%20query%20payloads.%20This%20will%20raise%20%60ValueError%60%20for%20otherwise%20valid%20filters%20instead%20of%20coercing%20them.%20Accept%20numeric%20booleans%20%28at%20least%20%600%60%2F%601%60%29%20to%20avoid%20breaking%20existing%20filter%20contracts.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20
same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fsemantic_layers%2Fmapper.py%0A%2A%2ALine%3A%2A%2A%20610%3A621%0A%2A%2AComment%3A%2A%2A%0A%09%2AApi%20Mismatch%3A%20Boolean%20coercion%20currently%20rejects%20numeric%20values%20like%20%601%60%20and%20%600%60%2C%20which%20are%20valid%20serialized%20boolean%20inputs%20in%20Superset%20query%20payloads.%20This%20will%20raise%20%60ValueError%60%20for%20otherwise%20valid%20filters%20instead%20of%20coercing%20them.%20Accept%20numeric%20booleans%20%28at%20least%20%600%60%2F%601%60%29%20to%20avoid%20breaking%20existing%20filter%20contracts.%0A%0AValidate%20the%20corr
ectness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/semantic_layers/mapper.py
**Line:** 610:621
**Comment:**
*Api Mismatch: Boolean coercion currently rejects numeric values like
`1` and `0`, which are valid serialized boolean inputs in Superset query
payloads. This will raise `ValueError` for otherwise valid filters instead of
coercing them. Accept numeric booleans (at least `0`/`1`) to avoid breaking
existing filter contracts.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40222&comment_hash=8c142d0df7b309c39774b85ecf392c9ea7065e11fe60fbf11b2d53dcefa99210&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40222&comment_hash=8c142d0df7b309c39774b85ecf392c9ea7065e11fe60fbf11b2d53dcefa99210&reaction=dislike'>👎</a>
##########
superset/semantic_layers/mapper.py:
##########
@@ -588,6 +590,125 @@ def _convert_query_object_filter(
}
+def _coerce_filter_value(
+ value: FilterValues | frozenset[FilterValues],
+ dimension: Dimension,
+) -> FilterValues | frozenset[FilterValues]:
+ if isinstance(value, frozenset):
+ return frozenset(_coerce_scalar_filter_value(v, dimension) for v in
value)
+ return _coerce_scalar_filter_value(value, dimension)
+
+
+def _coerce_scalar_filter_value( # noqa: C901
+ value: FilterValues, dimension: Dimension
+) -> FilterValues:
+ if value is None:
+ return None
+
+ dtype = dimension.type
+
+ if pa.types.is_boolean(dtype):
+ if isinstance(value, bool):
+ return value
+ if isinstance(value, str):
+ parsed = value.strip().lower()
+ if parsed in {"true", "t", "1", "yes", "y", "on"}:
+ return True
+ if parsed in {"false", "f", "0", "no", "n", "off"}:
+ return False
+ raise ValueError(
+ f"Invalid boolean value {value!r} for filter column
{dimension.name}"
+ )
+
+ if pa.types.is_integer(dtype):
+ if isinstance(value, bool):
+ raise ValueError(
+ f"Invalid integer value {value!r} for filter column
{dimension.name}"
+ )
+ if isinstance(value, int):
+ return value
+ if isinstance(value, str):
+ try:
+ return int(value.strip())
+ except ValueError as ex:
+ raise ValueError(
+ f"Invalid integer value {value!r} for filter column "
+ f"{dimension.name}"
+ ) from ex
+ raise ValueError(
+ f"Invalid integer value {value!r} for filter column
{dimension.name}"
+ )
+
+ if pa.types.is_floating(dtype) or pa.types.is_decimal(dtype):
+ if isinstance(value, bool):
+ raise ValueError(
+ f"Invalid numeric value {value!r} for filter column
{dimension.name}"
+ )
+ if isinstance(value, (int, float)):
+ return float(value)
+ if isinstance(value, str):
+ try:
+ return float(value.strip())
+ except ValueError as ex:
+ raise ValueError(
+ f"Invalid numeric value {value!r} for filter column "
+ f"{dimension.name}"
+ ) from ex
+ raise ValueError(
+ f"Invalid numeric value {value!r} for filter column
{dimension.name}"
+ )
Review Comment:
**Suggestion:** Decimal filters are coerced through `float`, which
introduces precision loss for high-precision values and can cause incorrect
equality/range comparisons. Keep decimal values in a lossless representation
(for example, `Decimal` or validated string) instead of converting decimal
inputs to binary floating point. [type error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Decimal semantic filters may mis-select rows due to rounding.
- ⚠️ Numeric Explore filters on decimals risk precision loss.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Define a semantic dimension with a decimal Arrow type, such as
`Dimension(...,
type=pa.decimal128(10, 2), ...)`, matching the pattern in tests helper
`_dim()` at
`superset/tests/unit_tests/semantic_layers/mapper_test.py:1-2` and the use
of decimal
types in `get_column_type()` (`superset/semantic_layers/models.py:19-36`,
where
`pa.types.is_decimal` is treated as numeric).
2. From the front-end or a report, send an Explore query with a filter on
this decimal
dimension, where `val` is a high-precision decimal string (e.g.,
`"12345.67890123"`). This
shape matches how filters are built as dicts `{"col": column, "op": op,
"val": val}` in
`superset/jinja_context.py:447-482` and passed through into
`QueryObject.filter`.
3. In the semantic mapper, `_get_filters_from_query_object()`
(`superset/semantic_layers/mapper.py:340-388`) calls
`_convert_query_object_filter()`
(`mapper.py:515-590`), which reads `filter_["val"]` into `value` at
`mapper.py:531-538`
and then calls `_coerce_filter_value(value, dimension)`
(`mapper.py:593-600`) with the
decimal `Dimension`.
4. `_coerce_filter_value()` delegates to `_coerce_scalar_filter_value()`
(`mapper.py:602-709`); for a decimal dtype, execution enters the branch `if
pa.types.is_floating(dtype) or pa.types.is_decimal(dtype):` at
`mapper.py:642-659`. String
inputs are converted via `float(value.strip())`, so a high-precision decimal
like
`"12345.67890123"` is coerced to a binary float with possible rounding. The
resulting
`Filter.value` is a `float` (as also asserted in
`test_coerce_floating_or_decimal` in
`superset/tests/unit_tests/semantic_layers/mapper_test.py:58-66`), which is
then consumed
by semantic view implementations to build queries. For equality or tight
range predicates
on high-precision decimals, this float conversion can misrepresent the
original value and
cause rows to be incorrectly included or excluded.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fsemantic_layers%2Fmapper.py%0A%2A%2ALine%3A%2A%2A%20642%3A659%0A%2A%2AComment%3A%2A%2A%0A%09%2AType%20Error%3A%20Decimal%20filters%20are%20coerced%20through%20%60float%60%2C%20which%20introduces%20precision%20loss%20for%20high-precision%20values%20and%20can%20cause%20incorrect%20equality%2Frange%20comparisons.%20Keep%20decimal%20values%20in%20a%20lossless%20representation%20%28for%20example%2C%20%60Decimal%60%20or%20validated%20string%29%20instead%20of%20converting%20decimal%20inputs%20to%20binary%20floating%20point.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20
and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fsemantic_layers%2Fmapper.py%0A%2A%2ALine%3A%2A%2A%20642%3A659%0A%2A%2AComment%3A%2A%2A%0A%09%2AType%20Error%3A%20Decimal%20filters%20are%20coerced%20through%20%60float%60%2C%20which%20introduces%20precision%20loss%20for%20high-precision%20values%20and%20can%20cause%20incorrect%20equality%2Frange%20comparisons.%20Keep%20decimal%20values%20in%20a%20lossless%20representation%20%28for%20example%2C%20%60Decimal%60%20or%20validated%20string%29%20instead%20of%20converting%20decimal%20inputs%20to%20binary%20floating%20point.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%2
0issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/semantic_layers/mapper.py
**Line:** 642:659
**Comment:**
*Type Error: Decimal filters are coerced through `float`, which
introduces precision loss for high-precision values and can cause incorrect
equality/range comparisons. Keep decimal values in a lossless representation
(for example, `Decimal` or validated string) instead of converting decimal
inputs to binary floating point.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40222&comment_hash=8f313074d630369a66bfd0b0c9d49b8da32b2a8e483e5c27a246ff87c6dcc383&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40222&comment_hash=8f313074d630369a66bfd0b0c9d49b8da32b2a8e483e5c27a246ff87c6dcc383&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]