Re: [PR] fix(semantic layers): coerce filter types [superset]

via GitHub Mon, 18 May 2026 11:37:20 -0700


codeant-ai-for-open-source[bot] commented on code in PR #40222:
URL: https://github.com/apache/superset/pull/40222#discussion_r3261183218



##########
superset/semantic_layers/mapper.py:
##########
@@ -588,6 +590,125 @@ def _convert_query_object_filter(
     }
 
 
+def _coerce_filter_value(
+    value: FilterValues | frozenset[FilterValues],
+    dimension: Dimension,
+) -> FilterValues | frozenset[FilterValues]:
+    if isinstance(value, frozenset):
+        return frozenset(_coerce_scalar_filter_value(v, dimension) for v in 
value)
+    return _coerce_scalar_filter_value(value, dimension)
+
+
+def _coerce_scalar_filter_value(  # noqa: C901
+    value: FilterValues, dimension: Dimension
+) -> FilterValues:
+    if value is None:
+        return None
+
+    dtype = dimension.type
+
+    if pa.types.is_boolean(dtype):
+        if isinstance(value, bool):
+            return value
+        if isinstance(value, str):
+            parsed = value.strip().lower()
+            if parsed in {"true", "t", "1", "yes", "y", "on"}:
+                return True
+            if parsed in {"false", "f", "0", "no", "n", "off"}:
+                return False
+        raise ValueError(
+            f"Invalid boolean value {value!r} for filter column 
{dimension.name}"
+        )

Review Comment:
   **Suggestion:** Boolean coercion currently rejects numeric values like `1` 
and `0`, which are valid serialized boolean inputs in Superset query payloads. 
This will raise `ValueError` for otherwise valid filters instead of coercing 
them. Accept numeric booleans (at least `0`/`1`) to avoid breaking existing 
filter contracts. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Semantic boolean filters error on numeric 0/1 inputs.
   - ⚠️ Explore semantic views may reject existing boolean filters.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. A simple filter dict with a numeric value is built in the Explore/Jinja 
pipeline as
   `{"op": op, "col": column, "val": val}` (see 
`superset/jinja_context.py:447-482` where
   `val` is not coerced and can be any JSON value, including numeric 0/1).
   
   2. When a semantic view is used, this filter is carried into a 
`ValidatedQueryObject`
   (subclass of `QueryObject` at `superset/semantic_layers/mapper.py:81-93`), 
where
   `filter_["val"]` is read in `_convert_query_object_filter()` 
(`mapper.py:531-538`) and
   stored in `value: FilterValues | frozenset[FilterValues]`.
   
   3. For a boolean dimension (Arrow type `pa.bool_()`), 
`_get_filters_from_query_object()`
   (`mapper.py:340-388`) calls `_convert_query_object_filter()` 
(`mapper.py:515-590`), which
   then calls `_coerce_filter_value(value, dimension)` at `mapper.py:593-600`, 
passing the
   numeric `value` (e.g., `1`).
   
   4. `_coerce_filter_value()` delegates to `_coerce_scalar_filter_value()`
   (`mapper.py:602-709`); in the boolean branch at `mapper.py:610-621`, the 
code accepts only
   `bool` or string representations. A numeric `1`/`0` is neither `bool` nor 
`str`, so it
   falls through and raises `ValueError("Invalid boolean value ...")`, causing 
semantic-layer
   queries that send 0/1 for boolean filters to fail instead of being coerced 
like the string
   `"1"`/`"0"`. Current unit tests explicitly exercise this function for 
booleans
   (`test_coerce_boolean` and `test_coerce_boolean_invalid_raises` in
   `superset/tests/unit_tests/semantic_layers/mapper_test.py:14-40`), but they 
treat plain
   integer `1` as invalid, so supporting numeric booleans would require 
changing both
   implementation and tests.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fsemantic_layers%2Fmapper.py%0A%2A%2ALine%3A%2A%2A%20610%3A621%0A%2A%2AComment%3A%2A%2A%0A%09%2AApi%20Mismatch%3A%20Boolean%20coercion%20currently%20rejects%20numeric%20values%20like%20%601%60%20and%20%600%60%2C%20which%20are%20valid%20serialized%20boolean%20inputs%20in%20Superset%20query%20payloads.%20This%20will%20raise%20%60ValueError%60%20for%20otherwise%20valid%20filters%20instead%20of%20coercing%20them.%20Accept%20numeric%20booleans%20%28at%20least%20%600%60%2F%601%60%29%20to%20avoid%20breaking%20existing%20filter%20contracts.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20
 
same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fsemantic_layers%2Fmapper.py%0A%2A%2ALine%3A%2A%2A%20610%3A621%0A%2A%2AComment%3A%2A%2A%0A%09%2AApi%20Mismatch%3A%20Boolean%20coercion%20currently%20rejects%20numeric%20values%20like%20%601%60%20and%20%600%60%2C%20which%20are%20valid%20serialized%20boolean%20inputs%20in%20Superset%20query%20payloads.%20This%20will%20raise%20%60ValueError%60%20for%20otherwise%20valid%20filters%20instead%20of%20coercing%20them.%20Accept%20numeric%20booleans%20%28at%20least%20%600%60%2F%601%60%29%20to%20avoid%20breaking%20existing%20filter%20contracts.%0A%0AValidate%20the%20corr
 
ectness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/semantic_layers/mapper.py
   **Line:** 610:621
   **Comment:**
        *Api Mismatch: Boolean coercion currently rejects numeric values like 
`1` and `0`, which are valid serialized boolean inputs in Superset query 
payloads. This will raise `ValueError` for otherwise valid filters instead of 
coercing them. Accept numeric booleans (at least `0`/`1`) to avoid breaking 
existing filter contracts.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40222&comment_hash=8c142d0df7b309c39774b85ecf392c9ea7065e11fe60fbf11b2d53dcefa99210&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40222&comment_hash=8c142d0df7b309c39774b85ecf392c9ea7065e11fe60fbf11b2d53dcefa99210&reaction=dislike'>👎</a>



##########
superset/semantic_layers/mapper.py:
##########
@@ -588,6 +590,125 @@ def _convert_query_object_filter(
     }
 
 
+def _coerce_filter_value(
+    value: FilterValues | frozenset[FilterValues],
+    dimension: Dimension,
+) -> FilterValues | frozenset[FilterValues]:
+    if isinstance(value, frozenset):
+        return frozenset(_coerce_scalar_filter_value(v, dimension) for v in 
value)
+    return _coerce_scalar_filter_value(value, dimension)
+
+
+def _coerce_scalar_filter_value(  # noqa: C901
+    value: FilterValues, dimension: Dimension
+) -> FilterValues:
+    if value is None:
+        return None
+
+    dtype = dimension.type
+
+    if pa.types.is_boolean(dtype):
+        if isinstance(value, bool):
+            return value
+        if isinstance(value, str):
+            parsed = value.strip().lower()
+            if parsed in {"true", "t", "1", "yes", "y", "on"}:
+                return True
+            if parsed in {"false", "f", "0", "no", "n", "off"}:
+                return False
+        raise ValueError(
+            f"Invalid boolean value {value!r} for filter column 
{dimension.name}"
+        )
+
+    if pa.types.is_integer(dtype):
+        if isinstance(value, bool):
+            raise ValueError(
+                f"Invalid integer value {value!r} for filter column 
{dimension.name}"
+            )
+        if isinstance(value, int):
+            return value
+        if isinstance(value, str):
+            try:
+                return int(value.strip())
+            except ValueError as ex:
+                raise ValueError(
+                    f"Invalid integer value {value!r} for filter column "
+                    f"{dimension.name}"
+                ) from ex
+        raise ValueError(
+            f"Invalid integer value {value!r} for filter column 
{dimension.name}"
+        )
+
+    if pa.types.is_floating(dtype) or pa.types.is_decimal(dtype):
+        if isinstance(value, bool):
+            raise ValueError(
+                f"Invalid numeric value {value!r} for filter column 
{dimension.name}"
+            )
+        if isinstance(value, (int, float)):
+            return float(value)
+        if isinstance(value, str):
+            try:
+                return float(value.strip())
+            except ValueError as ex:
+                raise ValueError(
+                    f"Invalid numeric value {value!r} for filter column "
+                    f"{dimension.name}"
+                ) from ex
+        raise ValueError(
+            f"Invalid numeric value {value!r} for filter column 
{dimension.name}"
+        )

Review Comment:
   **Suggestion:** Decimal filters are coerced through `float`, which 
introduces precision loss for high-precision values and can cause incorrect 
equality/range comparisons. Keep decimal values in a lossless representation 
(for example, `Decimal` or validated string) instead of converting decimal 
inputs to binary floating point. [type error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Decimal semantic filters may mis-select rows due to rounding.
   - ⚠️ Numeric Explore filters on decimals risk precision loss.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Define a semantic dimension with a decimal Arrow type, such as 
`Dimension(...,
   type=pa.decimal128(10, 2), ...)`, matching the pattern in tests helper 
`_dim()` at
   `superset/tests/unit_tests/semantic_layers/mapper_test.py:1-2` and the use 
of decimal
   types in `get_column_type()` (`superset/semantic_layers/models.py:19-36`, 
where
   `pa.types.is_decimal` is treated as numeric).
   
   2. From the front-end or a report, send an Explore query with a filter on 
this decimal
   dimension, where `val` is a high-precision decimal string (e.g., 
`"12345.67890123"`). This
   shape matches how filters are built as dicts `{"col": column, "op": op, 
"val": val}` in
   `superset/jinja_context.py:447-482` and passed through into 
`QueryObject.filter`.
   
   3. In the semantic mapper, `_get_filters_from_query_object()`
   (`superset/semantic_layers/mapper.py:340-388`) calls 
`_convert_query_object_filter()`
   (`mapper.py:515-590`), which reads `filter_["val"]` into `value` at 
`mapper.py:531-538`
   and then calls `_coerce_filter_value(value, dimension)` 
(`mapper.py:593-600`) with the
   decimal `Dimension`.
   
   4. `_coerce_filter_value()` delegates to `_coerce_scalar_filter_value()`
   (`mapper.py:602-709`); for a decimal dtype, execution enters the branch `if
   pa.types.is_floating(dtype) or pa.types.is_decimal(dtype):` at 
`mapper.py:642-659`. String
   inputs are converted via `float(value.strip())`, so a high-precision decimal 
like
   `"12345.67890123"` is coerced to a binary float with possible rounding. The 
resulting
   `Filter.value` is a `float` (as also asserted in 
`test_coerce_floating_or_decimal` in
   `superset/tests/unit_tests/semantic_layers/mapper_test.py:58-66`), which is 
then consumed
   by semantic view implementations to build queries. For equality or tight 
range predicates
   on high-precision decimals, this float conversion can misrepresent the 
original value and
   cause rows to be incorrectly included or excluded.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fsemantic_layers%2Fmapper.py%0A%2A%2ALine%3A%2A%2A%20642%3A659%0A%2A%2AComment%3A%2A%2A%0A%09%2AType%20Error%3A%20Decimal%20filters%20are%20coerced%20through%20%60float%60%2C%20which%20introduces%20precision%20loss%20for%20high-precision%20values%20and%20can%20cause%20incorrect%20equality%2Frange%20comparisons.%20Keep%20decimal%20values%20in%20a%20lossless%20representation%20%28for%20example%2C%20%60Decimal%60%20or%20validated%20string%29%20instead%20of%20converting%20decimal%20inputs%20to%20binary%20floating%20point.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20
 
and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20superset%2Fsemantic_layers%2Fmapper.py%0A%2A%2ALine%3A%2A%2A%20642%3A659%0A%2A%2AComment%3A%2A%2A%0A%09%2AType%20Error%3A%20Decimal%20filters%20are%20coerced%20through%20%60float%60%2C%20which%20introduces%20precision%20loss%20for%20high-precision%20values%20and%20can%20cause%20incorrect%20equality%2Frange%20comparisons.%20Keep%20decimal%20values%20in%20a%20lossless%20representation%20%28for%20example%2C%20%60Decimal%60%20or%20validated%20string%29%20instead%20of%20converting%20decimal%20inputs%20to%20binary%20floating%20point.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%2
 
0issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/semantic_layers/mapper.py
   **Line:** 642:659
   **Comment:**
        *Type Error: Decimal filters are coerced through `float`, which 
introduces precision loss for high-precision values and can cause incorrect 
equality/range comparisons. Keep decimal values in a lossless representation 
(for example, `Decimal` or validated string) instead of converting decimal 
inputs to binary floating point.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40222&comment_hash=8f313074d630369a66bfd0b0c9d49b8da32b2a8e483e5c27a246ff87c6dcc383&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40222&comment_hash=8f313074d630369a66bfd0b0c9d49b8da32b2a8e483e5c27a246ff87c6dcc383&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(semantic layers): coerce filter types [superset]

Reply via email to