Re: [PR] feat(cache): warm up native filter option queries [superset]

via GitHub Mon, 29 Jun 2026 07:33:58 -0700


codeant-ai-for-open-source[bot] commented on code in PR #41531:
URL: https://github.com/apache/superset/pull/41531#discussion_r3492472440



##########
superset/tasks/native_filter_cache.py:
##########
@@ -0,0 +1,210 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+import logging
+from typing import Any
+
+from marshmallow import ValidationError
+
+from superset.charts.schemas import ChartDataQueryContextSchema
+from superset.common.chart_data import ChartDataResultFormat, 
ChartDataResultType
+from superset.common.query_context import QueryContext
+from superset.models.dashboard import Dashboard
+from superset.utils import json
+from superset.utils.core import DatasourceType
+
+logger = logging.getLogger(__name__)
+
+NATIVE_FILTER_DEFAULT_ROW_LIMIT = 1000
+NO_FILTER_TIME_RANGE = "No filter"
+
+
+def _get_filter_target(filter_config: dict[str, Any]) -> tuple[int | str, str] 
| None:
+    for target in filter_config.get("targets") or []:
+        if not isinstance(target, dict):
+            continue
+
+        dataset_id = target.get("datasetId")
+        column = target.get("column") or {}
+        column_name = column.get("name") if isinstance(column, dict) else None
+
+        if dataset_id is not None and column_name:
+            return dataset_id, column_name
+
+    return None
+
+
+def get_eligible_native_filters(dashboard: Dashboard) -> list[dict[str, Any]]:
+    """Return native filter value filters eligible for option cache warm-up."""
+    if not dashboard.json_metadata:
+        logger.warning(
+            "Dashboard %s has no json_metadata; skipping native filter 
warm-up",
+            dashboard.id,
+        )
+        return []
+
+    try:
+        metadata = json.loads(dashboard.json_metadata)
+    except (TypeError, ValueError):
+        logger.warning(
+            "Dashboard %s has malformed json_metadata; skipping native filter 
warm-up",
+            dashboard.id,
+        )
+        return []
+
+    native_filter_config = metadata.get("native_filter_configuration")
+    if not isinstance(native_filter_config, list):
+        logger.warning(
+            "Dashboard %s has no native_filter_configuration; skipping native "
+            "filter warm-up",
+            dashboard.id,
+        )
+        return []
+
+    print(
+        f"[DEBUG] Dashboard {dashboard.id} native_filter_configuration count: "
+        f"{len(native_filter_config)}",
+        flush=True,
+    )

Review Comment:
   **Suggestion:** This synchronous `print(..., flush=True)` runs for every 
dashboard warmup and forces immediate stdout flushes, which adds avoidable I/O 
overhead in a Celery task loop and can flood worker logs with internal filter 
metadata. Replace it with level-controlled structured logging (for example 
`logger.debug`) so normal runs avoid the extra blocking output. [performance]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Celery workers emit verbose debug logs every warmup run.
   - ⚠️ Warmup path pays extra synchronous stdout flushing cost.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Run the cache warmup job so `NativeFilterOptionsStrategy.get_tasks()` in
   `superset/tasks/cache.py:3-71` iterates over the configured dashboards and 
calls
   `get_eligible_native_filters(dashboard)` for each one.
   
   2. For a dashboard with valid `json_metadata` and a 
`native_filter_configuration` list,
   `get_eligible_native_filters()` in 
`superset/tasks/native_filter_cache.py:52-104` executes
   the `print(...)` at lines 79-83, emitting `[DEBUG] Dashboard <id>
   native_filter_configuration count: <n>` with `flush=True` on every 
invocation.
   
   3. After filtering to eligible native filters, the function prints another 
debug line at
   lines 98-102 (`"[DEBUG] Dashboard <id> eligible filters: [...]"`, also with 
`flush=True`),
   so each warmup run produces two flushed prints per dashboard even in normal 
operation.
   
   4. When `build_native_filter_option_query_context()` successfully builds a 
payload, it
   calls `print("[DEBUG] Built query context payload for datasource=<id> 
groupby=<column>",
   flush=True)` at lines 198-203 for every eligible filter; combined, these 
synchronous,
   unguarded `print(..., flush=True)` calls run in the Celery worker hot path 
and add extra
   stdout I/O and verbose, non-configurable logging for every dashboard/filter 
during cache
   warmup.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=9e3a7aa9071646daa5111c41a06e9670&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=9e3a7aa9071646daa5111c41a06e9670&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/tasks/native_filter_cache.py
   **Line:** 79:83
   **Comment:**
        *Performance: This synchronous `print(..., flush=True)` runs for every 
dashboard warmup and forces immediate stdout flushes, which adds avoidable I/O 
overhead in a Celery task loop and can flood worker logs with internal filter 
metadata. Replace it with level-controlled structured logging (for example 
`logger.debug`) so normal runs avoid the extra blocking output.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41531&comment_hash=d3d4e1d038fc0f78958720e017eb79578da239f607900b860125621a40823eb8&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41531&comment_hash=d3d4e1d038fc0f78958720e017eb79578da239f607900b860125621a40823eb8&reaction=dislike'>👎</a>



##########
superset/tasks/native_filter_cache.py:
##########
@@ -0,0 +1,210 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+import logging
+from typing import Any
+
+from marshmallow import ValidationError
+
+from superset.charts.schemas import ChartDataQueryContextSchema
+from superset.common.chart_data import ChartDataResultFormat, 
ChartDataResultType
+from superset.common.query_context import QueryContext
+from superset.models.dashboard import Dashboard
+from superset.utils import json
+from superset.utils.core import DatasourceType
+
+logger = logging.getLogger(__name__)
+
+NATIVE_FILTER_DEFAULT_ROW_LIMIT = 1000
+NO_FILTER_TIME_RANGE = "No filter"
+
+
+def _get_filter_target(filter_config: dict[str, Any]) -> tuple[int | str, str] 
| None:
+    for target in filter_config.get("targets") or []:
+        if not isinstance(target, dict):
+            continue
+
+        dataset_id = target.get("datasetId")
+        column = target.get("column") or {}
+        column_name = column.get("name") if isinstance(column, dict) else None
+
+        if dataset_id is not None and column_name:
+            return dataset_id, column_name
+
+    return None
+
+
+def get_eligible_native_filters(dashboard: Dashboard) -> list[dict[str, Any]]:
+    """Return native filter value filters eligible for option cache warm-up."""
+    if not dashboard.json_metadata:
+        logger.warning(
+            "Dashboard %s has no json_metadata; skipping native filter 
warm-up",
+            dashboard.id,
+        )
+        return []
+
+    try:
+        metadata = json.loads(dashboard.json_metadata)
+    except (TypeError, ValueError):
+        logger.warning(
+            "Dashboard %s has malformed json_metadata; skipping native filter 
warm-up",
+            dashboard.id,
+        )
+        return []
+
+    native_filter_config = metadata.get("native_filter_configuration")
+    if not isinstance(native_filter_config, list):
+        logger.warning(
+            "Dashboard %s has no native_filter_configuration; skipping native "
+            "filter warm-up",
+            dashboard.id,
+        )
+        return []
+
+    print(
+        f"[DEBUG] Dashboard {dashboard.id} native_filter_configuration count: "
+        f"{len(native_filter_config)}",
+        flush=True,
+    )
+
+    eligible_filters: list[dict[str, Any]] = []
+    for filter_config in native_filter_config:
+        if not isinstance(filter_config, dict):
+            continue
+        if filter_config.get("type") == "DIVIDER":
+            continue
+        if filter_config.get("filterType") != "filter_select":
+            continue
+        if _get_filter_target(filter_config) is None:
+            continue
+
+        eligible_filters.append(filter_config)
+
+    print(
+        f"[DEBUG] Dashboard {dashboard.id} eligible filters: "
+        f"{[f.get('id') for f in eligible_filters]}",
+        flush=True,
+    )
+
+    return eligible_filters
+
+
+def build_native_filter_option_form_data(
+    dashboard: Dashboard,
+    filter_config: dict[str, Any],
+) -> dict[str, Any] | None:
+    """Build form data for a native filter option query."""
+    target = _get_filter_target(filter_config)
+    if target is None:
+        logger.warning(
+            "Native filter %s on dashboard %s has no valid target; skipping",
+            filter_config.get("id"),
+            dashboard.id,
+        )
+        return None
+
+    dataset_id, column_name = target
+    control_values = filter_config.get("controlValues") or {}
+
+    return {
+        "datasource": f"{dataset_id}__table",
+        "viz_type": "filter_select",
+        "type": "NATIVE_FILTER",
+        "native_filter_id": filter_config["id"],
+        "dashboardId": dashboard.id,
+        "groupby": [column_name],
+        "adhoc_filters": filter_config.get("adhocFilters", []),
+        "extra_filters": [],
+        "extra_form_data": {},
+        "metrics": ["count"],
+        "row_limit": NATIVE_FILTER_DEFAULT_ROW_LIMIT,
+        "time_range": NO_FILTER_TIME_RANGE,
+        "granularity_sqla": None,
+        "showSearch": True,
+        "sortAscending": control_values.get("sortAscending", True),
+        "sortMetric": filter_config.get("sortMetric", None),
+    }
+
+
+def build_native_filter_option_query_context(
+    form_data: dict[str, Any],
+) -> QueryContext | None:
+    """Build a query context for a native filter option query."""
+    datasource = form_data.get("datasource")
+    groupby = form_data.get("groupby") or []
+    column_name = groupby[0] if groupby else None
+
+    if not datasource or not column_name:
+        logger.warning(
+            "Invalid native filter option form_data; datasource and groupby 
are "
+            "required"
+        )
+        return None
+
+    try:
+        datasource_id = int(str(datasource).split("__", 1)[0])
+    except (TypeError, ValueError):
+        logger.warning(
+            "Invalid native filter option datasource %r; skipping query 
context",
+            datasource,
+        )
+        return None
+
+    sort_metric = form_data.get("sortMetric")
+    sort_ascending = form_data.get("sortAscending", True)
+    metrics = [sort_metric] if sort_metric else []
+    orderby = [[sort_metric or column_name, bool(sort_ascending)]]
+
+    payload = {
+        "datasource": {
+            "id": datasource_id,
+            "type": DatasourceType.TABLE,
+        },
+        "force": False,
+        "queries": [
+            {
+                "groupby": [column_name],
+                "filters": form_data.get("adhoc_filters", []),

Review Comment:
   **Suggestion:** The query payload maps `adhoc_filters` into the `filters` 
field, but these are different schemas in chart-data requests. When native 
filter configs include adhoc clauses, schema loading can fail and this function 
returns `None`, so those filters are silently skipped and never warmed. Build 
the payload with `adhoc_filters` (or convert adhoc clauses to simple filter 
clauses before assigning to `filters`) to match the API contract used by normal 
native-filter requests. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Native filter options not cached when adhocFilters present.
   - ⚠️ Warmup logs show filters skipped due to validation errors.
   - ⚠️ Affected dashboards issue fresh DB queries for options.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Trigger cache warmup so `NativeFilterOptionsStrategy.get_tasks()` runs in
   `superset/tasks/cache.py` (lines 3-71 in the snippet), which iterates 
dashboards and calls
   `get_eligible_native_filters(dashboard)` followed by
   `build_native_filter_option_form_data(...)` and
   `build_native_filter_option_query_context(form_data)`.
   
   2. Configure a dashboard whose `json_metadata` contains a 
`native_filter_configuration`
   entry with `filterType: "filter_select"` and an `"adhocFilters"` array using 
the standard
   adhoc filter shape (objects with `expressionType`, `subject`, `operator`, 
`comparator`,
   etc.), which is how adhoc filters are processed elsewhere (see
   `superset/jinja_context.py:450-529`, where `form_data["adhoc_filters"]` 
entries use
   `expressionType`, `clause`, `subject`, `operator`, `comparator` and are 
converted into
   `{col, op, val}` filters).
   
   3. During warmup, `build_native_filter_option_form_data()` in
   `superset/tasks/native_filter_cache.py:107-141` copies those adhoc filter 
objects directly
   into `form_data["adhoc_filters"] = filter_config.get("adhocFilters", [])`, 
and
   `build_native_filter_option_query_context()` then builds the payload with 
`"filters":
   form_data.get("adhoc_filters", [])` at line 182, passing that payload to
   `ChartDataQueryContextSchema().load(payload)` (lines 173-205).
   
   4. `ChartDataQueryContextSchema` validates `queries` against 
`ChartDataQueryObjectSchema`,
   whose `filters` field is a list of `ChartDataFilterSchema` requiring `col`, 
`op`, and
   `val` (see `superset/charts/schemas.py:219-255`), so the adhoc filter 
objects lacking
   `col`/`op`/`val` cause a `ValidationError`; 
`build_native_filter_option_query_context()`
   catches this at lines 205-210, logs "Unable to build native filter option 
query context:
   <error>", returns `None`, and `NativeFilterOptionsStrategy.get_tasks()` 
treats this as a
   skipped filter (see `superset/tasks/cache.py:32-37`), leaving that native 
filter’s option
   query un-warmed in the cache.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=e69afd289eb24a46942ae52ef0240a4b&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=e69afd289eb24a46942ae52ef0240a4b&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/tasks/native_filter_cache.py
   **Line:** 182:182
   **Comment:**
        *Api Mismatch: The query payload maps `adhoc_filters` into the 
`filters` field, but these are different schemas in chart-data requests. When 
native filter configs include adhoc clauses, schema loading can fail and this 
function returns `None`, so those filters are silently skipped and never 
warmed. Build the payload with `adhoc_filters` (or convert adhoc clauses to 
simple filter clauses before assigning to `filters`) to match the API contract 
used by normal native-filter requests.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41531&comment_hash=2aada1de7c6059933f11e2828c3250cd3ab69df1e9a827cd3c5e0699e42712ab&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41531&comment_hash=2aada1de7c6059933f11e2828c3250cd3ab69df1e9a827cd3c5e0699e42712ab&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat(cache): warm up native filter option queries [superset]

Reply via email to