Re: [PR] feat(lineage): add lineage visualization across datasets, charts and dashboards [superset]

via GitHub Tue, 09 Jun 2026 10:35:11 -0700


codeant-ai-for-open-source[bot] commented on code in PR #40912:
URL: https://github.com/apache/superset/pull/40912#discussion_r3382658487



##########
superset/dashboards/api.py:
##########
@@ -524,6 +525,114 @@ def get(
         )
         return self.response(200, result=result)
 
+    @expose("/<id_or_slug>/lineage", methods=("GET",))
+    @protect()
+    @safe
+    @statsd_metrics
+    @with_dashboard
+    @event_logger.log_this_with_context(
+        action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.lineage",
+        log_to_statsd=False,
+    )
+    # pylint: disable=arguments-differ,arguments-renamed
+    def lineage(self, dash: Dashboard) -> Response:
+        """Get lineage information for a dashboard.
+        ---
+        get:
+          summary: Get lineage information for a dashboard
+          description: >-
+            Returns upstream (charts, datasets, databases) lineage information
+            for a dashboard
+          parameters:
+          - in: path
+            name: id_or_slug
+            schema:
+              type: string
+            description: Either the id of the dashboard, or its slug
+          responses:
+            200:
+              description: Lineage information
+              content:
+                application/json:
+                  schema:
+                    $ref: "#/components/schemas/DashboardLineageResponseSchema"

Review Comment:
   **Suggestion:** The new endpoint references `DashboardLineageResponseSchema` in
   its OpenAPI docstring, but this schema is not registered in
   `openapi_spec_component_schemas` for `DashboardRestApi`, so the generated API spec
   contains an unresolved `$ref`. Import the schema and add it to the component
   schemas to keep the API contract valid. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ OpenAPI spec has unresolved DashboardLineage schema reference.
   - ⚠️ Client codegen may fail for dashboard lineage endpoint.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. In `superset/dashboards/api.py:60-80`, inspect the docstring for
   `DashboardRestApi.lineage`. The 200 response schema uses an OpenAPI `$ref` to
   `#/components/schemas/DashboardLineageResponseSchema` (line 558 in the PR hunk).
   
   2. Locate the definition of `DashboardLineageResponseSchema` in
   `superset/dashboards/schemas.py:614-617`, where it is declared as a Marshmallow
   schema composing `DashboardLineageDashboardSchema`,
   `DashboardLineageUpstreamSchema`, and a `downstream` field.
   
   3. Examine the OpenAPI configuration of `DashboardRestApi` in
   `superset/dashboards/api.py:426-438`. The `openapi_spec_component_schemas` tuple
   includes `ChartEntityResponseSchema`, `DashboardCacheScreenshotResponseSchema`,
   `DashboardCopySchema`, `DashboardGetResponseSchema`, `DashboardDatasetSchema`,
   `TabsPayloadSchema`, `GetFavStarIdsSchema`, `EmbeddedDashboardResponseSchema`, and
   `DashboardScreenshotPostSchema`, but not `DashboardLineageResponseSchema`.
   
   4. Compare this with `DatasetRestApi` in `superset/datasets/api.py:295-305`, whose
   `openapi_spec_component_schemas` explicitly includes `DatasetLineageResponseSchema`
   to back the `$ref` used in its lineage docstring (lines 76–81 of its docstring).
   When Superset's OpenAPI generator assembles the spec from
   `openapi_spec_component_schemas`, the `$ref` to `DashboardLineageResponseSchema` in
   the dashboard lineage endpoint matches no registered schema component, producing
   an invalid or partially resolved OpenAPI document for
   `/api/v1/dashboard/<id_or_slug>/lineage`.
   ```
   </details>
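One way to catch this kind of mismatch before it reaches the generated spec is a small check that compares every `#/components/schemas/...` reference in an endpoint docstring with the schema classes registered for the API. The sketch below is hypothetical: `unresolved_schema_refs` is not a Superset or Flask-AppBuilder function, the schema classes are empty stand-ins, and it assumes component names equal class names, as the existing `$ref` strings suggest.

```python
import re
from collections.abc import Iterable

# Matches component-schema references such as
# "#/components/schemas/DashboardLineageResponseSchema".
SCHEMA_REF = re.compile(r"#/components/schemas/(\w+)")


def unresolved_schema_refs(docstring: str, registered: Iterable[type]) -> set[str]:
    """Return schema names referenced via $ref but absent from `registered`.

    Assumption: a component is named after its schema class.
    """
    referenced = set(SCHEMA_REF.findall(docstring))
    registered_names = {schema.__name__ for schema in registered}
    return referenced - registered_names


# Empty stand-ins for the real Marshmallow schema classes.
class DashboardGetResponseSchema: ...


class DashboardLineageResponseSchema: ...


DOC = """
responses:
  200:
    content:
      application/json:
        schema:
          $ref: "#/components/schemas/DashboardLineageResponseSchema"
  401:
    $ref: '#/components/responses/401'
"""

# Before the fix: the lineage schema is missing from the registered tuple.
before = unresolved_schema_refs(DOC, (DashboardGetResponseSchema,))
# After the fix: registering the schema resolves the reference.
after = unresolved_schema_refs(
    DOC, (DashboardGetResponseSchema, DashboardLineageResponseSchema)
)
```

In `DashboardRestApi` itself, the fix the suggestion describes is to import `DashboardLineageResponseSchema` and append it to the `openapi_spec_component_schemas` tuple, mirroring what `DatasetRestApi` already does for `DatasetLineageResponseSchema`.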
   



##########
superset/dashboards/api.py:
##########
@@ -524,6 +525,114 @@ def get(
         )
         return self.response(200, result=result)
 
+    @expose("/<id_or_slug>/lineage", methods=("GET",))
+    @protect()
+    @safe
+    @statsd_metrics
+    @with_dashboard
+    @event_logger.log_this_with_context(
+        action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.lineage",
+        log_to_statsd=False,
+    )
+    # pylint: disable=arguments-differ,arguments-renamed
+    def lineage(self, dash: Dashboard) -> Response:
+        """Get lineage information for a dashboard.
+        ---
+        get:
+          summary: Get lineage information for a dashboard
+          description: >-
+            Returns upstream (charts, datasets, databases) lineage information
+            for a dashboard
+          parameters:
+          - in: path
+            name: id_or_slug
+            schema:
+              type: string
+            description: Either the id of the dashboard, or its slug
+          responses:
+            200:
+              description: Lineage information
+              content:
+                application/json:
+                  schema:
+                    $ref: "#/components/schemas/DashboardLineageResponseSchema"
+            401:
+              $ref: '#/components/responses/401'
+            404:
+              $ref: '#/components/responses/404'
+            500:
+              $ref: '#/components/responses/500'
+        """
+        dashboard_info = {
+            "id": dash.id,
+            "title": dash.dashboard_title,
+            "slug": dash.slug,
+            "published": dash.published,
+        }
+
+        # Get upstream (charts, datasets, databases) information
+        charts = []
+        dataset_map = {}
+        database_map = {}
+
+        for chart in dash.slices:
+            charts.append(
+                {
+                    "id": chart.id,
+                    "slice_name": chart.slice_name,
+                    "viz_type": chart.viz_type,
+                    "dataset_id": chart.datasource_id,
+                }
+            )
+
+            # Collect dataset information
+            dataset = chart.datasource
+            if dataset and dataset.id not in dataset_map:
+                dataset_map[dataset.id] = {
+                    "id": dataset.id,
+                    "name": dataset.name,
+                    "database_id": dataset.database_id,
+                    "database_name": dataset.database.database_name
+                    if dataset.database
+                    else None,
+                    "schema": dataset.schema,
+                    "table_name": dataset.table_name,
+                    "chart_ids": [],

Review Comment:
   **Suggestion:** Dataset and database metadata are added to the lineage for every
   chart in the dashboard without checking datasource access, whereas other
   dashboard endpoints explicitly redact datasource details when access is missing.
   This can leak schema, table, and database metadata to users who can open the
   dashboard but are not allowed to inspect datasource internals. Gate these fields
   behind `can_access_datasource` (or redact the sensitive fields). [security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Dashboard lineage API leaks restricted datasource metadata.
   - ⚠️ Lineage UI reveals schema and table names.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Observe the dashboard lineage endpoint definition at
   `superset/dashboards/api.py:49-59`, where `DashboardRestApi.lineage` is exposed as
   `GET /api/v1/dashboard/<id_or_slug>/lineage` with `@protect()` and
   `@with_dashboard` but no datasource-level access checks inside the method body.
   
   2. Inspect the implementation of `lineage` in `superset/dashboards/api.py:87-151`.
   For each chart in `dash.slices` (loop starting at line 99), the code assigns
   `dataset = chart.datasource` and, when `dataset` is truthy and not yet in
   `dataset_map`, stores metadata including `database_id`, `database_name`, `schema`,
   and `table_name` (lines 590–600 in the PR hunk).
   
   3. Compare this with `_serialize_dashboard_dataset` at
   `superset/dashboards/api.py:210-220`, which explicitly calls
   `security_manager.can_access_datasource(datasource)` and redacts several fields
   when the current user lacks datasource access. The lineage method never calls
   `security_manager.can_access_datasource`, so it never redacts dataset or database
   metadata.
   
   4. In a deployment where a user has access to a dashboard but only partial access
   to its datasources (per-object checks implemented in
   `security_manager.can_access_datasource` at `superset/security/manager.py:833-838`
   and `can_access_dashboard` around `superset/security/manager.py:3272`), calling
   `GET /api/v1/dashboard/<id_or_slug>/lineage` returns `upstream.datasets.result[*]`
   entries containing `database_name`, `schema`, and `table_name` for every
   `chart.datasource`, including datasources for which
   `security_manager.can_access_datasource` would return `False`. This leaks the
   underlying schema, table, and database metadata to unauthorized viewers.
   ```
   </details>
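A minimal sketch of the gating the suggestion describes, assuming inaccessible datasets are redacted rather than omitted. The dataclasses are stand-ins for Superset's models, `serialize_lineage_dataset` and `SENSITIVE_FIELDS` are hypothetical names, and the `can_access_datasource` parameter stands in for `security_manager.can_access_datasource`. Which fields to blank out is an assumption and may differ from what `_serialize_dashboard_dataset` actually redacts.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Database:
    id: int
    database_name: str


@dataclass
class Dataset:
    id: int
    name: str
    schema: Optional[str]
    table_name: str
    database: Optional[Database]


# Hypothetical redaction set for this sketch.
SENSITIVE_FIELDS = ("name", "database_id", "database_name", "schema", "table_name")


def serialize_lineage_dataset(
    dataset: Dataset, can_access_datasource: Callable[[Dataset], bool]
) -> dict[str, Any]:
    """Build a lineage dataset entry, redacting metadata the user may not see."""
    info: dict[str, Any] = {
        "id": dataset.id,
        "name": dataset.name,
        "database_id": dataset.database.id if dataset.database else None,
        "database_name": dataset.database.database_name if dataset.database else None,
        "schema": dataset.schema,
        "table_name": dataset.table_name,
        "chart_ids": [],
    }
    if not can_access_datasource(dataset):
        # Keep the node so chart -> dataset edges still render, but blank out
        # everything that reveals the underlying table or database.
        info.update({name: None for name in SENSITIVE_FIELDS})
    return info


# Usage: the user may access dataset 10 but not dataset 11.
db = Database(id=1, database_name="warehouse")
public = Dataset(id=10, name="sales", schema="public", table_name="sales", database=db)
secret = Dataset(id=11, name="payroll", schema="hr", table_name="payroll", database=db)
allowed = {10}
entries = [
    serialize_lineage_dataset(ds, lambda d: d.id in allowed) for ds in (public, secret)
]
```

Redacting keeps the chart-to-dataset edges intact in the graph; dropping inaccessible datasets entirely is the stricter alternative, at the cost of charts pointing at a `dataset_id` with no matching node.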
   



##########
superset/datasets/api.py:
##########
@@ -846,6 +849,119 @@ def related_objects(self, id_or_uuid: str) -> Response:
             dashboards={"count": len(dashboards), "result": dashboards},
         )
 
+    @expose("/<id_or_uuid>/lineage", methods=("GET",))
+    @protect()
+    @safe
+    @statsd_metrics
+    @event_logger.log_this_with_context(
+        action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.lineage",
+        log_to_statsd=False,
+    )
+    def lineage(self, id_or_uuid: str) -> Response:
+        """Get lineage information for a dataset.
+        ---
+        get:
+          summary: Get lineage information for a dataset
+          description: >-
+            Returns upstream (database) and downstream (charts, dashboards) lineage
+            information for a dataset
+          parameters:
+          - in: path
+            name: id_or_uuid
+            schema:
+              type: string
+            description: Either the id of the dataset, or its uuid
+          responses:
+            200:
+              description: Lineage information
+              content:
+                application/json:
+                  schema:
+                    $ref: "#/components/schemas/DatasetLineageResponseSchema"
+            401:
+              $ref: '#/components/responses/401'
+            404:
+              $ref: '#/components/responses/404'
+            500:
+              $ref: '#/components/responses/500'
+        """
+        dataset = DatasetDAO.find_by_id_or_uuid(id_or_uuid)
+        if not dataset:
+            return self.response_404()
+
+        dataset_info = {
+            "id": dataset.id,
+            "name": dataset.name,
+            "database_id": dataset.database_id,
+            "database_name": (
+                dataset.database.database_name if dataset.database else None
+            ),
+            "schema": dataset.schema,
+            "table_name": dataset.table_name,
+        }
+
+        # Get upstream (database) information
+        upstream: dict[str, Any] = {}
+        if dataset.database:
+            upstream["database"] = {
+                "id": dataset.database.id,
+                "database_name": dataset.database.database_name,
+                "backend": dataset.database.backend,
+            }
+        else:
+            upstream["database"] = None
+
+        # Get downstream (charts and dashboards) information
+        related_data = DatasetDAO.get_related_objects(dataset.id)
+
+        # Build chart information with dashboard IDs
+        charts = []
+        for chart in related_data["charts"]:
+            dashboard_ids = [d.id for d in chart.dashboards]
+            charts.append(
+                {
+                    "id": chart.id,
+                    "slice_name": chart.slice_name,
+                    "viz_type": chart.viz_type,
+                    "dashboard_ids": dashboard_ids,
+                }
+            )
+
+        # Build dashboard information with chart IDs
+        dashboards = []
+        for dashboard in related_data["dashboards"]:
+            chart_ids = [
+                chart.id
+                for chart in dashboard.slices
+                if chart.datasource_id == dataset.id
+            ]
+            dashboards.append(
+                {
+                    "id": dashboard.id,
+                    "title": dashboard.dashboard_title,
+                    "slug": dashboard.slug,
+                    "chart_ids": chart_ids,

Review Comment:
   **Suggestion:** The dashboard list is built from all related dashboards without a
   `can_access_dashboard` check, which leaks dashboard titles and slugs to users who
   lack dashboard access. Apply the same dashboard permission filtering used in
   `related_objects` before serializing downstream dashboards. [security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Dataset lineage exposes unauthorized dashboard titles and slugs.
   - ⚠️ Users can enumerate dashboards for restricted datasets.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. In `superset/datasets/api.py:37-46`, observe that `related_objects` builds its
   `dashboards` list with a comprehension that includes
   `if security_manager.can_access_dashboard(dashboard)`, ensuring that only
   dashboards visible to the current user are returned in the
   `/api/v1/dataset/<id_or_uuid>/related_objects` response.
   
   2. In the `lineage` endpoint at `superset/datasets/api.py:61-164`, after retrieving
   `related_data = DatasetDAO.get_related_objects(dataset.id)` at line 116, the code
   constructs `dashboards` in an imperative loop starting at line 931: it sets
   `dashboards = []`, iterates `for dashboard in related_data["dashboards"]`
   (line 932), computes `chart_ids` by iterating `dashboard.slices` (lines 133–137 in
   the snippet), and appends a dict with `id`, `title`, `slug`, and `chart_ids`
   (lines 139–143).
   
   3. This downstream dashboard construction does not call
   `security_manager.can_access_dashboard(dashboard)`, unlike `related_objects` and
   similar code in `superset/databases/api.py:1342-1350`, which explicitly filter
   dashboards via `security_manager.can_access_dashboard(dashboard)` for
   database-related metadata APIs.
   
   4. For a user with dataset-level access but limited dashboard permissions (i.e.,
   `security_manager.can_access_dashboard` returns `False` for some dashboards linked
   to this dataset), calling `GET /api/v1/dataset/<id_or_uuid>/lineage` still returns
   `downstream.dashboards.result[*]` entries for those dashboards, exposing their IDs,
   titles, and slugs even though `/api/v1/dataset/<id_or_uuid>/related_objects`
   correctly hides them.
   ```
   </details>
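The filtering can mirror the comprehension the review attributes to `related_objects`. A minimal sketch, with stand-in dataclasses for the models and a `can_access_dashboard` parameter standing in for `security_manager.can_access_dashboard`; `build_downstream_dashboards` is a hypothetical helper, not existing Superset code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Chart:
    id: int
    datasource_id: int


@dataclass
class Dashboard:
    id: int
    dashboard_title: str
    slug: str
    slices: list[Chart] = field(default_factory=list)


def build_downstream_dashboards(
    dashboards: list[Dashboard],
    dataset_id: int,
    can_access_dashboard: Callable[[Dashboard], bool],
) -> list[dict[str, Any]]:
    """Serialize only the dashboards the current user is allowed to see."""
    return [
        {
            "id": dashboard.id,
            "title": dashboard.dashboard_title,
            "slug": dashboard.slug,
            # Only the charts on this dashboard that are built on the dataset.
            "chart_ids": [c.id for c in dashboard.slices if c.datasource_id == dataset_id],
        }
        for dashboard in dashboards
        if can_access_dashboard(dashboard)
    ]


# Usage: dashboard 2 is hidden from the current user.
chart = Chart(id=5, datasource_id=10)
other = Chart(id=6, datasource_id=99)
visible = Dashboard(1, "Sales", "sales", [chart, other])
hidden = Dashboard(2, "Exec only", "exec", [chart])
result = build_downstream_dashboards([visible, hidden], 10, lambda d: d.id == 1)
```

Filtering before serialization, rather than post-processing the response, means the titles and slugs of hidden dashboards are never read into the payload at all.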
   



##########
tests/integration_tests/charts/api_tests.py:
##########
@@ -2444,3 +2449,30 @@ def test_related_owners_allowed_for_write_user(self):
         self.login(ADMIN_USERNAME)
         rv = self.client.get("api/v1/chart/related/owners")
         assert rv.status_code == 200
+
+    @pytest.mark.usefixtures("inject_expected_chart_lineage")
+    def test_get_chart_lineage(self):
+        """
+        Chart API: Test get chart lineage
+        """
+        self.login(ADMIN_USERNAME)
+        chart_id = self.chart_lineage["chart_id"]
+        expected = self.chart_lineage["expected"]
+
+        uri = f"api/v1/chart/{chart_id}/lineage"
+        rv = self.get_assert_metric(uri, "lineage")
+        assert rv.status_code == 200
+
+        data = json.loads(rv.data.decode("utf-8"))
+
+        # Assert the entire response matches expected structure
+        assert data == expected

Review Comment:
   **Suggestion:** This assertion validates the wrong response shape: the chart
   lineage API returns its payload wrapped under `result`, so comparing the whole
   decoded JSON directly to `expected` will fail even when the endpoint is correct.
   Compare `data["result"]` to `expected` (or build `expected` with the `result`
   wrapper) to match the API contract. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Chart lineage integration test fails despite correct endpoint shape.
   - ⚠️ CI pipeline for lineage feature can be blocked.
   - ⚠️ Misaligned test obscures true API contract verification.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Locate the chart lineage test in
   `tests/integration_tests/charts/api_tests.py:54-70` (relative segment), where
   `test_get_chart_lineage` logs in as admin, reads `chart_id` and `expected` from
   `self.chart_lineage`, then calls `uri = f"api/v1/chart/{chart_id}/lineage"` and
   `rv = self.get_assert_metric(uri, "lineage")`.
   
   2. Inspect the lineage fixture in
   `tests/integration_tests/fixtures/lineage.py:153-199`; the
   `inject_expected_chart_lineage` fixture sets `self.chart_lineage["expected"]` to a
   dict containing `"chart"`, `"upstream"`, and `"downstream"` keys, with no top-level
   `"result"` wrapper.
   
   3. Inspect the chart lineage endpoint implementation in
   `superset/charts/api.py:8-24` and `superset/charts/api.py:44-24` (method
   `ChartRestApi.lineage`): it builds a `result` dict and returns
   `self.response(200, result=result)`. As other tests such as `test_get_chart` in
   `tests/integration_tests/charts/api_tests.py:36-55` demonstrate, this produces a
   JSON payload in which the lineage data is nested under a top-level `"result"` key
   (those tests access `data["result"]`).
   
   4. Run the chart API integration tests (e.g.
   `pytest tests/integration_tests/charts/api_tests.py -k test_get_chart_lineage`).
   `test_get_chart_lineage` decodes the response at
   `tests/integration_tests/charts/api_tests.py:2466-2469` and asserts
   `data == expected` instead of `data["result"] == expected`. The assertion fails
   because `data` is `{"result": {...}}` while `expected` is only the inner lineage
   structure.
   ```
   </details>
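A minimal sketch of the corrected assertion. The byte string stands in for `rv.data` from the Flask test client, and its lineage contents are illustrative placeholders, not the fixture's actual values.

```python
import json

# Stand-in for `rv.data`: per the review, `self.response(200, result=result)`
# serializes its keyword arguments, so the lineage dict sits under "result".
raw_body = json.dumps(
    {"result": {"chart": {"id": 1}, "upstream": {}, "downstream": {}}}
).encode("utf-8")

# Shape of the fixture's expected value: no top-level "result" wrapper.
expected = {"chart": {"id": 1}, "upstream": {}, "downstream": {}}

data = json.loads(raw_body.decode("utf-8"))

# The original `data == expected` comparison trips over the wrapper...
wrapped_matches = data == expected
# ...while unwrapping "result" first compares like with like.
assert data["result"] == expected
```

In `test_get_chart_lineage`, this amounts to replacing `assert data == expected` with `assert data["result"] == expected`.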
   



##########
superset/datasets/api.py:
##########
@@ -846,6 +849,119 @@ def related_objects(self, id_or_uuid: str) -> Response:
             dashboards={"count": len(dashboards), "result": dashboards},
         )
 
+    @expose("/<id_or_uuid>/lineage", methods=("GET",))
+    @protect()
+    @safe
+    @statsd_metrics
+    @event_logger.log_this_with_context(
+        action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.lineage",
+        log_to_statsd=False,
+    )
+    def lineage(self, id_or_uuid: str) -> Response:
+        """Get lineage information for a dataset.
+        ---
+        get:
+          summary: Get lineage information for a dataset
+          description: >-
+            Returns upstream (database) and downstream (charts, dashboards) lineage
+            information for a dataset
+          parameters:
+          - in: path
+            name: id_or_uuid
+            schema:
+              type: string
+            description: Either the id of the dataset, or its uuid
+          responses:
+            200:
+              description: Lineage information
+              content:
+                application/json:
+                  schema:
+                    $ref: "#/components/schemas/DatasetLineageResponseSchema"
+            401:
+              $ref: '#/components/responses/401'
+            404:
+              $ref: '#/components/responses/404'
+            500:
+              $ref: '#/components/responses/500'
+        """
+        dataset = DatasetDAO.find_by_id_or_uuid(id_or_uuid)
+        if not dataset:
+            return self.response_404()
+
+        dataset_info = {
+            "id": dataset.id,
+            "name": dataset.name,
+            "database_id": dataset.database_id,
+            "database_name": (
+                dataset.database.database_name if dataset.database else None
+            ),
+            "schema": dataset.schema,
+            "table_name": dataset.table_name,
+        }
+
+        # Get upstream (database) information
+        upstream: dict[str, Any] = {}
+        if dataset.database:
+            upstream["database"] = {
+                "id": dataset.database.id,
+                "database_name": dataset.database.database_name,
+                "backend": dataset.database.backend,
+            }
+        else:
+            upstream["database"] = None
+
+        # Get downstream (charts and dashboards) information
+        related_data = DatasetDAO.get_related_objects(dataset.id)
+
+        # Build chart information with dashboard IDs
+        charts = []
+        for chart in related_data["charts"]:
+            dashboard_ids = [d.id for d in chart.dashboards]

Review Comment:
   **Suggestion:** Within each chart entry, `dashboard_ids` includes all linked
   dashboards without filtering by dashboard permissions, so the IDs of dashboards
   the user cannot access are leaked. Restrict this list to dashboards the current
   user can access. [security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Chart entries leak IDs of inaccessible dashboards.
   - ⚠️ Attackers can map hidden dashboards via dataset lineage.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. In `superset/datasets/api.py:118-129` (within the `lineage` method), examine the
   loop over `related_data["charts"]`: for each `chart`, the code builds a
   `dashboard_ids` list via the comprehension
   `dashboard_ids = [d.id for d in chart.dashboards]` at line 920, then stores
   `dashboard_ids` in the chart payload returned under `downstream.charts.result`.
   
   2. `chart.dashboards` is not filtered by `security_manager.can_access_dashboard(d)`
   here, even though other APIs, such as `related_objects` in
   `superset/datasets/api.py:37-46` and the database-related metadata endpoints in
   `superset/databases/api.py:1342-1350`, explicitly gate dashboard exposure behind
   `security_manager.can_access_dashboard(dashboard)`.
   
   3. Consider a chart linked to multiple dashboards, some of which the current user
   cannot access (`security_manager.can_access_dashboard` returns `False`). For a user
   with dataset access but limited dashboard permissions, calling
   `GET /api/v1/dataset/<id_or_uuid>/lineage` yields chart entries whose
   `dashboard_ids` include the IDs of all dashboards linked via `chart.dashboards`,
   including dashboards that are not returned in `downstream.dashboards.result` once
   suggestion 4 is implemented.
   
   4. Because of this mismatch, even if downstream dashboards are filtered elsewhere,
   the `dashboard_ids` array still leaks the IDs of unauthorized dashboards through
   the chart entries, letting users infer the existence and identifiers of dashboards
   they should not see.
   ```
   </details>
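The same permission predicate can be applied to each chart's `dashboard_ids`. In this hypothetical sketch, `build_downstream_charts` memoizes the check per dashboard ID, so a dashboard shared by several charts is evaluated only once; the dataclasses are stand-ins for the models, and `can_access_dashboard` stands in for `security_manager.can_access_dashboard`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Dashboard:
    id: int


@dataclass
class Chart:
    id: int
    slice_name: str
    viz_type: str
    dashboards: list[Dashboard] = field(default_factory=list)


def build_downstream_charts(
    charts: list[Chart],
    can_access_dashboard: Callable[[Dashboard], bool],
) -> list[dict[str, Any]]:
    """Serialize charts, listing only dashboard IDs the user may access."""
    # Cache each dashboard's permission check: a dashboard shared by many
    # charts is checked once, and every chart entry gets the same answer.
    allowed: dict[int, bool] = {}

    def visible(dashboard: Dashboard) -> bool:
        if dashboard.id not in allowed:
            allowed[dashboard.id] = can_access_dashboard(dashboard)
        return allowed[dashboard.id]

    return [
        {
            "id": chart.id,
            "slice_name": chart.slice_name,
            "viz_type": chart.viz_type,
            "dashboard_ids": [d.id for d in chart.dashboards if visible(d)],
        }
        for chart in charts
    ]


# Usage: dashboard 2 is hidden; record which dashboards get checked.
calls: list[int] = []


def can_access(dashboard: Dashboard) -> bool:
    calls.append(dashboard.id)
    return dashboard.id != 2


shared = Dashboard(1)
hidden = Dashboard(2)
charts = [
    Chart(5, "Revenue", "line", [shared, hidden]),
    Chart(6, "Orders", "table", [shared]),
]
result = build_downstream_charts(charts, can_access)
```

Using the same predicate (or the same cache) when building `downstream.dashboards` would help keep the two lists consistent, which addresses the mismatch described in step 4.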
   



##########
superset-frontend/src/hooks/apiResources/lineage.ts:
##########
@@ -0,0 +1,134 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import { useApiV1Resource } from './apiResources';
+
+// Database entity type
+export type DatabaseEntity = {
+  id: number;
+  database_name: string;
+  backend: string;
+};
+
+// Dataset entity type
+export type DatasetEntity = {
+  id: number;
+  name: string;
+  schema: string | null;
+  table_name: string;
+  database_id: number;
+  database_name: string;
+  chart_ids?: number[];
+};
+
+// Chart entity type
+export type ChartEntity = {
+  id: number;
+  slice_name: string;
+  viz_type: string;
+  dashboard_ids?: number[];
+  dataset_id?: number;
+};
+
+// Dashboard entity type
+export type DashboardEntity = {
+  id: number;
+  title: string;
+  slug: string;
+  chart_ids?: number[];
+};
+
+// Dataset lineage response type
+export type DatasetLineage = {
+  dataset: DatasetEntity;
+  upstream: {
+    database: DatabaseEntity;
+  };
+  downstream: {
+    charts: {
+      count: number;
+      result: ChartEntity[];
+    };
+    dashboards: {
+      count: number;
+      result: DashboardEntity[];
+    };
+  };
+};
+
+// Chart lineage response type
+export type ChartLineage = {
+  chart: ChartEntity & {
+    datasource_id: number;
+    datasource_type: string;
+  };
+  upstream: {
+    dataset: DatasetEntity;
+    database: DatabaseEntity;
+  };
+  downstream: {
+    dashboards: {
+      count: number;
+      result: DashboardEntity[];
+    };
+  };
+};
+
+// Dashboard lineage response type
+export type DashboardLineage = {
+  dashboard: DashboardEntity & {
+    published: boolean;
+  };
+  upstream: {
+    charts: {
+      count: number;
+      result: ChartEntity[];
+    };
+    datasets: {
+      count: number;
+      result: DatasetEntity[];
+    };
+    databases: {
+      count: number;
+      result: DatabaseEntity[];
+    };
+  };
+  downstream: null;
+};
+
+/**
+ * Hook to fetch lineage data for a dataset
+ * @param idOrUuid Dataset ID or UUID
+ */
+export const useDatasetLineage = (idOrUuid: string | number) =>
+  useApiV1Resource<DatasetLineage>(`/api/v1/dataset/${idOrUuid}/lineage`);
+
+/**
+ * Hook to fetch lineage data for a chart
+ * @param idOrUuid Chart ID or UUID
+ */
+export const useChartLineage = (idOrUuid: string | number) =>
+  useApiV1Resource<ChartLineage>(`/api/v1/chart/${idOrUuid}/lineage`);
+
+/**
+ * Hook to fetch lineage data for a dashboard
+ * @param idOrSlug Dashboard ID or slug
+ */
+export const useDashboardLineage = (idOrSlug: string | number) =>
+  useApiV1Resource<DashboardLineage>(`/api/v1/dashboard/${idOrSlug}/lineage`);

Review Comment:
   **Suggestion:** These hooks always build and request a URL even when the identifier is empty, and callers currently pass `''` for non-selected entity types; this triggers invalid requests like `/api/v1/chart//lineage` and unnecessary error states/network traffic. Add a skip mechanism (for example skip token support) or require callers to pass only valid IDs and avoid firing inactive lineage requests. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Extra lineage HTTP calls for invalid dataset/dashboard endpoints.
   - ⚠️ Unnecessary 4xx errors clutter logs and monitoring metrics.
   - ⚠️ Slightly slower lineage modal due to wasted network requests.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Open the chart list and click the "View Lineage" action for a chart, which renders
   `LineageModal` via `ChartCard` at
   `superset-frontend/src/features/charts/ChartCard.tsx:10-30` (the menu item with
   `key: 'lineage'` wraps a `<LineageModal entityType="chart" entityId={chart.id} ... />`).
   
   2. When `LineageModal` mounts
   (`superset-frontend/src/features/lineage/LineageModal.tsx:35-47`), it unconditionally
   calls all three hooks:
   
      - `useDatasetLineage(entityType === 'dataset' ? entityId : '')`
   
      - `useChartLineage(entityType === 'chart' ? entityId : '')`
   
      - `useDashboardLineage(entityType === 'dashboard' ? entityId : '')`
   
      so for a chart lineage view, `useDatasetLineage` and `useDashboardLineage` are both
      invoked with an empty identifier (`''`).
   
   3. The hooks in `superset-frontend/src/hooks/apiResources/lineage.ts:119-134` interpolate
   the identifier directly into the endpoint string:
   
      - `useDatasetLineage` calls
      `useApiV1Resource<DatasetLineage>(\`/api/v1/dataset/${idOrUuid}/lineage\`)`
   
      - `useDashboardLineage` calls
      `useApiV1Resource<DashboardLineage>(\`/api/v1/dashboard/${idOrSlug}/lineage\`)`
   
      which, with `idOrUuid`/`idOrSlug` equal to `''`, produce malformed URLs
      `/api/v1/dataset//lineage` and `/api/v1/dashboard//lineage`.
   
   4. `useApiV1Resource` delegates to `useApiResourceFullBody` in
   `superset-frontend/src/hooks/apiResources/apiResources.ts:87-137`, which unconditionally
   constructs a `makeApi` GET request for the given `endpoint` and executes it; this issues
   real HTTP calls to `/api/v1/dataset//lineage` and `/api/v1/dashboard//lineage`, leading to
   unnecessary network traffic and error-state resources every time lineage is opened for a
   chart (and similarly for datasets/dashboards where the other two hooks receive an empty
   string).
   ```
   </details>
   
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset-frontend/src/hooks/apiResources/lineage.ts
   **Line:** 119:134
   **Comment:**
        *Api Mismatch: These hooks always build and request a URL even when the identifier is empty, and callers currently pass `''` for non-selected entity types; this triggers invalid requests like `/api/v1/chart//lineage` and unnecessary error states/network traffic. Add a skip mechanism (for example skip token support) or require callers to pass only valid IDs and avoid firing inactive lineage requests.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
   Once the fix is implemented, also check the other comments on the same PR, and ask the user whether they want to fix the rest of the comments as well. If they say yes, fetch all the comments, validate their correctness, and implement a minimal fix.
   ```
   </details>
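The skip guard asked for here is language-neutral. It is sketched below in Python, the backend language of this PR, as logic only rather than as the actual TypeScript hook. `lineage_endpoint`, `planned_requests`, and `ENTITY_TYPES` are hypothetical names. The URL pattern is the one quoted in the comment. How `useApiV1Resource` would be told to skip a `None` endpoint is an assumption and is not shown.

```python
from typing import Literal, Optional, Union

EntityType = Literal["dataset", "chart", "dashboard"]
ENTITY_TYPES: tuple[EntityType, ...] = ("dataset", "chart", "dashboard")


def lineage_endpoint(
    entity_type: EntityType, entity_id: Union[int, str, None]
) -> Optional[str]:
    """Return the lineage URL for one entity, or None to signal "skip"."""
    if entity_id is None or str(entity_id).strip() == "":
        # An empty identifier would otherwise yield e.g. /api/v1/chart//lineage.
        return None
    return f"/api/v1/{entity_type}/{entity_id}/lineage"


def planned_requests(
    selected: EntityType, entity_id: Union[int, str, None]
) -> dict[str, Optional[str]]:
    """Only the selected entity type gets a URL; the other two are skipped."""
    return {
        entity_type: lineage_endpoint(entity_type, entity_id)
        if entity_type == selected
        else None
        for entity_type in ENTITY_TYPES
    }
```

With this shape, opening lineage for chart `42` plans exactly one request, `/api/v1/chart/42/lineage`. The dataset and dashboard lookups never issue a GET.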



##########
superset/charts/api.py:
##########
@@ -313,6 +314,103 @@ def get(self, id_or_uuid: str) -> Response:
         except ChartNotFoundError:
             return self.response_404()
 
+    @expose("/<id_or_uuid>/lineage", methods=("GET",))
+    @protect()
+    @safe
+    @statsd_metrics
+    @event_logger.log_this_with_context(
+        action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.lineage",
+        log_to_statsd=False,
+    )
+    def lineage(self, id_or_uuid: str) -> Response:
+        """Get lineage information for a chart.
+        ---
+        get:
+          summary: Get lineage information for a chart
+          description: >-
+            Returns upstream (dataset, database) and downstream (dashboards) lineage
+            information for a chart
+          parameters:
+          - in: path
+            name: id_or_uuid
+            schema:
+              type: string
+            description: Either the id of the chart, or its uuid
+          responses:
+            200:
+              description: Lineage information
+              content:
+                application/json:
+                  schema:
+                    $ref: "#/components/schemas/ChartLineageResponseSchema"
+            401:
+              $ref: '#/components/responses/401'
+            404:
+              $ref: '#/components/responses/404'
+            500:
+              $ref: '#/components/responses/500'
+        """
+        try:
+            chart = ChartDAO.get_by_id_or_uuid(id_or_uuid)
+        except ChartNotFoundError:
+            return self.response_404()
+
+        chart_info = {
+            "id": chart.id,
+            "slice_name": chart.slice_name,
+            "viz_type": chart.viz_type,
+        }
+
+        # Get upstream (dataset and database) information
+        upstream: dict[str, Any] = {}
+        if dataset := chart.datasource:
+            upstream["dataset"] = {
+                "id": dataset.id,
+                "name": dataset.name,
+                "database_id": dataset.database_id,
+                "database_name": dataset.database.database_name
+                if dataset.database
+                else None,
+                "schema": dataset.schema,
+                "table_name": dataset.table_name,
+            }
+            if dataset.database:
+                upstream["database"] = {
+                    "id": dataset.database.id,
+                    "database_name": dataset.database.database_name,
+                    "backend": dataset.database.backend,
+                }
+            else:
+                upstream["database"] = None
+        else:
+            upstream["dataset"] = None
+            upstream["database"] = None
+
+        # Get downstream (dashboards) information
+        dashboards = []
+        for dashboard in chart.dashboards:
+            dashboards.append(
+                {
+                    "id": dashboard.id,
+                    "title": dashboard.dashboard_title,
+                    "slug": dashboard.slug,
+                }
+            )

Review Comment:
   **Suggestion:** Downstream dashboards are returned without per-dashboard access checks. Unlike the dataset lineage endpoint (which filters with security checks), this can leak private dashboard titles/slugs to users who can access a chart but not all dashboards containing it. Filter `chart.dashboards` with the dashboard access guard before adding them to the response. [security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Chart lineage leaks titles/slugs of unauthorized dashboards.
   - ⚠️ Exposes structure of restricted dashboards via lineage graph.
   - ⚠️ Inconsistent security with other dashboard-filtered endpoints.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Note the chart lineage endpoint implementation in `superset/charts/api.py:58-153`:
   `ChartRestApi.lineage` loads a chart via `ChartDAO.get_by_id_or_uuid(id_or_uuid)` and
   then, in the downstream section at lines 130-139 (file lines ~389-397), iterates over
   `chart.dashboards`:
   
      - initializes `dashboards = []`
   
      - `for dashboard in chart.dashboards: dashboards.append({"id": dashboard.id, "title":
      dashboard.dashboard_title, "slug": dashboard.slug})`
   
      with no call to `security_manager.can_access_dashboard` or any access filter.
   
   2. Compare this to dataset-related logic in `superset/datasets/api.py:1-11, 76-99`: when
   building dashboard lists for dataset-related data, dashboards are filtered with
   `security_manager.can_access_dashboard(dashboard)` (see
   `superset/superset/datasets/api.py:1-5` in the snippet before the dataset `lineage`
   method) and the guard itself is defined in `superset/security/manager.py:940-17` as
   `can_access_dashboard(self, dashboard)` using `self.raise_for_access(dashboard=dashboard)`
   to enforce dashboard permissions.
   
   3. On the frontend, opening lineage for a chart (either from the chart card menu or the
   Explore additional actions menu) constructs a `LineageModal` with `entityType="chart"` and
   `entityId={chart.id}`, e.g. `superset-frontend/src/features/charts/ChartCard.tsx:10-29`
   and `superset-frontend/src/explore/components/useExploreAdditionalActionsMenu/index.tsx`
   (import of `LineageModal` at lines 71-76, followed by menus using
   `MENU_KEYS.VIEW_LINEAGE`). `LineageModal` then calls
   `useChartLineage(entityType === 'chart' ? entityId : '')` in
   `superset-frontend/src/features/lineage/LineageModal.tsx:35-44`, which ultimately hits
   `GET /api/v1/chart/<chart_id>/lineage`.
   
   4. Because `ChartRestApi.lineage` returns *all* dashboards from `chart.dashboards` without
   filtering, any user who can access the chart but not all of its dashboards (enforced via
   `DashboardAccessFilter` in `superset/dashboards/api.py:143-145` and
   `security_manager.can_access_dashboard` in `superset/security/manager.py:940-17`) will
   still receive the `id`, `title`, and `slug` of restricted dashboards in the lineage
   response; `LineageView` then renders these downstream dashboards in the Sankey graph at
   `superset-frontend/src/features/lineage/LineageView.tsx:256-260`, exposing metadata for
   dashboards they cannot otherwise open.
   ```
   </details>
   
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/charts/api.py
   **Line:** 390:398
   **Comment:**
        *Security: Downstream dashboards are returned without per-dashboard access checks. Unlike the dataset lineage endpoint (which filters with security checks), this can leak private dashboard titles/slugs to users who can access a chart but not all dashboards containing it. Filter `chart.dashboards` with the dashboard access guard before adding them to the response.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
   Once the fix is implemented, also check the other comments on the same PR, and ask the user whether they want to fix the rest of the comments as well. If they say yes, fetch all the comments, validate their correctness, and implement a minimal fix.
   ```
   </details>
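A minimal sketch of the requested filter. It assumes that `can_access` stands in for `security_manager.can_access_dashboard`. It also assumes `downstream.dashboards` keeps the `{count, result}` shape declared for `ChartLineage` in `lineage.ts`. `Dashboard` and `downstream_dashboards` are hypothetical stand-ins, not Superset's model or the PR's code. In the real endpoint, the loop over `chart.dashboards` would call the security manager directly.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class Dashboard:  # hypothetical stand-in for Superset's Dashboard model
    id: int
    dashboard_title: str
    slug: Optional[str]


def downstream_dashboards(
    dashboards: Iterable[Dashboard],
    can_access: Callable[[Dashboard], bool],
) -> dict[str, Any]:
    """Serialize only the dashboards the current user is allowed to access."""
    result = [
        {"id": d.id, "title": d.dashboard_title, "slug": d.slug}
        for d in dashboards
        if can_access(d)  # assumed to mirror can_access_dashboard(dashboard)
    ]
    # Derive count from the filtered list so it cannot reveal hidden dashboards.
    return {"count": len(result), "result": result}
```

Computing `count` after filtering matters as much as filtering `result`. A count taken from `chart.dashboards` would still tell the user how many restricted dashboards contain the chart.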



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
