codeant-ai-for-open-source[bot] commented on code in PR #40912:
URL: https://github.com/apache/superset/pull/40912#discussion_r3382658487
##########
superset/dashboards/api.py:
##########
@@ -524,6 +525,114 @@ def get(
)
return self.response(200, result=result)
+ @expose("/<id_or_slug>/lineage", methods=("GET",))
+ @protect()
+ @safe
+ @statsd_metrics
+ @with_dashboard
+ @event_logger.log_this_with_context(
+ action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.lineage",
+ log_to_statsd=False,
+ )
+ # pylint: disable=arguments-differ,arguments-renamed
+ def lineage(self, dash: Dashboard) -> Response:
+ """Get lineage information for a dashboard.
+ ---
+ get:
+ summary: Get lineage information for a dashboard
+ description: >-
+ Returns upstream (charts, datasets, databases) lineage information
+ for a dashboard
+ parameters:
+ - in: path
+ name: id_or_slug
+ schema:
+ type: string
+ description: Either the id of the dashboard, or its slug
+ responses:
+ 200:
+ description: Lineage information
+ content:
+ application/json:
+ schema:
+ $ref: "#/components/schemas/DashboardLineageResponseSchema"
Review Comment:
**Suggestion:** The new endpoint references `DashboardLineageResponseSchema`
in OpenAPI docs, but this schema is not registered in
`openapi_spec_component_schemas` for `DashboardRestApi`, producing an
unresolved `$ref` in generated API specs. Import and add the schema to
component schemas to keep the API contract valid. [api mismatch]
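
A quick way to confirm this kind of finding is to scan the generated spec for `$ref` targets with no matching entry under `components.schemas`. The following is a minimal standalone sketch, not Superset's own tooling: `find_unresolved_schema_refs` and the trimmed `sample_spec` are hypothetical names introduced only for illustration.

```python
from typing import Any

SCHEMA_PREFIX = "#/components/schemas/"


def find_unresolved_schema_refs(spec: dict[str, Any]) -> set[str]:
    """Return schema names that are referenced via $ref but never registered."""
    registered = set(spec.get("components", {}).get("schemas", {}))
    referenced: set[str] = set()

    def walk(node: Any) -> None:
        # Recursively visit dicts and lists, collecting local schema refs.
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith(SCHEMA_PREFIX):
                referenced.add(ref[len(SCHEMA_PREFIX):])
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(spec)
    return referenced - registered


# Hypothetical, heavily trimmed spec mirroring the situation described above.
sample_spec = {
    "paths": {
        "/api/v1/dashboard/{id_or_slug}/lineage": {
            "get": {
                "responses": {
                    "200": {
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": SCHEMA_PREFIX
                                    + "DashboardLineageResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "components": {"schemas": {"DashboardGetResponseSchema": {"type": "object"}}},
}

unresolved = find_unresolved_schema_refs(sample_spec)
```

Running a check like this against the real generated spec (however it is exported in a given deployment) would show whether registering the schema actually resolves the reference.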
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ OpenAPI spec has unresolved DashboardLineage schema reference.
- ⚠️ Client codegen may fail for dashboard lineage endpoint.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. In `superset/dashboards/api.py:60-80`, inspect the docstring for
   `DashboardRestApi.lineage`. The 200 response schema uses an OpenAPI `$ref`
   to `#/components/schemas/DashboardLineageResponseSchema` (line 558 in the
   PR hunk).
2. Locate the definition of `DashboardLineageResponseSchema` in
   `superset/dashboards/schemas.py:614-617`, where it is declared as a
   Marshmallow schema composing `DashboardLineageDashboardSchema`,
   `DashboardLineageUpstreamSchema`, and a `downstream` field.
3. Examine `DashboardRestApi`'s OpenAPI configuration in
   `superset/dashboards/api.py:426-438`. The tuple
   `openapi_spec_component_schemas` includes `ChartEntityResponseSchema`,
   `DashboardCacheScreenshotResponseSchema`, `DashboardCopySchema`,
   `DashboardGetResponseSchema`, `DashboardDatasetSchema`,
   `TabsPayloadSchema`, `GetFavStarIdsSchema`,
   `EmbeddedDashboardResponseSchema`, and `DashboardScreenshotPostSchema`,
   but does not include `DashboardLineageResponseSchema`.
4. Compare this to `DatasetRestApi` in `superset/datasets/api.py:295-305`,
   where `openapi_spec_component_schemas` explicitly includes
   `DatasetLineageResponseSchema` to back the `$ref` used in its lineage
   docstring (lines 76–81 of its doc). When Superset's OpenAPI generator
   assembles the spec using `openapi_spec_component_schemas`, the `$ref` to
   `DashboardLineageResponseSchema` in the dashboard lineage endpoint will
   not match any registered schema component, producing an invalid or
   partially resolved OpenAPI document for
   `/api/v1/dashboard/<id_or_slug>/lineage`.
```
</details>
##########
superset/dashboards/api.py:
##########
@@ -524,6 +525,114 @@ def get(
)
return self.response(200, result=result)
+ @expose("/<id_or_slug>/lineage", methods=("GET",))
+ @protect()
+ @safe
+ @statsd_metrics
+ @with_dashboard
+ @event_logger.log_this_with_context(
+ action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.lineage",
+ log_to_statsd=False,
+ )
+ # pylint: disable=arguments-differ,arguments-renamed
+ def lineage(self, dash: Dashboard) -> Response:
+ """Get lineage information for a dashboard.
+ ---
+ get:
+ summary: Get lineage information for a dashboard
+ description: >-
+ Returns upstream (charts, datasets, databases) lineage information
+ for a dashboard
+ parameters:
+ - in: path
+ name: id_or_slug
+ schema:
+ type: string
+ description: Either the id of the dashboard, or its slug
+ responses:
+ 200:
+ description: Lineage information
+ content:
+ application/json:
+ schema:
+ $ref: "#/components/schemas/DashboardLineageResponseSchema"
+ 401:
+ $ref: '#/components/responses/401'
+ 404:
+ $ref: '#/components/responses/404'
+ 500:
+ $ref: '#/components/responses/500'
+ """
+ dashboard_info = {
+ "id": dash.id,
+ "title": dash.dashboard_title,
+ "slug": dash.slug,
+ "published": dash.published,
+ }
+
+ # Get upstream (charts, datasets, databases) information
+ charts = []
+ dataset_map = {}
+ database_map = {}
+
+ for chart in dash.slices:
+ charts.append(
+ {
+ "id": chart.id,
+ "slice_name": chart.slice_name,
+ "viz_type": chart.viz_type,
+ "dataset_id": chart.datasource_id,
+ }
+ )
+
+ # Collect dataset information
+ dataset = chart.datasource
+ if dataset and dataset.id not in dataset_map:
+ dataset_map[dataset.id] = {
+ "id": dataset.id,
+ "name": dataset.name,
+ "database_id": dataset.database_id,
+ "database_name": dataset.database.database_name
+ if dataset.database
+ else None,
+ "schema": dataset.schema,
+ "table_name": dataset.table_name,
+ "chart_ids": [],
Review Comment:
**Suggestion:** Dataset/database metadata is added to lineage for every
chart in the dashboard without checking datasource access, while other
dashboard endpoints explicitly redact datasource details when access is
missing. This can leak schema/table/database metadata to users who can open the
dashboard but are not allowed to inspect datasource internals. Gate these
fields behind `can_access_datasource` (or redact sensitive fields). [security]
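
The redaction approach can be sketched in isolation as follows. This is an illustrative assumption, not the PR's code: `DatasetStub` stands in for the real dataset model, the `can_access` callable stands in for `security_manager.can_access_datasource`, and the choice of which fields count as sensitive is a policy decision made up for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class DatasetStub:
    """Hypothetical stand-in for a Superset dataset model."""

    id: int
    name: str
    database_id: int
    database_name: Optional[str]
    schema: Optional[str]
    table_name: str


# Assumption: these are the fields to hide; the real list is a policy choice.
SENSITIVE_FIELDS = ("database_id", "database_name", "schema", "table_name")


def serialize_dataset(
    dataset: DatasetStub,
    can_access: Callable[[DatasetStub], bool],
) -> dict[str, Any]:
    """Build a lineage entry, blanking sensitive fields when access is denied."""
    payload: dict[str, Any] = {
        "id": dataset.id,
        "name": dataset.name,
        "database_id": dataset.database_id,
        "database_name": dataset.database_name,
        "schema": dataset.schema,
        "table_name": dataset.table_name,
        "chart_ids": [],
    }
    if not can_access(dataset):
        for field_name in SENSITIVE_FIELDS:
            payload[field_name] = None
    return payload


orders = DatasetStub(1, "orders", 5, "warehouse", "sales", "orders")
allowed = serialize_dataset(orders, can_access=lambda d: True)
redacted = serialize_dataset(orders, can_access=lambda d: False)
```

Whether a redacted entry should also drop `name`, or be omitted from the response entirely, is left open here; the sketch only shows the gating pattern.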
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Dashboard lineage API leaks restricted datasource metadata.
- ⚠️ Lineage UI reveals schema and table names.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Observe the dashboard lineage endpoint definition at
   `superset/dashboards/api.py:49-59`, where `DashboardRestApi.lineage` is
   exposed as `GET /api/v1/dashboard/<id_or_slug>/lineage` with `@protect()`
   and `@with_dashboard` but no datasource-level access checks inside the
   method body.
2. Inspect the implementation of `lineage` in
   `superset/dashboards/api.py:87-151`. For each chart in `dash.slices` (loop
   starting at line 99), the code assigns `dataset = chart.datasource` and,
   when `dataset` is truthy and not yet in `dataset_map`, stores metadata
   including `database_id`, `database_name`, `schema`, and `table_name`
   (lines 590–600 in the PR hunk).
3. Compare this to `_serialize_dashboard_dataset` at
   `superset/dashboards/api.py:210-220`, which explicitly calls
   `security_manager.can_access_datasource(datasource)` and redacts several
   fields when the current user lacks datasource access. The lineage method
   does not call `security_manager.can_access_datasource` at all, so it never
   redacts dataset/database metadata.
4. In a deployment where a user has access to a dashboard but only partial
   access to its datasources (per-object checks implemented in
   `security_manager.can_access_datasource` at
   `superset/security/manager.py:833-838` and `can_access_dashboard` around
   `superset/security/manager.py:3272`), calling
   `GET /api/v1/dashboard/<id_or_slug>/lineage` returns
   `upstream.datasets.result[*]` entries containing `database_name`,
   `schema`, and `table_name` for every `chart.datasource`, including those
   datasources where `security_manager.can_access_datasource` would return
   `False`, leaking underlying schema/table/database metadata to
   unauthorized viewers.
```
</details>
##########
superset/datasets/api.py:
##########
@@ -846,6 +849,119 @@ def related_objects(self, id_or_uuid: str) -> Response:
dashboards={"count": len(dashboards), "result": dashboards},
)
+ @expose("/<id_or_uuid>/lineage", methods=("GET",))
+ @protect()
+ @safe
+ @statsd_metrics
+ @event_logger.log_this_with_context(
+ action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.lineage",
+ log_to_statsd=False,
+ )
+ def lineage(self, id_or_uuid: str) -> Response:
+ """Get lineage information for a dataset.
+ ---
+ get:
+ summary: Get lineage information for a dataset
+ description: >-
+ Returns upstream (database) and downstream (charts, dashboards) lineage
+ information for a dataset
+ parameters:
+ - in: path
+ name: id_or_uuid
+ schema:
+ type: string
+ description: Either the id of the dataset, or its uuid
+ responses:
+ 200:
+ description: Lineage information
+ content:
+ application/json:
+ schema:
+ $ref: "#/components/schemas/DatasetLineageResponseSchema"
+ 401:
+ $ref: '#/components/responses/401'
+ 404:
+ $ref: '#/components/responses/404'
+ 500:
+ $ref: '#/components/responses/500'
+ """
+ dataset = DatasetDAO.find_by_id_or_uuid(id_or_uuid)
+ if not dataset:
+ return self.response_404()
+
+ dataset_info = {
+ "id": dataset.id,
+ "name": dataset.name,
+ "database_id": dataset.database_id,
+ "database_name": (
+ dataset.database.database_name if dataset.database else None
+ ),
+ "schema": dataset.schema,
+ "table_name": dataset.table_name,
+ }
+
+ # Get upstream (database) information
+ upstream: dict[str, Any] = {}
+ if dataset.database:
+ upstream["database"] = {
+ "id": dataset.database.id,
+ "database_name": dataset.database.database_name,
+ "backend": dataset.database.backend,
+ }
+ else:
+ upstream["database"] = None
+
+ # Get downstream (charts and dashboards) information
+ related_data = DatasetDAO.get_related_objects(dataset.id)
+
+ # Build chart information with dashboard IDs
+ charts = []
+ for chart in related_data["charts"]:
+ dashboard_ids = [d.id for d in chart.dashboards]
+ charts.append(
+ {
+ "id": chart.id,
+ "slice_name": chart.slice_name,
+ "viz_type": chart.viz_type,
+ "dashboard_ids": dashboard_ids,
+ }
+ )
+
+ # Build dashboard information with chart IDs
+ dashboards = []
+ for dashboard in related_data["dashboards"]:
+ chart_ids = [
+ chart.id
+ for chart in dashboard.slices
+ if chart.datasource_id == dataset.id
+ ]
+ dashboards.append(
+ {
+ "id": dashboard.id,
+ "title": dashboard.dashboard_title,
+ "slug": dashboard.slug,
+ "chart_ids": chart_ids,
Review Comment:
**Suggestion:** The dashboard list is built from all related dashboards with
no `can_access_dashboard` check, which leaks dashboard titles/slugs to users
lacking dashboard access. Apply the same dashboard permission filtering used in
`related_objects` before serializing downstream dashboards. [security]
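
A filtered version of the dashboard loop might look like the sketch below. It is written against stand-in types so it runs on its own: `ChartStub`, `DashboardStub`, and `build_downstream_dashboards` are hypothetical, and the `can_access` callable stands in for `security_manager.can_access_dashboard`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ChartStub:
    """Hypothetical stand-in for a chart (slice) model."""

    id: int
    datasource_id: int


@dataclass
class DashboardStub:
    """Hypothetical stand-in for a dashboard model."""

    id: int
    dashboard_title: str
    slug: str
    slices: list[ChartStub] = field(default_factory=list)


def build_downstream_dashboards(
    dashboards: list[DashboardStub],
    dataset_id: int,
    can_access: Callable[[DashboardStub], bool],
) -> list[dict[str, Any]]:
    """Serialize only the dashboards the current user is allowed to see."""
    result = []
    for dashboard in dashboards:
        if not can_access(dashboard):
            continue  # Skip before any title or slug is serialized.
        result.append(
            {
                "id": dashboard.id,
                "title": dashboard.dashboard_title,
                "slug": dashboard.slug,
                "chart_ids": [
                    chart.id
                    for chart in dashboard.slices
                    if chart.datasource_id == dataset_id
                ],
            }
        )
    return result


visible = DashboardStub(1, "Sales", "sales", [ChartStub(10, 7), ChartStub(11, 8)])
hidden = DashboardStub(2, "Payroll", "payroll", [ChartStub(12, 7)])
downstream = build_downstream_dashboards(
    [visible, hidden], dataset_id=7, can_access=lambda d: d.id != 2
)
```

Note that any `count` reported alongside the list would need to be computed from the filtered result, otherwise the count itself reveals that hidden dashboards exist.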
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Dataset lineage exposes unauthorized dashboard titles and slugs.
- ⚠️ Users can enumerate dashboards for restricted datasets.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. In `superset/datasets/api.py:37-46`, observe that `related_objects` builds
   its `dashboards` list with a comprehension that includes
   `if security_manager.can_access_dashboard(dashboard)`, ensuring only
   dashboards visible to the current user are returned in the
   `/api/v1/dataset/<id_or_uuid>/related_objects` response.
2. In the `lineage` endpoint at `superset/datasets/api.py:61-164`, after
   retrieving `related_data = DatasetDAO.get_related_objects(dataset.id)` at
   line 116, the code constructs `dashboards` in an imperative loop starting
   at line 931: it sets `dashboards = []`, iterates
   `for dashboard in related_data["dashboards"]` (line 932), computes
   `chart_ids` by iterating `dashboard.slices` (lines 133–137 in the
   snippet), and appends a dict with `id`, `title`, `slug`, and `chart_ids`
   (lines 139–143).
3. This downstream dashboard construction does not call
   `security_manager.can_access_dashboard(dashboard)`, unlike
   `related_objects` and similar code in
   `superset/databases/api.py:1342-1350`, which explicitly filter dashboards
   via `security_manager.can_access_dashboard(dashboard)` for
   database-related metadata APIs.
4. For a user profile with dataset-level access but limited dashboard
   permissions (i.e., `security_manager.can_access_dashboard` returns `False`
   for some dashboards linked to this dataset), calling
   `GET /api/v1/dataset/<id_or_uuid>/lineage` will still return
   `downstream.dashboards.result[*]` entries for those dashboards, exposing
   their IDs, titles, and slugs even though
   `/api/v1/dataset/<id_or_uuid>/related_objects` correctly hides them.
```
</details>
##########
tests/integration_tests/charts/api_tests.py:
##########
@@ -2444,3 +2449,30 @@ def test_related_owners_allowed_for_write_user(self):
self.login(ADMIN_USERNAME)
rv = self.client.get("api/v1/chart/related/owners")
assert rv.status_code == 200
+
+ @pytest.mark.usefixtures("inject_expected_chart_lineage")
+ def test_get_chart_lineage(self):
+ """
+ Chart API: Test get chart lineage
+ """
+ self.login(ADMIN_USERNAME)
+ chart_id = self.chart_lineage["chart_id"]
+ expected = self.chart_lineage["expected"]
+
+ uri = f"api/v1/chart/{chart_id}/lineage"
+ rv = self.get_assert_metric(uri, "lineage")
+ assert rv.status_code == 200
+
+ data = json.loads(rv.data.decode("utf-8"))
+
+ # Assert the entire response matches expected structure
+ assert data == expected
Review Comment:
**Suggestion:** This assertion is validating the wrong response shape: the
chart lineage API returns a payload wrapped under `result`, so comparing the
whole decoded JSON directly to `expected` will fail even when the endpoint is
correct. Compare `data["result"]` to `expected` (or build `expected` with the
`result` wrapper) to match the API contract. [api mismatch]
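
The shape mismatch can be reproduced without a running server. The snippet below is only an illustration under the reviewer's stated assumption that the endpoint wraps its payload under `result`; the `expected` dict and the hand-built response body are made up for the example.

```python
import json

# Hypothetical inner lineage structure, as a fixture might define it.
expected = {"chart": {"id": 1}, "upstream": {}, "downstream": {}}

# Simulated response body, assuming the API nests the payload under "result".
raw_body = json.dumps({"result": expected}).encode("utf-8")
data = json.loads(raw_body.decode("utf-8"))

# Comparing the whole body to the inner structure does not match...
whole_matches = data == expected
# ...while comparing the unwrapped payload does.
result_matches = data["result"] == expected
```

If the real endpoint does return the wrapper, the test assertion would correspondingly become a comparison against `data["result"]`.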
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Chart lineage integration test fails despite correct endpoint shape.
- ⚠️ CI pipeline for lineage feature can be blocked.
- ⚠️ Misaligned test obscures true API contract verification.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Locate the chart lineage test in
   `tests/integration_tests/charts/api_tests.py:54-70` (relative segment)
   where `test_get_chart_lineage` logs in as admin, reads `chart_id` and
   `expected` from `self.chart_lineage`, then calls
   `uri = f"api/v1/chart/{chart_id}/lineage"` and
   `rv = self.get_assert_metric(uri, "lineage")`.
2. Inspect the lineage fixture in
   `tests/integration_tests/fixtures/lineage.py:153-199`; the
   `inject_expected_chart_lineage` fixture sets
   `self.chart_lineage["expected"]` to a dict containing `"chart"`,
   `"upstream"`, and `"downstream"` keys, with no top-level `"result"`
   wrapper.
3. Inspect the chart lineage endpoint implementation in
   `superset/charts/api.py:8-24` and `superset/charts/api.py:44-24` (method
   `ChartRestApi.lineage`): it builds a `result` dict and returns
   `self.response(200, result=result)`, which, as demonstrated by other tests
   such as `test_get_chart` in
   `tests/integration_tests/charts/api_tests.py:36-55`, produces a JSON
   payload where the lineage data is nested under a top-level `"result"` key
   (those tests access `data["result"]`).
4. Run the chart API integration tests (e.g.
   `pytest tests/integration_tests/charts/api_tests.py -k test_get_chart_lineage`);
   `test_get_chart_lineage` decodes the response at
   `tests/integration_tests/charts/api_tests.py:2466-2469` and asserts
   `data == expected` instead of `data["result"] == expected`, causing the
   assertion to fail because `data` is `{"result": {...}}` while `expected`
   is just the inner lineage structure.
```
</details>
##########
superset/datasets/api.py:
##########
@@ -846,6 +849,119 @@ def related_objects(self, id_or_uuid: str) -> Response:
dashboards={"count": len(dashboards), "result": dashboards},
)
+ @expose("/<id_or_uuid>/lineage", methods=("GET",))
+ @protect()
+ @safe
+ @statsd_metrics
+ @event_logger.log_this_with_context(
+ action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.lineage",
+ log_to_statsd=False,
+ )
+ def lineage(self, id_or_uuid: str) -> Response:
+ """Get lineage information for a dataset.
+ ---
+ get:
+ summary: Get lineage information for a dataset
+ description: >-
+ Returns upstream (database) and downstream (charts, dashboards) lineage
+ information for a dataset
+ parameters:
+ - in: path
+ name: id_or_uuid
+ schema:
+ type: string
+ description: Either the id of the dataset, or its uuid
+ responses:
+ 200:
+ description: Lineage information
+ content:
+ application/json:
+ schema:
+ $ref: "#/components/schemas/DatasetLineageResponseSchema"
+ 401:
+ $ref: '#/components/responses/401'
+ 404:
+ $ref: '#/components/responses/404'
+ 500:
+ $ref: '#/components/responses/500'
+ """
+ dataset = DatasetDAO.find_by_id_or_uuid(id_or_uuid)
+ if not dataset:
+ return self.response_404()
+
+ dataset_info = {
+ "id": dataset.id,
+ "name": dataset.name,
+ "database_id": dataset.database_id,
+ "database_name": (
+ dataset.database.database_name if dataset.database else None
+ ),
+ "schema": dataset.schema,
+ "table_name": dataset.table_name,
+ }
+
+ # Get upstream (database) information
+ upstream: dict[str, Any] = {}
+ if dataset.database:
+ upstream["database"] = {
+ "id": dataset.database.id,
+ "database_name": dataset.database.database_name,
+ "backend": dataset.database.backend,
+ }
+ else:
+ upstream["database"] = None
+
+ # Get downstream (charts and dashboards) information
+ related_data = DatasetDAO.get_related_objects(dataset.id)
+
+ # Build chart information with dashboard IDs
+ charts = []
+ for chart in related_data["charts"]:
+ dashboard_ids = [d.id for d in chart.dashboards]
Review Comment:
**Suggestion:** Even inside each chart entry, `dashboard_ids` includes all
linked dashboards and does not filter by dashboard permissions, so unauthorized
dashboard IDs are leaked. Restrict this list to dashboards the current user can
access. [security]
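
Restricting the per-chart ID list comes down to applying the same predicate inside the comprehension. The sketch below uses `SimpleNamespace` objects as stand-ins for dashboards; `visible_dashboard_ids` is a hypothetical helper, and `can_access` stands in for `security_manager.can_access_dashboard`.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable


def visible_dashboard_ids(
    dashboards: Iterable[Any],
    can_access: Callable[[Any], bool],
) -> list[int]:
    """Return IDs of only those dashboards the current user may access."""
    return [d.id for d in dashboards if can_access(d)]


# Hypothetical chart linked to three dashboards, one of them restricted.
chart_dashboards = [
    SimpleNamespace(id=1),
    SimpleNamespace(id=2),
    SimpleNamespace(id=3),
]
restricted_ids = {2}

ids = visible_dashboard_ids(
    chart_dashboards, can_access=lambda d: d.id not in restricted_ids
)
```

Using one predicate for both the chart entries and the downstream dashboard list keeps the two views consistent, so no ID appears in a chart's `dashboard_ids` that is absent from the dashboards the user can see.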
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Chart entries leak IDs of inaccessible dashboards.
- ⚠️ Attackers can map hidden dashboards via dataset lineage.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. In `superset/datasets/api.py:118-129` (within the `lineage` method),
   examine the loop over `related_data["charts"]`: for each `chart`, the code
   builds a `dashboard_ids` list via the comprehension
   `dashboard_ids = [d.id for d in chart.dashboards]` at line 920, then
   stores `dashboard_ids` in the chart payload returned under
   `downstream.charts.result`.
2. There is no filtering of `chart.dashboards` by
   `security_manager.can_access_dashboard(d)` here, even though other APIs,
   such as `related_objects` in `superset/datasets/api.py:37-46` and
   database-related metadata endpoints in
   `superset/databases/api.py:1342-1350`, explicitly gate dashboard exposure
   behind `security_manager.can_access_dashboard(dashboard)`.
3. Consider a chart that is linked to multiple dashboards, some of which the
   current user cannot access (`security_manager.can_access_dashboard`
   returns `False`). For a user with dataset access but limited dashboard
   permissions, calling `GET /api/v1/dataset/<id_or_uuid>/lineage` will yield
   chart entries where `dashboard_ids` includes the IDs of all linked
   dashboards from `chart.dashboards`, including dashboards that would no
   longer be returned in `downstream.dashboards.result` once the dashboard
   permission filtering suggested for that list is implemented.
4. This mismatch means that even if downstream dashboards are filtered
   elsewhere, the `dashboard_ids` array still leaks the IDs of unauthorized
   dashboards via the chart entries, allowing users to infer the existence
   and identifiers of dashboards they should not see.
```
</details>
##########
superset-frontend/src/hooks/apiResources/lineage.ts:
##########
@@ -0,0 +1,134 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import { useApiV1Resource } from './apiResources';
+
+// Database entity type
+export type DatabaseEntity = {
+ id: number;
+ database_name: string;
+ backend: string;
+};
+
+// Dataset entity type
+export type DatasetEntity = {
+ id: number;
+ name: string;
+ schema: string | null;
+ table_name: string;
+ database_id: number;
+ database_name: string;
+ chart_ids?: number[];
+};
+
+// Chart entity type
+export type ChartEntity = {
+ id: number;
+ slice_name: string;
+ viz_type: string;
+ dashboard_ids?: number[];
+ dataset_id?: number;
+};
+
+// Dashboard entity type
+export type DashboardEntity = {
+ id: number;
+ title: string;
+ slug: string;
+ chart_ids?: number[];
+};
+
+// Dataset lineage response type
+export type DatasetLineage = {
+ dataset: DatasetEntity;
+ upstream: {
+ database: DatabaseEntity;
+ };
+ downstream: {
+ charts: {
+ count: number;
+ result: ChartEntity[];
+ };
+ dashboards: {
+ count: number;
+ result: DashboardEntity[];
+ };
+ };
+};
+
+// Chart lineage response type
+export type ChartLineage = {
+ chart: ChartEntity & {
+ datasource_id: number;
+ datasource_type: string;
+ };
+ upstream: {
+ dataset: DatasetEntity;
+ database: DatabaseEntity;
+ };
+ downstream: {
+ dashboards: {
+ count: number;
+ result: DashboardEntity[];
+ };
+ };
+};
+
+// Dashboard lineage response type
+export type DashboardLineage = {
+ dashboard: DashboardEntity & {
+ published: boolean;
+ };
+ upstream: {
+ charts: {
+ count: number;
+ result: ChartEntity[];
+ };
+ datasets: {
+ count: number;
+ result: DatasetEntity[];
+ };
+ databases: {
+ count: number;
+ result: DatabaseEntity[];
+ };
+ };
+ downstream: null;
+};
+
+/**
+ * Hook to fetch lineage data for a dataset
+ * @param idOrUuid Dataset ID or UUID
+ */
+export const useDatasetLineage = (idOrUuid: string | number) =>
+ useApiV1Resource<DatasetLineage>(`/api/v1/dataset/${idOrUuid}/lineage`);
+
+/**
+ * Hook to fetch lineage data for a chart
+ * @param idOrUuid Chart ID or UUID
+ */
+export const useChartLineage = (idOrUuid: string | number) =>
+ useApiV1Resource<ChartLineage>(`/api/v1/chart/${idOrUuid}/lineage`);
+
+/**
+ * Hook to fetch lineage data for a dashboard
+ * @param idOrSlug Dashboard ID or slug
+ */
+export const useDashboardLineage = (idOrSlug: string | number) =>
+ useApiV1Resource<DashboardLineage>(`/api/v1/dashboard/${idOrSlug}/lineage`);
Review Comment:
**Suggestion:** These hooks always build and request a URL even when the
identifier is empty, and callers currently pass `''` for non-selected entity
types; this triggers invalid requests like `/api/v1/chart//lineage` and
unnecessary error states/network traffic. Add a skip mechanism (for example
skip token support) or require callers to pass only valid IDs and avoid firing
inactive lineage requests. [api mismatch]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Extra lineage HTTP calls for invalid dataset/dashboard endpoints.
- ⚠️ Unnecessary 4xx errors clutter logs and monitoring metrics.
- ⚠️ Slightly slower lineage modal due to wasted network requests.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Open the chart list and click the "View Lineage" action for a chart, which
   renders `LineageModal` via `ChartCard` at
   `superset-frontend/src/features/charts/ChartCard.tsx:10-30` (the menu item
   with `key: 'lineage'` wraps a
   `<LineageModal entityType="chart" entityId={chart.id} ... />`).
2. When `LineageModal` mounts
   (`superset-frontend/src/features/lineage/LineageModal.tsx:35-47`), it
   unconditionally calls all three hooks:
   - `useDatasetLineage(entityType === 'dataset' ? entityId : '')`
   - `useChartLineage(entityType === 'chart' ? entityId : '')`
   - `useDashboardLineage(entityType === 'dashboard' ? entityId : '')`
   so for a chart lineage view `datasetLineage` and `dashboardLineage` are
   invoked with `idOrUuid = ''`.
3. The hooks in `superset-frontend/src/hooks/apiResources/lineage.ts:119-134`
   interpolate the identifier directly into the endpoint string:
   - `useDatasetLineage` calls
     `useApiV1Resource<DatasetLineage>(\`/api/v1/dataset/${idOrUuid}/lineage\`)`
   - `useDashboardLineage` calls
     `useApiV1Resource<DashboardLineage>(\`/api/v1/dashboard/${idOrSlug}/lineage\`)`
   which, with `idOrUuid`/`idOrSlug` equal to `''`, produce malformed URLs
   `/api/v1/dataset//lineage` and `/api/v1/dashboard//lineage`.
4. `useApiV1Resource` delegates to `useApiResourceFullBody` in
   `superset-frontend/src/hooks/apiResources/apiResources.ts:87-137`, which
   unconditionally constructs a `makeApi` GET request for the given `endpoint`
   and executes it; this issues real HTTP calls to `/api/v1/dataset//lineage`
   and `/api/v1/dashboard//lineage`, leading to unnecessary network traffic and
   error-state resources every time lineage is opened for a chart (and
   similarly for datasets/dashboards where the other two hooks receive an empty
   string).
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset-frontend/src/hooks/apiResources/lineage.ts
**Line:** 119:134
**Comment:**
*Api Mismatch: These hooks always build and request a URL even when the
identifier is empty, and callers currently pass `''` for non-selected entity
types; this triggers invalid requests like `/api/v1/chart//lineage` and
unnecessary error states/network traffic. Add a skip mechanism (for example
skip token support) or require callers to pass only valid IDs and avoid firing
inactive lineage requests.
Validate the correctness of the flagged issue. If it is correct, how can I
resolve this? If you propose a fix, implement it and please keep it concise.
Once the fix is implemented, also check the other comments on the same PR and
ask the user whether they want to fix the rest of the comments as well. If they
say yes, fetch all the comments, validate their correctness, and implement a
minimal fix.
</details>
##########
superset/charts/api.py:
##########
@@ -313,6 +314,103 @@ def get(self, id_or_uuid: str) -> Response:
except ChartNotFoundError:
return self.response_404()
+ @expose("/<id_or_uuid>/lineage", methods=("GET",))
+ @protect()
+ @safe
+ @statsd_metrics
+ @event_logger.log_this_with_context(
+ action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.lineage",
+ log_to_statsd=False,
+ )
+ def lineage(self, id_or_uuid: str) -> Response:
+ """Get lineage information for a chart.
+ ---
+ get:
+ summary: Get lineage information for a chart
+ description: >-
+ Returns upstream (dataset, database) and downstream (dashboards) lineage
+ information for a chart
+ parameters:
+ - in: path
+ name: id_or_uuid
+ schema:
+ type: string
+ description: Either the id of the chart, or its uuid
+ responses:
+ 200:
+ description: Lineage information
+ content:
+ application/json:
+ schema:
+ $ref: "#/components/schemas/ChartLineageResponseSchema"
+ 401:
+ $ref: '#/components/responses/401'
+ 404:
+ $ref: '#/components/responses/404'
+ 500:
+ $ref: '#/components/responses/500'
+ """
+ try:
+ chart = ChartDAO.get_by_id_or_uuid(id_or_uuid)
+ except ChartNotFoundError:
+ return self.response_404()
+
+ chart_info = {
+ "id": chart.id,
+ "slice_name": chart.slice_name,
+ "viz_type": chart.viz_type,
+ }
+
+ # Get upstream (dataset and database) information
+ upstream: dict[str, Any] = {}
+ if dataset := chart.datasource:
+ upstream["dataset"] = {
+ "id": dataset.id,
+ "name": dataset.name,
+ "database_id": dataset.database_id,
+ "database_name": dataset.database.database_name
+ if dataset.database
+ else None,
+ "schema": dataset.schema,
+ "table_name": dataset.table_name,
+ }
+ if dataset.database:
+ upstream["database"] = {
+ "id": dataset.database.id,
+ "database_name": dataset.database.database_name,
+ "backend": dataset.database.backend,
+ }
+ else:
+ upstream["database"] = None
+ else:
+ upstream["dataset"] = None
+ upstream["database"] = None
+
+ # Get downstream (dashboards) information
+ dashboards = []
+ for dashboard in chart.dashboards:
+ dashboards.append(
+ {
+ "id": dashboard.id,
+ "title": dashboard.dashboard_title,
+ "slug": dashboard.slug,
+ }
+ )
Review Comment:
**Suggestion:** Downstream dashboards are returned without per-dashboard
access checks. Unlike the dataset lineage endpoint (which filters with security
checks), this can leak private dashboard titles/slugs to users who can access a
chart but not all dashboards containing it. Filter `chart.dashboards` with the
dashboard access guard before adding them to the response. [security]
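
A minimal sketch of the filtering this suggestion describes, written so that it runs on its own. `FakeDashboard` and `downstream_dashboards` are hypothetical names introduced only for illustration, and the guard is passed in as a plain callable rather than imported from Superset; the `count`/`result` shape follows the `ChartLineage` type in the frontend diff.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class FakeDashboard:
    """Hypothetical stand-in for the Dashboard model, for illustration only."""

    id: int
    dashboard_title: str
    slug: Optional[str] = None


def downstream_dashboards(
    dashboards: Iterable[Any],
    can_access: Callable[[Any], bool],
) -> dict[str, Any]:
    """Serialize only the dashboards that pass the access guard."""
    result = [
        {"id": d.id, "title": d.dashboard_title, "slug": d.slug}
        for d in dashboards
        if can_access(d)  # skip dashboards the requesting user cannot open
    ]
    return {"count": len(result), "result": result}
```

In the endpoint this would presumably be called as `downstream_dashboards(chart.dashboards, security_manager.can_access_dashboard)`, assuming the guard returns a boolean as the reproduction steps below describe. Counting after filtering keeps `count` from revealing how many restricted dashboards exist.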
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Chart lineage leaks titles/slugs of unauthorized dashboards.
- ⚠️ Exposes structure of restricted dashboards via lineage graph.
- ⚠️ Inconsistent security with other dashboard-filtered endpoints.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Note the chart lineage endpoint implementation in
   `superset/charts/api.py:58-153`: `ChartRestApi.lineage` loads a chart via
   `ChartDAO.get_by_id_or_uuid(id_or_uuid)` and then, in the downstream section
   at lines 130-139 (file lines ~389-397), iterates over `chart.dashboards`:
   - initializes `dashboards = []`
   - `for dashboard in chart.dashboards: dashboards.append({"id": dashboard.id, "title": dashboard.dashboard_title, "slug": dashboard.slug})`
   with no call to `security_manager.can_access_dashboard` or any access
   filter.
2. Compare this to dataset-related logic in
   `superset/datasets/api.py:1-11, 76-99`: when building dashboard lists for
   dataset-related data, dashboards are filtered with
   `security_manager.can_access_dashboard(dashboard)` (see
   `superset/superset/datasets/api.py:1-5` in the snippet before the dataset
   `lineage` method) and the guard itself is defined in
   `superset/security/manager.py:940-17` as
   `can_access_dashboard(self, dashboard)` using
   `self.raise_for_access(dashboard=dashboard)` to enforce dashboard
   permissions.
3. On the frontend, opening lineage for a chart (either from the chart card
   menu or the Explore additional actions menu) constructs a `LineageModal`
   with `entityType="chart"` and `entityId={chart.id}`, e.g.
   `superset-frontend/src/features/charts/ChartCard.tsx:10-29` and
   `superset-frontend/src/explore/components/useExploreAdditionalActionsMenu/index.tsx`
   (import of `LineageModal` at lines 71-76, followed by menus using
   `MENU_KEYS.VIEW_LINEAGE`). `LineageModal` then calls
   `useChartLineage(entityType === 'chart' ? entityId : '')` in
   `superset-frontend/src/features/lineage/LineageModal.tsx:35-44`, which
   ultimately hits `GET /api/v1/chart/<chart_id>/lineage`.
4. Because `ChartRestApi.lineage` returns *all* dashboards from
   `chart.dashboards` without filtering, any user who can access the chart but
   not all of its dashboards (enforced via `DashboardAccessFilter` in
   `superset/dashboards/api.py:143-145` and
   `security_manager.can_access_dashboard` in
   `superset/security/manager.py:940-17`) will still receive the `id`, `title`,
   and `slug` of restricted dashboards in the lineage response; `LineageView`
   then renders these downstream dashboards in the Sankey graph at
   `superset-frontend/src/features/lineage/LineageView.tsx:256-260`, exposing
   metadata for dashboards they cannot otherwise open.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/charts/api.py
**Line:** 390:398
**Comment:**
*Security: Downstream dashboards are returned without per-dashboard
access checks. Unlike the dataset lineage endpoint (which filters with security
checks), this can leak private dashboard titles/slugs to users who can access a
chart but not all dashboards containing it. Filter `chart.dashboards` with the
dashboard access guard before adding them to the response.
Validate the correctness of the flagged issue. If it is correct, how can I
resolve this? If you propose a fix, implement it and please keep it concise.
Once the fix is implemented, also check the other comments on the same PR and
ask the user whether they want to fix the rest of the comments as well. If they
say yes, fetch all the comments, validate their correctness, and implement a
minimal fix.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]