Re: [PR] feat(db): add onboarding workflows structure [superset]

via GitHub Wed, 04 Feb 2026 13:43:54 -0800


codeant-ai-for-open-source[bot] commented on code in PR #37680:
URL: https://github.com/apache/superset/pull/37680#discussion_r2766108878



##########
superset/migrations/versions/2026-02-04_17-02_a55bbb4da9a6_add_create_dashboard_with_none_existing_.py:
##########
@@ -0,0 +1,79 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Add oboarding workflow for creating a dashboard with none existing chart
+
+Revision ID: a55bbb4da9a6
+Revises: 91b151e8dac1
+Create Date: 2026-02-04 17:02:54.527420
+
+"""
+
+# revision identifiers, used by Alembic.
+revision = 'a55bbb4da9a6'
+down_revision = '91b151e8dac1'
+
+from alembic import op
+import sqlalchemy as sa
+
+onboarding_workflow_table = sa.Table(
+    "onboarding_workflow",
+    sa.MetaData(),
+    sa.Column("id", sa.Integer, primary_key=True),
+    sa.Column("name", sa.String(100), unique=True),
+    sa.Column("description", sa.String(255)),
+)
+
+user_onboarding_workflow_table = sa.Table(
+    "user_onboarding_workflow",
+    sa.MetaData(),
+    sa.Column("user_id", sa.Integer, sa.ForeignKey("ab_user.id")),
+    sa.Column("workflow_id", sa.Integer, 
sa.ForeignKey("onboarding_workflow.id")),
+    sa.Column("visited", sa.Boolean, default=False),
+    sa.Column("visited_times", sa.Integer, default=0),
+)
+
+
+def upgrade():
+    bind = op.get_bind()
+
+    # Insert the new onboarding workflow
+    insert_stmt = onboarding_workflow_table.insert().values(
+        name="CREATE_DASHBOARD_WITH_NO_EXISTING_CHART",
+        description="Onboarding workflow for creating a dashboard with none 
existing chart"
+    )
+
+    result = bind.execute(insert_stmt)
+    workflow_id = result.inserted_primary_key[0]
+
+    # Insert a user_onboarding_workflow row for each existing user
+    # All active users should have this workflow in their onboarding list,
+    # but Gamma and Public users will not mark this workflow as visited since
+    # they won't have write permissions to the API.
+    insert_user_workflows = sa.text(
+        """
+        INSERT INTO user_onboarding_workflow (user_id, workflow_id, visited, 
visited_times)
+        SELECT u.id, :workflow_id, false, 0
+        FROM ab_user u
+        WHERE u.active = true
+        """
+    )
+
+    bind.execute(insert_user_workflows, {"workflow_id": workflow_id})

Review Comment:
   **Suggestion:** The raw SQL uses literal boolean values `false` and `true`, 
which are not valid boolean literals on SQLite and will cause this migration to 
fail when running tests or using SQLite, so the booleans should be passed as 
bound parameters that SQLAlchemy can adapt per backend. [type error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Migration fails on SQLite local or test environments.
   - ⚠️ Developer onboarding and unit tests blocked.
   - ⚠️ Postgres production unaffected by this syntax.
   ```
   </details>
   
   ```suggestion
           SELECT u.id, :workflow_id, :visited, :visited_times
           FROM ab_user u
           WHERE u.active = :active
           """
       )
   
       bind.execute(
           insert_user_workflows,
           {
               "workflow_id": workflow_id,
               "visited": False,
               "visited_times": 0,
               "active": True,
           },
       )
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Run the migration against a SQLite-backed database (common in local 
developer setups or
   some test environments). The migration file is
   
`superset/migrations/versions/2026-02-04_17-02_a55bbb4da9a6_add_create_dashboard_with_none_existing_.py`.
   
   2. During upgrade(), execution reaches the raw SQL block defined at lines 
66-74 where the
   SQL text contains `false` and `true` literals.
   
   3. bind.execute(insert_user_workflows, {"workflow_id": workflow_id}) is 
invoked at line
   75. SQLite does not accept the bare `false`/`true` tokens in SQL the same 
way Postgres
   does; the backend will raise a SQL/Operational error or treat the tokens 
unexpectedly when
   the DB driver executes the raw SQL.
   
   4. Observe migration failure in local dev or CI that uses SQLite: the 
upgrade aborts with
   a SQL error at `...:75`, blocking tests or developer workflows. Passing 
booleans as bound
   parameters (as in the improved code) lets SQLAlchemy adapt them to the 
backend.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** 
superset/migrations/versions/2026-02-04_17-02_a55bbb4da9a6_add_create_dashboard_with_none_existing_.py
   **Line:** 69:75
   **Comment:**
        *Type Error: The raw SQL uses literal boolean values `false` and 
`true`, which are not valid boolean literals on SQLite and will cause this 
migration to fail when running tests or using SQLite, so the booleans should be 
passed as bound parameters that SQLAlchemy can adapt per backend.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/models/user_onboarding_workflow.py:
##########
@@ -0,0 +1,46 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from flask_appbuilder import Model
+from sqlalchemy import Column, ForeignKey, Integer, Boolean
+from sqlalchemy.orm import relationship
+
+from superset import security_manager
+from superset.models.helpers import AuditMixinNullable
+
+
+class UserOnboardingWorkflow(Model, AuditMixinNullable):
+    """
+    Stores per-user state for onboarding workflows.
+
+    This model tracks a user's interaction with a specific onboarding workflow,
+    including whether the workflow has been visited and how many times it has
+    been completed or re-run. It enables Superset to determine when onboarding
+    experiences should be shown automatically and allows users to re-trigger
+    workflows as needed.
+
+    This table represents user-specific state only and should not contain any
+    workflow definition or UI configuration details.
+    """
+    __tablename__ = "user_onboarding_workflow"
+    user_id = Column(Integer, ForeignKey("ab_user.id"))

Review Comment:
   **Suggestion:** The ORM model does not define any primary key column, which 
will cause SQLAlchemy/Flask AppBuilder to fail mapper configuration at import 
time (e.g., "Mapper ... does not have any primary key columns") and prevent 
normal CRUD operations on this table; add a primary key column to the model. 
[logic error]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Superset webserver fails to import models on startup.
   - ❌ DB migrations involving this model fail.
   - ⚠️ Onboarding workflow CRUD unusable until fixed.
   ```
   </details>
   
   ```suggestion
       id = Column(Integer, primary_key=True)
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Start the Superset process that imports models so SQLAlchemy/FAB maps ORM 
classes. The
   class mapping in `superset/models/user_onboarding_workflow.py` begins at 
`class
   UserOnboardingWorkflow` (file lines 25-46) and defines columns at lines 
38-45.
   
   2. During module import (application startup), SQLAlchemy inspects
   `UserOnboardingWorkflow` mapping (lines 25 and 38-45). Because no primary 
key Column is
   declared, SQLAlchemy's mapper validation runs against this class.
   
   3. SQLAlchemy raises a Mapper configuration error (e.g., "Mapper 
'UserOnboardingWorkflow'
   has no primary key") while importing 
`superset/models/user_onboarding_workflow.py` (around
   the same lines), preventing the app from finishing initialization or using 
the model.
   
   4. As evidence: the only column-like attributes in this file are `user_id`, 
`workflow_id`,
   `visited`, `visited_times` (lines 39-40 and 44-45) and no `primary_key=True` 
is present;
   adding `id = Column(Integer, primary_key=True)` fixes the 
missing-primary-key mapper
   failure.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/models/user_onboarding_workflow.py
   **Line:** 39:39
   **Comment:**
        *Logic Error: The ORM model does not define any primary key column, 
which will cause SQLAlchemy/Flask AppBuilder to fail mapper configuration at 
import time (e.g., "Mapper ... does not have any primary key columns") and 
prevent normal CRUD operations on this table; add a primary key column to the 
model.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/migrations/versions/2026-02-04_17-02_a55bbb4da9a6_add_create_dashboard_with_none_existing_.py:
##########
@@ -0,0 +1,79 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Add oboarding workflow for creating a dashboard with none existing chart
+
+Revision ID: a55bbb4da9a6
+Revises: 91b151e8dac1
+Create Date: 2026-02-04 17:02:54.527420
+
+"""
+
+# revision identifiers, used by Alembic.
+revision = 'a55bbb4da9a6'
+down_revision = '91b151e8dac1'
+
+from alembic import op
+import sqlalchemy as sa
+
+onboarding_workflow_table = sa.Table(
+    "onboarding_workflow",
+    sa.MetaData(),
+    sa.Column("id", sa.Integer, primary_key=True),
+    sa.Column("name", sa.String(100), unique=True),
+    sa.Column("description", sa.String(255)),
+)
+
+user_onboarding_workflow_table = sa.Table(
+    "user_onboarding_workflow",
+    sa.MetaData(),
+    sa.Column("user_id", sa.Integer, sa.ForeignKey("ab_user.id")),
+    sa.Column("workflow_id", sa.Integer, 
sa.ForeignKey("onboarding_workflow.id")),
+    sa.Column("visited", sa.Boolean, default=False),
+    sa.Column("visited_times", sa.Integer, default=0),
+)
+
+
+def upgrade():
+    bind = op.get_bind()
+
+    # Insert the new onboarding workflow
+    insert_stmt = onboarding_workflow_table.insert().values(
+        name="CREATE_DASHBOARD_WITH_NO_EXISTING_CHART",
+        description="Onboarding workflow for creating a dashboard with none 
existing chart"
+    )
+
+    result = bind.execute(insert_stmt)
+    workflow_id = result.inserted_primary_key[0]
+
+    # Insert a user_onboarding_workflow row for each existing user
+    # All active users should have this workflow in their onboarding list,
+    # but Gamma and Public users will not mark this workflow as visited since
+    # they won't have write permissions to the API.
+    insert_user_workflows = sa.text(
+        """
+        INSERT INTO user_onboarding_workflow (user_id, workflow_id, visited, 
visited_times)
+        SELECT u.id, :workflow_id, false, 0
+        FROM ab_user u
+        WHERE u.active = true
+        """
+    )
+
+    bind.execute(insert_user_workflows, {"workflow_id": workflow_id})
+
+
+def downgrade():
+    pass

Review Comment:
   **Suggestion:** The downgrade function is a no-op, so rolling back this 
migration will leave the inserted onboarding workflow and all related 
user-onboarding rows in place, violating the expectation that migrations are 
reversible and potentially causing duplicate or stale data if a similar 
workflow is re-added later. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Alembic migration rollback leaves stale onboarding rows.
   - ⚠️ Re-applying migration may trigger unique constraint error.
   - ⚠️ Local/CI rollback workflow/tests may be blocked.
   ```
   </details>
   
   ```suggestion
       bind = op.get_bind()
   
       # Find the workflow ID created in upgrade
       workflow_id = bind.execute(
           sa.select(onboarding_workflow_table.c.id).where(
               onboarding_workflow_table.c.name
               == "CREATE_DASHBOARD_WITH_NO_EXISTING_CHART"
           )
       ).scalar()
   
       if workflow_id is not None:
           # Remove user onboarding workflow rows for this workflow
           bind.execute(
               user_onboarding_workflow_table.delete().where(
                   user_onboarding_workflow_table.c.workflow_id == workflow_id
               )
           )
           # Remove the onboarding workflow itself
           bind.execute(
               onboarding_workflow_table.delete().where(
                   onboarding_workflow_table.c.id == workflow_id
               )
           )
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Apply the migration via Alembic: run `alembic upgrade head`. This 
executes the
   upgrade() in file
   
`superset/migrations/versions/2026-02-04_17-02_a55bbb4da9a6_add_create_dashboard_with_none_existing_.py`
   starting at line 50; the migration inserts the workflow at lines 54-60 and 
inserts
   per-user rows at lines 66-75.
   
   2. Attempt to roll back the migration with `alembic downgrade 91b151e8dac1`. 
Alembic calls
   downgrade() defined at 
`..._add_create_dashboard_with_none_existing_.py:78-79`. Because
   downgrade() is a no-op (`pass`), the workflow row and 
user_onboarding_workflow rows
   inserted in step 1 remain in the database.
   
   3. Re-run the migration (another `alembic upgrade head`) which hits the 
insert at
   `...py:54-60` again. Because the previously inserted workflow still exists, 
the insert may
   violate the workflow name uniqueness or otherwise lead to unexpected state 
(upgrade fails
   or produces duplicate-stale state depending on DB constraints).
   
   4. Observe irreversible state: database still contains onboarding entries 
despite
   downgrade; test/rollback workflows and CI that expect reversible migrations 
will fail.
   This demonstrates the downgrade being a real, reproducible problem because 
the file's
   upgrade creates persistent rows but downgrade does not remove them.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** 
superset/migrations/versions/2026-02-04_17-02_a55bbb4da9a6_add_create_dashboard_with_none_existing_.py
   **Line:** 79:79
   **Comment:**
        *Logic Error: The downgrade function is a no-op, so rolling back this 
migration will leave the inserted onboarding workflow and all related 
user-onboarding rows in place, violating the expectation that migrations are 
reversible and potentially causing duplicate or stale data if a similar 
workflow is re-added later.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/migrations/versions/2026-02-04_16-32_91b151e8dac1_add_onboarding_workflows.py:
##########
@@ -0,0 +1,57 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Add onboarding workflows tables
+
+Revision ID: 91b151e8dac1
+Revises: 9787190b3d89
+Create Date: 2026-02-04 16:32:25.570254
+
+"""
+
+# revision identifiers, used by Alembic.
+revision = '91b151e8dac1'
+down_revision = '9787190b3d89'
+
+from alembic import op
+import sqlalchemy as sa
+
+def upgrade():
+    op.create_table(
+        "onboarding_workflow",
+        sa.Column("created_on", sa.DateTime(), nullable=True),
+        sa.Column("changed_on", sa.DateTime(), nullable=True),
+        sa.Column("id", sa.Integer, primary_key=True),
+        sa.Column("name", sa.String(100), unique=True),
+        sa.Column("description", sa.String(255)),
+    )
+
+    op.create_table(
+        "user_onboarding_workflow",
+        sa.Column("created_on", sa.DateTime(), nullable=True),
+        sa.Column("changed_on", sa.DateTime(), nullable=True),
+        sa.Column("user_id", sa.Integer()),
+        sa.Column("workflow_id", sa.Integer()),
+        sa.Column("visited", sa.Boolean, default=False),
+        sa.Column("visited_times", sa.Integer, default=0),
+        sa.ForeignKeyConstraint(["user_id"], ["ab_user.id"]),

Review Comment:
   **Suggestion:** The mapping table is created without any primary key, which 
allows duplicate rows for the same user/workflow pair and can break ORM 
assumptions that every row is uniquely identifiable, making updates and deletes 
error-prone. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Onboarding workflow tracking may store duplicate rows.
   - ⚠️ ORM updates/deletes may target multiple rows unexpectedly.
   ```
   </details>
   
   ```suggestion
           sa.PrimaryKeyConstraint("user_id", "workflow_id"),
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Checkout the branch containing the migration file
   
`superset/migrations/versions/2026-02-04_16-32_91b151e8dac1_add_onboarding_workflows.py`
   and open the op.create_table for `user_onboarding_workflow` (definition 
begins at file
   line 42 in the PR hunk).
   
   2. Apply migrations (this migration uses the code at lines 42-52) by running 
the project's
   normal upgrade step (for example, `superset db upgrade` or the alembic 
equivalent). The
   `user_onboarding_workflow` table will be created without any primary key 
because the code
   at lines 42-52 does not declare one.
   
   3. Using a SQL client (psql / mysql client) or SQLAlchemy engine, run two 
identical
   inserts into the new table, e.g.:
   
      - `INSERT INTO user_onboarding_workflow (user_id, workflow_id, visited, 
visited_times)
      VALUES (1, 1, true, 1);`
   
      - Run the same INSERT again.
   
      Because there is no primary key or unique constraint on (user_id, 
workflow_id), both
      INSERTs succeed and two duplicate rows appear.
   
   4. Verify duplicates with `SELECT count(*) FROM user_onboarding_workflow 
WHERE user_id=1
   AND workflow_id=1;` which returns 2. This demonstrates the concrete problem 
caused by
   missing PK at the migration code at lines 42-52.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** 
superset/migrations/versions/2026-02-04_16-32_91b151e8dac1_add_onboarding_workflows.py
   **Line:** 50:50
   **Comment:**
        *Logic Error: The mapping table is created without any primary key, 
which allows duplicate rows for the same user/workflow pair and can break ORM 
assumptions that every row is uniquely identifiable, making updates and deletes 
error-prone.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat(db): add onboarding workflows structure [superset]

Reply via email to