Re: [PR] feat(python-notebook-migration): add database tables for Notebook Migration tool [texera]

via GitHub Mon, 25 May 2026 01:46:25 -0700


mengw15 commented on code in PR #5055:
URL: https://github.com/apache/texera/pull/5055#discussion_r3297064970



##########
sql/updates/23.sql:
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+-- ============================================
+-- 1. Connect to the texera_db database
+-- ============================================
+SET search_path TO texera_db;
+
+-- ============================================
+-- 2. Delete tables if they already exist
+-- ============================================
+
+BEGIN;
+
+DROP TABLE IF EXISTS notebook CASCADE;
+DROP TABLE IF EXISTS workflow_notebook_mapping CASCADE;

Review Comment:
   Re-run safety: the `DROP TABLE IF EXISTS notebook CASCADE` + `DROP TABLE IF 
EXISTS workflow_notebook_mapping CASCADE` at the top of the migration silently 
destroys data if this file is ever applied a second time (manual `psql -f`, a 
DB reset that forgets the Liquibase `DATABASECHANGELOG`, or a future edit that 
changes the MD5 and prompts a re-run). The other update files (`21.sql`, 
`22.sql`, etc.) are pure `ALTER` / `CREATE` and don't carry this footgun. Could 
we drop the `DROP`s and use `CREATE TABLE IF NOT EXISTS` for the two new tables 
instead?



##########
sql/texera_ddl.sql:
##########
@@ -435,6 +437,28 @@ CREATE TABLE IF NOT EXISTS computing_unit_user_access
     FOREIGN KEY (uid) REFERENCES "user"(uid) ON DELETE CASCADE
 );
 
+-- notebook table
+CREATE TABLE IF NOT EXISTS notebook
+(
+    nid         SERIAL  NOT NULL PRIMARY KEY,
+    wid         INT     NOT NULL,
+    notebook    JSONB   NOT NULL,
+    FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE
+);

Review Comment:
   Cardinality question: nothing in the schema prevents inserting two 
`notebook` rows for the same `wid`, but the parent issue (#4301, demo #5 "when 
the user reopens a workflow that was generated from a notebook, it will also 
reopen the notebook") reads like a 1:1 relationship. If a workflow is supposed 
to have at most one notebook, would a `UNIQUE (wid)` on `notebook` (or making 
`wid` the PK in place of `nid`) be safer than relying on application code to 
enforce it?



##########
sql/texera_ddl.sql:
##########
@@ -435,6 +437,28 @@ CREATE TABLE IF NOT EXISTS computing_unit_user_access
     FOREIGN KEY (uid) REFERENCES "user"(uid) ON DELETE CASCADE
 );
 
+-- notebook table
+CREATE TABLE IF NOT EXISTS notebook
+(
+    nid         SERIAL  NOT NULL PRIMARY KEY,
+    wid         INT     NOT NULL,
+    notebook    JSONB   NOT NULL,
+    FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE
+);

Review Comment:
   The main read pattern for `notebook` seems to be "given a workflow, find its 
notebook" (e.g., reopening a workflow → load its notebook). Postgres doesn't 
auto-create an index on FK columns, so this lookup would currently 
sequential-scan the table. Worth a `CREATE INDEX ON notebook(wid)`? (If a 
`UNIQUE(wid)` is added per the previous comment, that already creates an index 
and this is moot.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] feat(python-notebook-migration): add database tables for Notebook Migration tool [texera]

Reply via email to