mengw15 commented on code in PR #5055: URL: https://github.com/apache/texera/pull/5055#discussion_r3297064970
########## sql/updates/23.sql: ########## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +-- ============================================ +-- 1. Connect to the texera_db database +-- ============================================ +SET search_path TO texera_db; + +-- ============================================ +-- 2. Delete tables if they already exist +-- ============================================ + +BEGIN; + +DROP TABLE IF EXISTS notebook CASCADE; +DROP TABLE IF EXISTS workflow_notebook_mapping CASCADE; Review Comment: Re-run safety: the `DROP TABLE IF EXISTS notebook CASCADE` + `DROP TABLE IF EXISTS workflow_notebook_mapping CASCADE` at the top of the migration silently destroys data if this file is ever applied a second time (manual `psql -f`, a DB reset that forgets the Liquibase `DATABASECHANGELOG`, or a future edit that changes the MD5 and prompts a re-run). The other update files (`21.sql`, `22.sql`, etc.) are pure `ALTER` / `CREATE` and don't carry this footgun. Could we drop the `DROP`s and use `CREATE TABLE IF NOT EXISTS` for the two new tables instead? ########## sql/texera_ddl.sql: ########## @@ -435,6 +437,28 @@ CREATE TABLE IF NOT EXISTS computing_unit_user_access FOREIGN KEY (uid) REFERENCES "user"(uid) ON DELETE CASCADE ); +-- notebook table +CREATE TABLE IF NOT EXISTS notebook +( + nid SERIAL NOT NULL PRIMARY KEY, + wid INT NOT NULL, + notebook JSONB NOT NULL, + FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE +); Review Comment: Cardinality question: nothing in the schema prevents inserting two `notebook` rows for the same `wid`, but the parent issue (#4301, demo #5 "when the user reopens a workflow that was generated from a notebook, it will also reopen the notebook") reads like a 1:1 relationship. If a workflow is supposed to have at most one notebook, would a `UNIQUE (wid)` on `notebook` (or making `wid` the PK in place of `nid`) be safer than relying on application code to enforce it? ########## sql/texera_ddl.sql: ########## @@ -435,6 +437,28 @@ CREATE TABLE IF NOT EXISTS computing_unit_user_access FOREIGN KEY (uid) REFERENCES "user"(uid) ON DELETE CASCADE ); +-- notebook table +CREATE TABLE IF NOT EXISTS notebook +( + nid SERIAL NOT NULL PRIMARY KEY, + wid INT NOT NULL, + notebook JSONB NOT NULL, + FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE +); Review Comment: The main read pattern for `notebook` seems to be "given a workflow, find its notebook" (e.g., reopening a workflow → load its notebook). Postgres doesn't auto-create an index on FK columns, so this lookup would currently sequential-scan the table. Worth a `CREATE INDEX ON notebook(wid)`? (If a `UNIQUE(wid)` is added per the previous comment, that already creates an index and this is moot.) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
