Re: [PR] feat: allow datasets with the same name across different users [texera]

via GitHub Mon, 08 Sep 2025 11:39:26 -0700


Copilot commented on code in PR #3715:
URL: https://github.com/apache/texera/pull/3715#discussion_r2331068509



##########
core/file-service/src/main/scala/edu/uci/ics/texera/service/resource/DatasetResource.scala:
##########
@@ -288,6 +315,7 @@ class DatasetResource {
           createdDataset.getDid,
           createdDataset.getOwnerUid,
           createdDataset.getName,
+          createdDataset.getRepositoryName,

Review Comment:
   The return statement for `DashboardDataset` constructor is missing a 
parameter. Based on the context, it should include 
`createdDataset.getRepositoryName` but the constructor expects 6 parameters: 
`did`, `ownerUid`, `name`, `repositoryName`, `isPublic`, `isDownloadable`, 
`description`, `creationTime`, `lastModifiedTime`. The current call only 
provides 9 parameters but seems to be missing proper parameter alignment.



##########
core/scripts/sql/updates/13.sql:
##########
@@ -0,0 +1,39 @@
+-- Licensed to the Apache Software Foundation (ASF) under one
+-- or more contributor license agreements.  See the NOTICE file
+-- distributed with this work for additional information
+-- regarding copyright ownership.  The ASF licenses this file
+-- to you under the Apache License, Version 2.0 (the
+-- "License"); you may not use this file except in compliance
+-- with the License.  You may obtain a copy of the License at
+--
+--   http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing,
+-- software distributed under the License is distributed on an
+-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+-- KIND, either express or implied.  See the License for the
+-- specific language governing permissions and limitations
+-- under the License.
+
+-- ============================================
+-- 1. Connect to the texera_db database
+-- ============================================
+\c texera_db
+
+SET search_path TO texera_db;
+
+-- ============================================
+-- 2. Update the table schema
+-- ============================================
+
+BEGIN;
+
+-- 1. Add new column repository_name to dataset table.
+ALTER TABLE dataset
+    ADD COLUMN repository_name varchar(128);
+
+-- 2. Copy the data from name column to repository_name column.
+UPDATE dataset
+SET repository_name = name;

Review Comment:
   The migration script copies dataset names directly to repository_name, but 
the new system expects repository names in format `dataset-{did}`. This 
migration will create inconsistency with the new repository naming scheme. The 
repository_name should be set to `CONCAT('dataset-', did)` instead of copying 
the name directly.
   ```suggestion
   SET repository_name = CONCAT('dataset-', did);
   ```



##########
core/file-service/src/main/scala/edu/uci/ics/texera/service/resource/DatasetResource.scala:
##########
@@ -1280,29 +1322,29 @@ class DatasetResource {
 
   private def resolveDatasetAndPath(
       encodedUrl: String,
-      datasetName: String,
+      repositoryName: String,
       commitHash: String,
       uid: Integer
   ): Either[Response, (String, String, String)] = {
     val decodedPathStr = URLDecoder.decode(encodedUrl, 
StandardCharsets.UTF_8.name())
 
-    (Option(datasetName), Option(commitHash)) match {
+    (Option(repositoryName), Option(commitHash)) match {
       case (Some(_), None) | (None, Some(_)) =>
         // Case 1: Only one parameter is provided (error case)
         Left(
           Response
             .status(Response.Status.BAD_REQUEST)
             .entity(
-              "Both datasetName and commitHash must be provided together, or 
neither should be provided."
+              "Both repositoryName and commitHash must be provided together, 
or neither should be provided."
             )
             .build()
         )
 
-      case (Some(dsName), Some(commit)) =>
-        // Case 2: datasetName and commitHash are provided, validate access
+      case (Some(repositoryName), Some(commit)) =>
+        // Case 2: repositoryName and commitHash are provided, validate access
         val response = withTransaction(context) { ctx =>
           val datasetDao = new DatasetDao(ctx.configuration())
-          val datasets = datasetDao.fetchByName(dsName).asScala.toList
+          val datasets = 
datasetDao.fetchByRepositoryName(repositoryName).asScala.toList

Review Comment:
   The code attempts to call `fetchByRepositoryName()` on `DatasetDao`, but 
this method may not exist. The existing code pattern suggests `fetchByName()` 
was used previously for dataset names. You need to ensure 
`fetchByRepositoryName()` method is implemented in `DatasetDao` or use an 
alternative query approach.



##########
core/file-service/src/main/scala/edu/uci/ics/texera/service/resource/DatasetResource.scala:
##########
@@ -1311,17 +1353,18 @@ class DatasetResource {
           // Standard read access check only - download restrictions handled 
per endpoint
           // Non-download operations (viewing) should work for all public 
datasets
 
-          (dsName, commit, decodedPathStr)
+          (repositoryName, commit, decodedPathStr)
         }
         Right(response)
 
       case (None, None) =>
-        // Case 3: Neither datasetName nor commitHash are provided, resolve 
normally
+        // Case 3: Neither repositoryName nor commitHash are provided, resolve 
normally
         val response = withTransaction(context) { ctx =>
           val fileUri = FileResolver.resolve(decodedPathStr)
           val document = 
DocumentFactory.openReadonlyDocument(fileUri).asInstanceOf[OnDataset]
           val datasetDao = new DatasetDao(ctx.configuration())
-          val datasets = 
datasetDao.fetchByName(document.getDatasetName()).asScala.toList
+          val datasets =
+            
datasetDao.fetchByRepositoryName(document.getRepositoryName()).asScala.toList

Review Comment:
   Same issue as above - `fetchByRepositoryName()` method may not exist in 
`DatasetDao`. This could cause runtime errors when this code path is executed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] feat: allow datasets with the same name across different users [texera]

Reply via email to