carloea2 opened a new issue, #5938:
URL: https://github.com/apache/texera/issues/5938

   ### Task Summary
   
   Improve dataset upload retry behavior so batch uploads distinguish between incomplete multipart uploads and files that already exist in the dataset.
   
   | Case | Expected behavior |
   | --- | --- |
   | An active multipart upload session exists for the same path | Prompt the user to resume or restart the incomplete upload. |
   | A file with the same path and size already exists in the committed or staged dataset files | Prompt the user to upload again or skip the matching file. |
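A minimal sketch of how a retry batch could be sorted into the cases in the table. Each file is classified on its own, so one batch can mix a resumable upload with a skippable file. It assumes the frontend already knows which paths have an active multipart session and which path/size pairs exist in the dataset; `RetryCandidate`, `RetryAction`, and `classifyRetryBatch` are hypothetical names, not Texera's API, and paths are assumed unique within a batch.

```typescript
// Hypothetical shape of a file selected for (re)upload.
interface RetryCandidate {
  path: string;
  size: number; // bytes
}

// What the UI should ask (or do) for one file.
type RetryAction =
  | "prompt-resume-or-restart" // an active multipart session exists for this path
  | "prompt-upload-again-or-skip" // same path and size already exist (committed or staged)
  | "upload"; // no conflict: upload normally

// A match requires both path and size to agree.
function pathSizeKey(path: string, size: number): string {
  return JSON.stringify([path, size]);
}

// Active multipart sessions are checked first, then completed-file matches.
function classifyRetryBatch(
  files: RetryCandidate[],
  activeSessionPaths: ReadonlySet<string>,
  existingFiles: RetryCandidate[],
): Map<string, RetryAction> {
  const existingKeys = new Set(existingFiles.map((f) => pathSizeKey(f.path, f.size)));
  const actions = new Map<string, RetryAction>();
  for (const file of files) {
    if (activeSessionPaths.has(file.path)) {
      actions.set(file.path, "prompt-resume-or-restart");
    } else if (existingKeys.has(pathSizeKey(file.path, file.size))) {
      actions.set(file.path, "prompt-upload-again-or-skip");
    } else {
      actions.set(file.path, "upload");
    }
  }
  return actions;
}
```

Checking sessions first means a path that has both an incomplete multipart upload and a same-size existing copy is offered resume/restart rather than skip. A file whose path matches but whose size differs falls through to a normal upload.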
   
   The completed-file prompt should use cautious wording, because matching by path and size does not prove byte-for-byte equality.
   
   Implementation should include:
   
   - A backend dataset-scoped check for candidate upload paths and sizes.
   - Frontend logic that checks active multipart sessions first, then checks existing matching files.
   - Support for mixed retry batches where one file resumes and another file can be skipped.
   - Tests for multipart resume behavior, completed-file skip behavior, backend committed/staged matches, and invalid or unauthorized requests.
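The dataset-scoped backend check from the first bullet could reduce to a lookup like the one below. It is written in TypeScript only to keep a single language in this issue; the real endpoint would live in the backend service. `DatasetFileEntry`, `findPathSizeMatches`, the validation rules (non-empty path, non-negative integer size), and the rule that a staged entry takes precedence over a committed one are all assumptions. Authorization is assumed to be enforced before this function runs.

```typescript
// Hypothetical shape of a dataset file entry or upload candidate.
interface DatasetFileEntry {
  path: string;
  size: number; // bytes
}

type MatchSource = "committed" | "staged";

interface PathSizeMatch extends DatasetFileEntry {
  source: MatchSource;
}

// Both path and size must agree for a match.
function entryKey(entry: DatasetFileEntry): string {
  return JSON.stringify([entry.path, entry.size]);
}

function findPathSizeMatches(
  candidates: DatasetFileEntry[],
  staged: DatasetFileEntry[],
  committed: DatasetFileEntry[],
): PathSizeMatch[] {
  // Reject malformed candidates up front (hypothetical validation rules).
  for (const c of candidates) {
    if (c.path.trim() === "" || !Number.isInteger(c.size) || c.size < 0) {
      throw new RangeError(`invalid upload candidate: ${JSON.stringify(c)}`);
    }
  }

  // Index committed entries first so a staged entry with the same key overwrites it.
  const index = new Map<string, MatchSource>();
  for (const f of committed) index.set(entryKey(f), "committed");
  for (const f of staged) index.set(entryKey(f), "staged");

  // Report only candidates whose path AND size match an existing entry.
  const matches: PathSizeMatch[] = [];
  for (const c of candidates) {
    const source = index.get(entryKey(c));
    if (source !== undefined) {
      matches.push({ path: c.path, size: c.size, source });
    }
  }
  return matches;
}
```

Because the match is only on path and size, a caller should treat the result as "possibly the same file", which is why the prompt offers "upload again" alongside "skip".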
   
   Related discussion: #5744  
   Related PR: #5929
   
   ### Task Type
   
   - [ ] Refactor / Cleanup
   - [ ] DevOps / Deployment / CI
   - [x] Testing / QA
   - [ ] Documentation
   - [ ] Performance
   - [x] Other
   