GitHub user xuang7 added a comment to the discussion: Improve resumable upload: track completion at the batch/session level
Thanks for the suggestion. I think we should lean toward avoiding checksums as the default existence check on the client side. At the scale we are targeting, potentially TB-scale folders, the browser would need to read every byte of every file to compute the checksum before deciding whether to upload. That could be very slow and resource-heavy, and would partly reintroduce the cost that batch-level resume is trying to avoid. We could consider sampling, but that would still have accuracy limitations.

I think a lightweight path + file size check is a better starting point. It is not a perfect guarantee (a file's content can change while its path and size stay the same), so instead of silently deciding for the user, we can surface the uploaded file records and let the user choose whether to skip those files or re-upload/restart them. Basically, we make the user aware of the existing files and let them make the final decision. Checksums could still be revisited later if needed.

GitHub link: https://github.com/apache/texera/discussions/5744#discussioncomment-17424944
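
For illustration only, here is a minimal sketch of how the path + file size check from the comment could split a selected folder into "possibly already uploaded" and "needs upload" without reading any file content. The names `UploadedRecord`, `LocalFile`, `ResumePlan`, and `planResume` are hypothetical and not taken from the Texera codebase; in a browser, the path and size would presumably come from `File.webkitRelativePath` and `File.size`.

```typescript
// Hypothetical shape of a previously uploaded file record (assumption).
interface UploadedRecord {
  path: string;      // path relative to the upload root
  sizeBytes: number; // size recorded at upload time
}

// Hypothetical view of a locally selected file (assumption).
interface LocalFile {
  path: string;      // e.g. File.webkitRelativePath in a browser
  sizeBytes: number; // e.g. File.size in a browser
}

interface ResumePlan {
  possiblyUploaded: LocalFile[]; // same path and size: ask the user
  toUpload: LocalFile[];         // no matching record: upload as usual
}

// Compare metadata only; no file bytes are read.
function planResume(local: LocalFile[], uploaded: UploadedRecord[]): ResumePlan {
  const sizeByPath = new Map<string, number>();
  for (const record of uploaded) {
    sizeByPath.set(record.path, record.sizeBytes);
  }

  const plan: ResumePlan = { possiblyUploaded: [], toUpload: [] };
  for (const file of local) {
    if (sizeByPath.get(file.path) === file.sizeBytes) {
      plan.possiblyUploaded.push(file);
    } else {
      plan.toUpload.push(file);
    }
  }
  return plan;
}
```

The `possiblyUploaded` list is what would be shown to the user, who then decides whether to skip or re-upload those files; the sketch deliberately does not make that choice itself.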
