GitHub user xuang7 added a comment to the discussion: Improve resumable upload: 
track completion at the batch/session level

Thanks for the suggestion. I think we should lean toward avoiding checksums as 
the default existence check on the client side. At the scale we are targeting, 
potentially TB-scale folders, the browser would need to read every byte of 
every file to compute the checksum before deciding whether to upload. That 
could be very slow and resource-heavy, and would partly reintroduce the cost 
that batch-level resume is trying to avoid. We could consider sampling, but 
that would still have accuracy limitations.

I think a lightweight check on path + file size is a better starting point. It 
is not a perfect guarantee (a file can change without its size changing), so 
instead of silently deciding for the user, we can surface the uploaded file 
records and let the user choose whether to skip those files or 
re-upload/restart them. Checksums could still be revisited later if needed.
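
To make the idea concrete, here is a rough sketch of the path + size check. 
All names (`LocalFile`, `UploadedRecord`, `planResume`) are hypothetical and 
not taken from the Texera codebase, and I am assuming the server can return 
the relative path and size of each file already recorded for the session. 
Treating a same-path, different-size file as "not uploaded" is also just one 
possible choice.

```typescript
// Hypothetical shapes for illustration only; not the actual Texera types.
interface LocalFile {
  relativePath: string; // path relative to the selected folder
  size: number;         // size in bytes, as reported by the browser File object
}

interface UploadedRecord {
  relativePath: string; // path recorded for the upload session (assumed available)
  size: number;         // size in bytes recorded at upload time
}

interface ResumePlan {
  // Same path and same size as an uploaded record: shown to the user,
  // who decides whether to skip or re-upload.
  needsUserDecision: LocalFile[];
  // No matching record, or same path with a different size: upload again.
  toUpload: LocalFile[];
}

function planResume(localFiles: LocalFile[], uploaded: UploadedRecord[]): ResumePlan {
  // Index the uploaded records by path so each local file is one lookup.
  const sizeByPath = new Map<string, number>();
  for (const record of uploaded) {
    sizeByPath.set(record.relativePath, record.size);
  }

  const plan: ResumePlan = { needsUserDecision: [], toUpload: [] };
  for (const file of localFiles) {
    const recordedSize = sizeByPath.get(file.relativePath);
    if (recordedSize !== undefined && recordedSize === file.size) {
      // Probably already uploaded, but not guaranteed: leave it to the user.
      plan.needsUserDecision.push(file);
    } else {
      plan.toUpload.push(file);
    }
  }
  return plan;
}
```

This only touches file metadata, so no file contents are read before the 
user sees the list and makes the final call.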

GitHub link: 
https://github.com/apache/texera/discussions/5744#discussioncomment-17424944
