On 27.02.2013, at 09:39, Shashank Gupta <[email protected]> wrote:

> Answers--
>> The problem of chunks is that they are
>> - not self-describing (what size?)
> When you parse a multipart request, you should get the chunk size. In a 
> single-part request, Content-Length gives the chunk size.
>> - must all have the same length
> Not mandatory. Why do you think so? In the merge you append chunks serially 
> until you find the last chunk.

If you don't include the start byte index, then you have to know all the 
previously uploaded chunks and add their lengths together. This requires those 
chunks to always be available when subsequent chunks arrive, which is a 
restriction. Or you are forced to keep the chunk size fixed so you can 
calculate the offset as chunk_size * (chunk_number - 1), but then the last 
chunk is forced to have a different size (unless the file size is exactly 
divisible by the desired chunk size), so you'd also need to send along the 
desired_chunk_size with the last request.
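The arithmetic above can be sketched as follows (a minimal illustration with hypothetical function names, not anything from the Sling codebase):

```python
def chunk_offset(chunk_number, chunk_size):
    """Offset of a 1-based chunk when all chunks share a fixed size."""
    return chunk_size * (chunk_number - 1)

def last_chunk_size(file_size, chunk_size):
    """The last chunk differs unless file_size divides evenly."""
    remainder = file_size % chunk_size
    return remainder if remainder else chunk_size
```

For a 2500-byte file with 1024-byte chunks, chunk 3 starts at offset 2048 and is only 452 bytes long, which is exactly the special case the text describes.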

Now if chunk requests include start and length as a solution to that, it's 
basically the same as byte ranges, and the chunk index becomes repetitive, 
useless information. (Note that the total file length needs to be transferred 
in the first or in all requests.)
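To make the byte-range equivalence concrete: an explicit start/length pair maps directly onto the `bytes start-end/total` syntax HTTP already defines for ranges (end index inclusive). A tiny sketch, assuming a helper of our own naming:

```python
def content_range(start, length, total):
    """Format an HTTP-style Content-Range value: 'bytes start-end/total',
    where end is the inclusive index of the last byte in this chunk."""
    return "bytes %d-%d/%d" % (start, start + length - 1, total)
```

With start, length, and total on every request, the server can place any chunk without knowing a chunk number or the sizes of previously uploaded chunks.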

>> - introduce an arbitrary numbering scheme that you cannot break out of
> Explain it more.

See above.

>> - if problems arise for one chunk, the actual order might become very 
>> different, so "last chunk" is not a fixed thing
> I don't see it as an issue. The "last chunk" is not required to be a fixed 
> thing, as the client explicitly marks the last chunk request.
>> - as noted above, not in line with existing HTTP concepts for this (even if 
>> they currently only apply to GETs)
> Afaics, AWS S3 uses a chunk-number approach, and we believe AWS is doing 
> well in terms of scalability and concurrency. That is one of the primary 
> reasons we pushed for the chunk-number approach.

Please don't take AWS APIs as good examples for REST. They simply are not :-) 
This is separate from the fact that their backend is great; just the API part 
is not optimal.

>> Do you mean the file would be deferred or the other parameters? I'd say only 
>> the file, because you probably only sent the other ones with the first file 
>> snippet (and the client is free to choose when to send them along); secondly, 
>> making all params deferred is going to be very complex.
> Sling cannot create an nt:file without jcr:data because it would throw a 
> node type constraint violation exception. So node creation and the 
> processing of the other parameters have to be done in the last request. The 
> client is required to send the parameters in the last request, but is free 
> to send or omit them in the first and intermediate requests. Sling ignores 
> other parameters sent in the first and intermediate requests.

Right, but I still think it's simpler to have a placeholder nt:file than to 
keep the request; or simply create the nt:file already, even if the binary is 
incomplete (as described for the streaming use case). This makes it really 
simple.

Delaying it for the last request is tremendously complex. And we don't know if 
that's really needed (from a use case perspective), so why not start simple.

>> - have a configurable size limit for those partial files (e.g. 50 GB)
>> - once the limit is hit, clean up expired files; if not enough, clean up the 
>> oldest files
> Imo, a size check upon each chunk upload request is not required. It adds 
> complexity. We already have a scheduled job which can be configured to do 
> the necessary cleanup.
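The eviction policy proposed above (expire first, then evict oldest until under the limit) is simple enough to sketch. This is an illustrative in-memory model with a made-up data shape, not the actual scheduled job:

```python
import time

def cleanup(partials, limit_bytes, ttl_seconds, now=None):
    """Decide which partial uploads to delete.
    `partials` maps an upload path to (size_in_bytes, last_modified_time).
    First evict entries older than the TTL; if the remaining total size
    still exceeds the limit, evict the oldest survivors until it fits.
    Returns the set of paths to delete."""
    now = time.time() if now is None else now
    doomed = {p for p, (_, mtime) in partials.items() if now - mtime > ttl_seconds}
    total = sum(size for p, (size, _) in partials.items() if p not in doomed)
    # Walk survivors oldest-first until the size limit is satisfied.
    for path, (size, mtime) in sorted(partials.items(), key=lambda kv: kv[1][1]):
        if total <= limit_bytes:
            break
        if path in doomed:
            continue
        doomed.add(path)
        total -= size
    return doomed
```

Whether this runs per-request or only from a scheduled job is exactly the trade-off being debated; the policy itself is the same either way.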

I would opt for the solution of writing directly to the repository; then we 
spare ourselves all this complexity... In the end, JCR is the persistence 
layer, and that covers temporary resources as well. Persistence is none of 
Sling's business.

>> Then a protocol question arises: what to respond when a client uploads the 
>> "final" byte range, but the previously uploaded ones are no longer present 
>> on the server?
> On the first chunk upload, Sling sends a "Location" header in the response. 
> Subsequent upload requests use this header as the uploadId.
> Sling sends 404 when no resumable upload is found.

Please no uploadId. The chunk requests all address the same resource, just 
like multiple normal update requests to the Sling POST servlet.

And again, this is solved if we write to the JCR and nt:file directly.
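To illustrate what "address the resource directly" could look like on the server side, here is a rough in-memory sketch of my assumptions (hypothetical status codes and data model, not the actual SlingPostServlet behavior):

```python
def handle_chunk(store, path, offset, data, final):
    """Handle one chunk request addressed to the resource itself (no uploadId).
    `store` maps resource path -> bytearray of bytes received so far.
    Returns an HTTP status code."""
    partial = store.get(path)
    if partial is None:
        if offset != 0:
            return 404  # earlier chunks are gone (or never existed)
        partial = store[path] = bytearray()
    if offset != len(partial):
        return 409  # gap or overlap; client must resume at len(partial)
    partial.extend(data)
    if final:
        store[path] = bytes(partial)  # the "merge": the binary is complete
        return 200
    return 201
```

The 404-on-missing-state case from the quoted question falls out naturally: a "final" chunk arriving with a non-zero offset and no stored partial simply gets 404, without any separate upload identifier.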

Cheers,
Alex
