Replies inline, marked with the [SG] initials. 

-----Original Message-----
From: Alexander Klimetschek [mailto:[email protected]] 
Sent: 25 February 2013 18:36
To: [email protected]
Subject: Re: [POST] Servlet resolution for non existing resource

On 25.02.2013, at 10:38, Shashank Gupta <[email protected]> wrote:

> Here are the salient points wrt the resumable upload implementation/integration 
> in SlingPostServlet[1].  
> 
> 1.    Resumable upload is supported in the "modify" operation (i.e. the default 
> operation). No new operation is introduced for it.  

Yes.

> 2.    The request parameter ":chunkNumber" distinguishes between a partial and 
> a 'single shot' upload. Better than the "content range" parameter approach, as it 
> avoids ambiguity in overlapping ranges like 100-199, 100-299, etc.

What's wrong with overlapping ranges? And client errors are to be expected, for 
example you will always have the possible case that a client never uploads 
chunk X or range A-B. And either some garbage collection kicks in (if stored in 
the file system, but adds complexity) or it's just left in the repository (i.e. 
with a "sling:partialFile" node type or simply a normal nt:file that is 
constantly updated).

The newest upload wins, i.e. existing ranges would always be overwritten.

This would make it simpler to switch to a Range-header based upload if it might 
be standardized around HTTP in the future.

[SG] Range-based upload is not a standard and was declined by Julian. Introducing 
a new type "sling:partialFile" complicates things, and it is not clear how it 
solves the "modify binary" use case. I will not take this approach unless there 
is consensus for it.  
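For concreteness, the per-chunk request under the scheme discussed above could look like this. This is only a sketch: the ":chunkNumber" and ":lastChunk" parameter names are from this thread, while the helper function and the idea that the client knows the total chunk count up front are assumptions.

```python
def chunk_request_params(chunk_number, total_chunks):
    """Build the Sling POST form parameters for one upload chunk
    (sketch only; parameter names taken from the proposal above)."""
    params = {":chunkNumber": str(chunk_number)}
    if chunk_number == total_chunks:
        # Marks the final chunk, telling the server to stitch the chunks
        # and apply the deferred upload semantics (@TypeHint etc.).
        params[":lastChunk"] = "true"
    return params
```

For a 3-chunk upload, chunks 1 and 2 carry only ":chunkNumber", and chunk 3 additionally carries ":lastChunk=true".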

> 3.    The request parameter ":lastChunk=true" distinguishes between intermediate 
> and last upload chunks.

What if the second chunk failed and you want to repeat it, while all the others 
including "lastChunk" were successful? I think chunks where the server doesn't 
know the real byte coordinates for every single request won't work. You need to 
specify exactly what is uploaded - and byte ranges + full length are IMHO more 
generic than saying chunk X out of N, with chunk size M.

[SG] I think we are not discussing parallel chunk upload here, which invalidates 
your point. The spec and impl are for simple, resumable, serial chunk upload. 
Querying the chunk upload provides the chunk number and bytes uploaded up to the 
failure point. The client will resume from the next chunk number and byte offset. 
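The serial resume step described above can be sketched as follows. The function name and the assumption that a partially received chunk is discarded (so the reported byte count is always a multiple of the chunk size) are mine, not part of the spec:

```python
def resume_point(bytes_received, chunk_size):
    """Given the server-reported byte count and a fixed chunk size,
    return (next_chunk_number, byte_offset) for resuming a serial upload.
    Assumes chunks arrive in order and a partial chunk is discarded,
    so bytes_received is a multiple of chunk_size."""
    complete_chunks = bytes_received // chunk_size
    # Chunk numbers are 1-based; continue right after the last full chunk.
    return complete_chunks + 1, complete_chunks * chunk_size
```

E.g. if the server reports 3 MB received with 1 MB chunks, the client resumes with chunk 4 at byte offset 3 MB.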

> 4.    Chunk storage:
>       *       In-place append: will not work in the modify use case. In create, 
> if the chunk size is x and there are N chunks, O(N^2) space is consumed in the 
> datastore [x + 2x + 3x + ... + Nx = O(N^2 * x)].
>       *       Chunks saved in JCR: if the upload size is x, 2x space is 
> consumed in the datastore. Chunks are stored at a temp location under 
> /var/chunks/<uploadid>/<chunkNumber>

Right, good point. But that's really an issue of the data store, not of the 
sling/jcr API. Ideally, for this case the datastore would allow updating a 
binary (initially @FileLength filled with zeros plus the first range), while 
adapting the hash along the way (but moving the actual file). It would require 
real-time reference tracking in the data store, though...

[SG] Too complex. We have to live with the current datastore implementation, at 
least for the time being.  
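The space figures in point 4 can be checked with a little arithmetic. This sketch compares the two storage options for N chunks of size x; since the datastore keeps binaries immutable, each in-place append leaves behind a full copy of the file so far:

```python
def inplace_append_storage(n_chunks, chunk_size):
    # Each append produces a new immutable binary of the full length so far,
    # so the datastore accumulates x + 2x + ... + Nx = N(N+1)/2 * x.
    return sum(i * chunk_size for i in range(1, n_chunks + 1))

def separate_chunks_storage(n_chunks, chunk_size):
    # Chunks kept under /var/chunks plus the final stitched binary: ~2x total.
    total = n_chunks * chunk_size
    return 2 * total
```

For 10 chunks of 1 MB, in-place append leaves ~55 MB in the datastore versus 20 MB for the separate-chunk approach.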

> 5.    Chunk upload response:
>       *       First/intermediate chunks: 200 OK. The response body will *not* 
> contain the list of changes. The Location header contains the temp location 
> "/var/chunks/<uploadid>". The client can use it to retrieve upload information 
> and hole information.
>       *       Last chunk: 201/200 in case of creation/modification. The response 
> body contains the list of changes in json or html format. 

Response body would be the normal sling post servlet response (html or json 
IIRC) and would always be 200 if successful.
[SG] ok.

> 6.    Chunk upload processing
>       *       First/intermediate chunks: only the chunk itself is saved in JCR. 
> All upload semantics (@TypeHint, etc.) and request parameters are ignored. 

This would apply to files in the form request only, so that is already handled 
specifically in the sling post servlet (i.e. updating a binary property). If 
the request contains other sling post servlet params, they would be processed 
normally.
[SG] Yes, they would be, *but* their processing would be deferred until Sling 
receives the last chunk. 

>       *       Last chunk: stitches all chunks, processes all upload semantics 
> and request parameters, and creates the JCR node structure. 

The question (also for 2.+3.) is: does Sling have to know when everything was 
uploaded?
- if it's temporarily stored on the file system, then yes (to move it to jcr)
- if it's stored in a special content structure in jcr, then yes (to convert it 
to a nt:file)
- if it's stored in nt:file in jcr, then no

[SG] You break the modify-binary use case if you append in place: the binary 
would be corrupted until all chunks have arrived.
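A minimal sketch of the last-chunk stitching, using the file system as a stand-in for the JCR temp location (path layout /var/chunks/<uploadid>/<chunkNumber> as in point 4). The function name and the atomic-swap detail are assumptions; the key property is the one [SG] points out: the existing binary is only replaced once every chunk is present.

```python
import os

def stitch_chunks(chunk_dir, target_path, total_chunks):
    """Concatenate chunks 1..total_chunks into the target binary.
    Raises if any chunk is missing, so a partially uploaded file
    never overwrites the existing (modify use case) binary."""
    chunk_paths = [os.path.join(chunk_dir, str(i))
                   for i in range(1, total_chunks + 1)]
    missing = [p for p in chunk_paths if not os.path.exists(p)]
    if missing:
        raise FileNotFoundError("upload incomplete, missing chunks: %s" % missing)
    tmp = target_path + ".part"
    with open(tmp, "wb") as out:
        for p in chunk_paths:
            with open(p, "rb") as f:
                out.write(f.read())
    os.replace(tmp, target_path)  # old binary stays intact until this swap
```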

The last two would probably require some investigation in data store 
enhancements to avoid wasting space.
[SG] AFAIK, additive "CUD" is fundamental to TarPersistence and the datastore, 
so we have to live with it at least until Oak. 

Cheers,
Alex
