Inline with [SG] initials.

-----Original Message-----
From: Alexander Klimetschek [mailto:[email protected]]
Sent: 25 February 2013 18:36
To: [email protected]
Subject: Re: [POST] Servlet resolution for non existing resource
On 25.02.2013, at 10:38, Shashank Gupta <[email protected]> wrote:

> Here are the salient points wrt the resumable upload implementation/integration
> in SlingPostServlet [1].
>
> 1. Resumable upload is supported in the "modify" operation (i.e. the default
> operation). No new operation is introduced for it.

Yes.

> 2. The request parameter ":chunkNumber" distinguishes between partial and
> "single shot" uploads. This is better than a "Content-Range" parameter
> approach, as it avoids ambiguity in overlapping ranges like 100-199,
> 100-299, etc.

What's wrong with overlapping ranges? Client errors are to be expected; for example, you will always have the possible case that a client never uploads chunk X or range A-B. Then either some garbage collection kicks in (if stored in the file system, which adds complexity) or the data is just left in the repository (i.e. with a "sling:partialFile" node type, or simply a normal nt:file that is constantly updated). The newest upload wins, i.e. existing ranges would always be overwritten. This would make it simpler to switch to a Range-header-based upload if that gets standardized around HTTP in the future.

[SG] Range-based upload is not a standard and was declined by Julian. Introducing a new type "sling:partialFile" complicates things, and it is unclear how it would solve the "modify binary" use case. I will not take this approach unless there is consensus for it.

> 3. The request parameter ":lastChunk=true" distinguishes between intermediate
> chunks and the last upload chunk.

What if the second chunk failed and you want to repeat it, while all the others including "lastChunk" were successful? I think chunks where the server doesn't know the real byte coordinates for every single request won't work. You need to specify exactly what is uploaded - and byte ranges + full length are IMHO more generic than saying "chunk X out of N, size of chunk = M".

[SG] I think we are not discussing parallel chunk upload here, which invalidates your point.
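The serial resume flow argued for above can be sketched as client-side logic. Only the parameter names ":chunkNumber" and ":lastChunk" come from the proposal; the helper, its signature, and the state shape are illustrative assumptions:

```python
def resume_upload(data, state, chunk_size=1024 * 1024):
    """Sketch: resume a serial chunk upload after a failure.

    `state` models what the server would report when queried: the
    highest chunk number received and the bytes stored so far.
    Parameter names ":chunkNumber"/":lastChunk" are from the proposal;
    everything else here is hypothetical.
    """
    next_chunk = state["chunkNumber"] + 1
    offset = state["bytesUploaded"]
    total_chunks = -(-len(data) // chunk_size)  # ceiling division
    requests = []
    for chunk_no in range(next_chunk, total_chunks + 1):
        body = data[offset:offset + chunk_size]
        offset += len(body)
        params = {":chunkNumber": chunk_no}
        if chunk_no == total_chunks:
            params[":lastChunk"] = "true"  # marks the final chunk
        requests.append((params, body))  # each pair would be one POST
    return requests
```

Because the chunks are strictly serial, the client never needs byte coordinates per request; the server's (chunkNumber, bytesUploaded) pair is enough to pick up where it left off.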
The spec and impl are for simple, resumable, serial chunk upload. Querying the chunk upload returns the chunk number and bytes uploaded up to the failure point; the client resumes from the next chunk number and byte offset.

> 4. Chunk storage:
> * In-place append: will not work in the modify use case. In create,
> if the chunk size is x and there are n chunks, x + 2x + 3x + ... + nx
> = O(n^2 * x) space is consumed in the datastore.
> * Chunks saved in JCR: roughly twice the total upload size is
> consumed in the datastore. Chunks are stored at a temp location,
> /var/chunks/<uploadid>/<chunkNumber>.

Right, good point. But that's really an issue of the data store, not of the Sling/JCR API. Ideally for this case the datastore would allow updating a binary (initially filled with zeros up to @FileLength, then the first range), while adapting the hash along the way (but moving the actual file). It would require real-time reference tracking in the data store, though...

[SG] Too complex. We have to live with the current datastore implementation, at least for the time being.

> 5. Chunk upload response:
> * First/intermediate chunks: 200 OK. The response body will *not*
> contain the list of changes. The Location header contains the temp
> location "/var/chunks/<uploadid>". The client can use it to retrieve
> upload information and hole information.
> * Last chunk: 201/200 in case of creation/modification. The response
> body contains the list of changes in JSON or HTML format.

The response body would be the normal Sling post servlet response (HTML or JSON, IIRC) and would always be 200 if successful.

[SG] OK.

> 6. Chunk upload processing:
> * First/intermediate chunks: only the chunk is saved in JCR. All
> upload semantics (@TypeHint, etc.) and request parameters are ignored.

This would apply to files in the form request only, so that is already handled specifically in the Sling post servlet (i.e. updating a binary property). If the request contains other Sling post servlet params, they would be processed normally.
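The storage arithmetic in point 4 can be made concrete. Assuming an immutable datastore where each in-place append materializes a new full-length binary (the premise of the thread), the quadratic factor is in the number of chunks n, not the chunk size x:

```python
def inplace_append_cost(n, x):
    """Datastore bytes consumed when each append of an x-sized chunk
    creates a new immutable binary of the full length so far:
    x + 2x + ... + nx = n*(n+1)/2 * x, i.e. O(n^2 * x)."""
    return sum(i * x for i in range(1, n + 1))

def temp_chunk_cost(n, x):
    """Chunks stored once at a temp location (/var/chunks/...), then
    stitched into the final binary: n*x for the chunks plus n*x for
    the final file, i.e. roughly twice the upload size."""
    return n * x + n * x
```

For example, 4 chunks of 10 bytes cost 100 bytes under in-place append but only 80 under the temp-chunk scheme, and the gap widens quadratically with the chunk count.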
[SG] Yes, they would be, *but* processing would be deferred until Sling receives the last chunk.

> * Last chunk: stitches all chunks, processes all upload semantics
> and request parameters, and creates the JCR node structure.

The question (also for 2. and 3.) is: does Sling have to know when everything was uploaded?

- if it's temporarily stored on the file system, then yes (to move it to JCR)
- if it's stored in a special content structure in JCR, then yes (to convert it to an nt:file)
- if it's stored in an nt:file in JCR, then no

[SG] You break the modify-binary use case if you append in place: the binary would be corrupted until all chunks have arrived.

The last two would probably require some investigation into data store enhancements to avoid wasting space.

[SG] AFAIK, additive "CUD" is fundamental to tar persistence and the datastore, so we have to live with it at least till Oak.

Cheers,
Alex
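The last-chunk stitching step described above can be sketched as follows. The storage layout (/var/chunks/<uploadid>/<chunkNumber>) is from the proposal; the function name and the in-memory representation are illustrative:

```python
def stitch_chunks(chunks):
    """Sketch: stitch stored chunks into the final binary once the
    last chunk (:lastChunk=true) has arrived.

    `chunks` models what would live under
    /var/chunks/<uploadid>/<chunkNumber>: a mapping from chunk number
    (1-based) to that chunk's bytes. Refuses to stitch if there are
    holes, since a partially assembled binary would be corrupt.
    """
    expected = range(1, max(chunks) + 1)
    missing = [n for n in expected if n not in chunks]
    if missing:
        raise ValueError(f"holes in upload, missing chunks: {missing}")
    return b"".join(chunks[n] for n in expected)
```

Deferring both the stitch and the SlingPostServlet semantics (@TypeHint etc.) to this point is what keeps the target binary intact in the modify case: the existing nt:file is only replaced once a complete, hole-free sequence exists.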
