On 25.02.2013, at 10:38, Shashank Gupta <[email protected]> wrote:

> Here are the salient points wrt resumable upload implementation/integration
> in SlingPostServlet[1].
>
> 1. Resumable upload is supported in the "modify" operation (i.e. the default
> operation). No new operation is introduced for it.
Yes.

> 2. The request parameter ":chunkNumber" distinguishes between a partial and a
> 'single shot' upload. This is better than a "Content-Range" parameter
> approach, as it avoids ambiguity with overlapping ranges like 100-199,
> 100-299, etc.

What's wrong with overlapping ranges? Client errors are to be expected; for
example, you will always have the possible case that a client never uploads
chunk X or range A-B. Then either some garbage collection kicks in (if stored
in the file system, which adds complexity) or the data is just left in the
repository (i.e. with a "sling:partialFile" node type, or simply a normal
nt:file that is constantly updated). The newest upload wins, i.e. existing
ranges would always be overwritten. This would also make it simpler to switch
to a Range-header based upload if that might be standardized around HTTP in
the future.

> 3. The request parameter ":lastChunk=true" distinguishes between an
> intermediate and the last upload chunk.

What if the second chunk failed and you want to repeat it, while all the
others, including "lastChunk", were successful? I think chunks where the
server doesn't know the real byte coordinates for every single request won't
work. You need to specify exactly what is uploaded - and byte ranges plus the
full length are IMHO more generic than saying "chunk X out of N, size of
chunk = M".

> 4. Chunk storage:
>    * In-place append: will not work in the modify use case. In create, if
>      the chunk size is x and there are n chunks, O(n^2 * x) space is
>      consumed in the datastore [x + 2x + 3x + ... = O(n^2 * x)].
>    * Chunks saved in JCR: 2x space is consumed in the datastore for an
>      upload of total size x. Chunks are stored at a temp location,
>      /var/chunks/<uploadid>/<chunkNumber>.

Right, good point. But that's really an issue of the data store, not of the
Sling/JCR API. Ideally, for this case the datastore would allow updating a
binary (initially filled with zeros up to @FileLength, plus the first range),
while adapting the hash along the way (but moving the actual file). It would
require real-time reference tracking in the data store, though...
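To make the space blow-up in 4. concrete, here is a small sketch (the
function names and example sizes are mine, not anything from the servlet):
with an immutable datastore, each "in-place append" effectively rewrites the
whole binary accumulated so far, while the temp-location approach writes each
chunk once and then writes the stitched file once.

```python
def inplace_append_cost(n_chunks: int, chunk_size: int) -> int:
    """Bytes written if every chunk upload rewrites the binary so far:
    x + 2x + 3x + ... + n*x = n*(n+1)/2 * x, i.e. O(n^2 * x)."""
    return sum(i * chunk_size for i in range(1, n_chunks + 1))

def temp_then_stitch_cost(n_chunks: int, chunk_size: int) -> int:
    """Bytes written if chunks are kept at a temp location and stitched
    once at the end: one copy of each chunk plus the final file, ~2x total."""
    return 2 * n_chunks * chunk_size

if __name__ == "__main__":
    n, x = 10, 1024 * 1024  # e.g. 10 chunks of 1 MiB
    print(inplace_append_cost(n, x) // x)   # 55 -> quadratic growth in n
    print(temp_then_stitch_cost(n, x) // x) # 20 -> linear in n
```

The gap widens quickly: at 100 chunks the in-place variant writes 5050 chunk
sizes' worth of data versus 200 for the temp-then-stitch variant.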
> 5. Chunk upload response:
>    * First/intermediate chunks: 200 OK. The response body will *not*
>      contain the list of changes. The Location header contains the temp
>      location "/var/chunks/<uploadid>". The client can use it to retrieve
>      upload information and hole information.
>    * Last chunk: 201/200 in the case of creation/modification. The response
>      body contains the list of changes in JSON or HTML format.

The response body would be the normal Sling post servlet response (HTML or
JSON, IIRC) and would always be 200 if successful.

> 6. Chunk upload processing:
>    * First/intermediate chunks: only the chunk is saved in JCR. All upload
>      semantics (@TypeHint, etc.) and request parameters are ignored.

This would apply to files in the form request only, so that is already
handled specifically in the Sling post servlet (i.e. updating a binary
property). If the request contains other Sling post servlet params, they
would be processed normally.

>    * Last chunk: stitches all chunks together, processes all upload
>      semantics and request parameters, and creates the JCR node structure.

The question (also for 2. and 3.) is: does Sling have to know when everything
was uploaded?

- if it's temporarily stored on the file system, then yes (to move it to JCR)
- if it's stored in a special content structure in JCR, then yes (to convert
  it to an nt:file)
- if it's stored in an nt:file in JCR, then no

The last two would probably require some investigation into data store
enhancements to avoid wasting space.

Cheers,
Alex
