Answers --

> The problem of chunks is that they are
> - not self-describing (what size?)

When you parse a multipart request, you get the chunk size from the part. In a single-part request, Content-Length gives the chunk size.

> - must all have the same length

Not mandatory. Why do you think so? In the merge you append the chunks serially until you find the last chunk.

> - introduce an arbitrary numbering scheme that you cannot break out of

Please explain this more.

> - if problems arise for one chunk, the actual order might become very
> different, so "last chunk" is not a fixed thing

I don't see this as an issue. The "last chunk" is not required to be a fixed thing, since the client explicitly marks the last chunk request.

> - as noted above, not in line with existing HTTP concepts for this (even if
> they currently only apply to GETs)

Afaics, AWS S3 uses a chunk-number approach, and we believe AWS does well in terms of scalability and concurrency. That is one of the primary reasons we are pushing the chunk-number approach.
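For illustration, the serial merge I have in mind is roughly the following sketch (class and method names are made up, not the actual implementation; the chunks are assumed to be already sorted by chunk number):

    import java.io.*;
    import java.util.List;

    class ChunkMerger {
        // Append the chunk files one after another until the last chunk is
        // reached; note that the chunk size never has to be fixed for this to work.
        static void merge(List<File> chunksInOrder, File target) throws IOException {
            try (OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
                byte[] buf = new byte[8192];
                for (File chunk : chunksInOrder) {
                    try (InputStream in = new BufferedInputStream(new FileInputStream(chunk))) {
                        int n;
                        while ((n = in.read(buf)) != -1) {
                            out.write(buf, 0, n);
                        }
                    }
                }
            }
        }
    }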
> Do you mean the file would be deferred or the other parameters? I'd say only
> the file, because you probably only sent the other ones with the first file
> snippet (and the client is free to choose when to send them along) and
> secondly making all params defer is going to be very complex.

Sling cannot create an nt:file without jcr:data because that throws a node type constraint violation exception. So node creation and the processing of the other parameters has to happen on the last request. The client is required to send the parameters in the last request, but is free to send or omit them in the first and intermediate requests. Sling ignores other parameters sent in the first and intermediate requests.

> - have a configurable size limit for those partial files (e.g. 50 GB)
> - once the limit is hit, clean up expired files; if not enough, clean up the
> oldest files

Imo, a size check on each chunk upload request is not required; it adds complexity. We already have a scheduled job which can be configured to do the necessary cleanup.

> Then a protocol question arises: What to respond when a client uploads the
> "final" byte range, but the previously uploaded ones are no longer present on
> the server?

On the first chunk upload, Sling sends a "Location" header in the response. Subsequent upload requests use this header as the upload id. Sling responds with 404 when no resumable upload is found.

-----Original Message-----
From: Alexander Klimetschek [mailto:[email protected]]
Sent: 26 February 2013 21:35
To: [email protected]
Subject: Re: [POST] Servlet resolution for non existing resource

Beware ;-) XXL mail with multiple proposals, the more interesting ones coming later...

On 25.02.2013, at 14:48, Shashank Gupta <[email protected]> wrote:

> This would make it simpler to switch to a Range-header based upload if it
> might be standardized around HTTP in the future.
>
> [SG] Range-based upload is not a standard and was declined by Julian.

Yes :-) What I mean is: if it's going to be standardized in a new HTTP version or extension, it will very likely be byte-range based as well.

> Introduction of a new type "sling:partialFile" complicates things. How does it
> solve the "modify binary" use case? I will not take this approach unless there
> is consensus for it.

Avoiding nt:file?:

I discussed that with Felix and he pointed out that we'd need to avoid having something that looks like an nt:file (either is one or extends from it), so that no jcr events are thrown and no generic application code tries to read it before the file is finished. Any specific marker we introduce (properties or a node type such as sling:PartialFile < nt:file) would need to be handled in all the existing code, which is not feasible. So if we go this route, sling:PartialFile must not extend from nt:file.

Clean up of temp files:

But since we need to work around the data store issue (especially since this feature is targeted at large files), it's probably better to start with storing the partial chunks in the file system. The difficult part here is the cleanup, mostly because the expiry time for those files needs to be quite long: imagine a user starting to upload a big file, then going home, the upload failing over night or over the weekend, and during the next day he says "resume upload"... this gives a typical expiry time of at least one day (ignoring automatic resumes here).

Felix and I discussed this:
- store partial files, including metadata: jcr path + total length
- once the full file range is covered, create the nt:file in jcr (and clean up the partial files)
- have a configurable size limit for those partial files (e.g. 50 GB)
- have a configurable expiry time (e.g. 2 days)
- once the limit is hit, clean up expired files; if not enough, clean up the oldest files
- run cleanup periodically as well (expiry time or 1/2 expiry time or ...)
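A rough sketch of that cleanup policy, assuming the partial files simply live in one directory on disk (all names here are made up, not a concrete API proposal):

    import java.io.File;
    import java.util.Arrays;
    import java.util.Comparator;

    class PartialFileCleanup {
        // Once the size limit is hit, drop expired partial files first and then
        // the oldest remaining ones until we are back under the limit.
        static void cleanup(File partialDir, long maxTotalBytes, long expiryMillis) {
            File[] files = partialDir.listFiles();
            if (files == null) return;
            long total = 0;
            for (File f : files) total += f.length();
            if (total <= maxTotalBytes) return;                    // limit not hit
            long now = System.currentTimeMillis();
            Arrays.sort(files, Comparator.comparingLong(File::lastModified)); // oldest first
            for (File f : files) {
                boolean expired = now - f.lastModified() > expiryMillis;
                if (!expired && total <= maxTotalBytes) break;     // keep the rest
                total -= f.length();
                f.delete();
            }
        }
    }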
Then a protocol question arises: what to respond when a client uploads the "final" byte range, but the previously uploaded ones are no longer present on the server? Do we need an additional acknowledgement that the file was successfully uploaded (a HEAD request on the file returning 200?)?

> What if the second chunk failed and you want to repeat it, while all the
> others including "lastChunk" were successful? I think chunks where the
> server doesn't know the real byte coordinates for every single request won't
> work. You need to specify exactly what is uploaded - and byte ranges + full
> length are IMHO more generic than saying chunk X out of N, size of chunk = M.
>
> [SG] I think we are not discussing parallel chunk upload here, which
> invalidates your point. The spec and impl are for simple, resumable and
> serial chunk upload.
> Querying the chunk upload provides the chunk number and the bytes uploaded up
> to the failure point. The client will resume from the next chunk number and
> byte offset.

The problem of chunks is that they are
- not self-describing (what size?)
- must all have the same length
- introduce an arbitrary numbering scheme that you cannot break out of
- if problems arise for one chunk, the actual order might become very different, so "last chunk" is not a fixed thing
- as noted above, not in line with existing HTTP concepts for this (even if they currently only apply to GETs)

Hence my -1 on indexed chunks.

> [SG] too complex. We have to live with the current datastore implementation at
> least for the time being.

Data store & Oak:

I had a chat with Thomas Müller (who works on the Jackrabbit & Oak team). He said that
a) the data store in Oak will be improved and will share binaries in smaller ~2 MB pieces (instead of entire files)
b) for the existing JR2 FileDataStore, we should not care about the additional space overhead; the garbage collector will take care of it (just a matter of enough disk space and gc configuration)

This means we could put the structure into the repository right away (e.g. under /tmp) and then combine the chunks into the final file. This would happen via some kind of SequenceInputStream that combines multiple input streams in a fixed sequence into one stream (this actually exists already). Doing so now would mean we basically duplicate the binary in the data store (all chunks in /tmp + the final file), but that shouldn't be an issue.

Later, Oak with its updated data store could optimize here: we replace the input stream with a SequenceBinaryInputStream that gives a single input stream for the input streams of multiple binaries. It would hold the list of binaries and it would be part of the jackrabbit/oak API. The Oak implementation could detect that and, instead of reading the input stream (and thus copying everything, taking time), resolve the binaries and use their internal representation to aggregate them into the new one. This way the Sling solution stays more or less the same (only importing a different API class later), and the underlying persistence layer improves by itself.

When putting things into /tmp, with Jackrabbit 2 we'd need a similar cleanup mechanism as with the file system, but at least it would count as normal repository content. With Oak, this would even be less of an issue since the partial binaries and the final ones would share their data store snippets - so it's only a matter of cleaning up /tmp in JCR for the sake of structure cleanup, not space savings.
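Coming back to the combine step: as a rough sketch (the node layout and names are assumptions, not a concrete proposal - each chunk node is assumed to carry a jcr:data binary and the chunk nodes are assumed to iterate in upload order), it could look like this:

    import java.io.InputStream;
    import java.io.SequenceInputStream;
    import java.util.Vector;
    import javax.jcr.Binary;
    import javax.jcr.Node;
    import javax.jcr.NodeIterator;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    class ChunkCombiner {
        // Combine the chunk binaries stored under the /tmp upload node into the
        // final nt:file by feeding one sequenced stream into jcr:data.
        static void combine(Session session, Node tmpUploadNode, Node fileNode)
                throws RepositoryException {
            Vector<InputStream> streams = new Vector<>();
            for (NodeIterator it = tmpUploadNode.getNodes(); it.hasNext();) {
                Binary chunk = it.nextNode().getProperty("jcr:data").getBinary();
                streams.add(chunk.getStream());
            }
            InputStream combined = new SequenceInputStream(streams.elements());
            fileNode.getNode("jcr:content").setProperty("jcr:data",
                    session.getValueFactory().createBinary(combined));
            session.save();
        }
    }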
Streaming end-to-end:

Finally, there would actually be a nice use case for updating an nt:file in place (no /tmp): streaming formats such as video or audio. Imagine an encoder streaming a real-time video feed into JCR using the partial upload feature - and different clients on the other side streaming that video from JCR, using the existing HTTP GET Range support in Sling (which is used e.g. by most modern video streaming solutions). In this case the file could really be nt:file-like: a reader would fail if the file is unreadable because it is not complete yet, try again on the next modification, and basically succeed with the final one. Such apps would have a marker saying it's not finished, and applications are somewhat forced to know about it. And for events: if they get one on every modification, they would simply handle it, or could easily be changed to check the flag and fail fast - as those apps are also most likely the ones that ask for the resumable upload in the first place.

> This would apply to files in the form request only, so that is already
> handled specifically in the sling post servlet (i.e. updating a binary
> property). If the request contains other sling post servlet params, they
> would be processed normally.
> [SG] Yes they would be *but* it would be deferred till sling receives the last
> chunk.

Do you mean the file would be deferred or the other parameters? I'd say only the file, because you probably only sent the other ones with the first file snippet (and the client is free to choose when to send them along), and secondly making all params defer is going to be very complex.

> The question (also for 2.+3.) is: does Sling have to know when everything was
> uploaded?
> - if it's temporarily stored on the file system, then yes (to move it to jcr)
> - if it's stored in a special content structure in jcr, then yes (to convert
> it to an nt:file)
> - if it's stored in an nt:file in jcr, then no
>
> [SG] you break the modify binary use case if you append in place. The binary
> would be corrupted unless all chunks have arrived.

I guess you refer to the last point (nt:file): of course you'd update the file properly (get the content, update the appropriate byte range) - not just naively append...

> The last two would probably require some investigation into data store
> enhancements to avoid wasting space.
> [SG] afaik, additive "CUD" is very fundamental to the tar persistence and
> datastore, so we have to live with it at least till Oak.

See above.

Cheers,
Alex
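PS: by "update the appropriate byte range" above I mean roughly the following (a plain file is used here as a stand-in for the binary, not the actual JCR call):

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;

    class ByteRangeUpdate {
        // Overwrite exactly the uploaded range in place instead of appending blindly.
        static void writeRange(File target, long offset, byte[] data) throws IOException {
            try (RandomAccessFile raf = new RandomAccessFile(target, "rw")) {
                raf.seek(offset);   // position at the start of the uploaded range
                raf.write(data);    // overwrite that range only
            }
        }
    }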
