Hi,

On 1 August 2016 at 14:51, Clint Goudie-Nice <[email protected]> wrote:
> I agree, it would be valuable to stream it.
>
> AFAIK, the problem is that during a form submission you don’t know the
> order of the form attributes, and the binary data can be mixed into the
> middle. Maybe these could be ordered so the file submission came last,
> making streaming of the file data without a temporary location possible.
>

IIRC, having implemented a custom servlet to do streaming, the order of the
form attributes is the same as the order of the elements in the DOM. What
may be harder is getting generic code to access the request parameters in
the correct order.

Best Regards
Ian

>
> Clint
>
> On 8/1/16, 3:57 AM, "[email protected] on behalf of Ian Boston" <[email protected] on behalf of [email protected]> wrote:
>
> Hi,
> Clint's question, assuming my response was correct, raises a second
> question.
>
> Should Sling support binary upload streaming without using intermediate
> disk buffers?
>
> ie
> request -> Sling -> target persistence
> response <- Sling
>
> Not
> Client -> Sling -> Local Disk Buffer.
> Local Disk Buffer -> target persistence
> response <- Sling
>
> and Not, in the case of the Oak S3 DS
> Client -> Sling -> Disk Buffer.
> Disk Buffer -> Oak S3 Async Disk Buffer
> response <- Sling
> Oak S3 Async Disk Buffer -> S3
>
> I don't know if streaming is possible in Sling via the SlingMainServlet
> given the way in which the request is wrapped, but Commons Upload does have
> a streaming API, so the request input stream, or a multipart part, can be
> streamed directly to a Resource.adaptTo(InputStream.class), provided Sling
> would allow it.
>
> Streaming does require that sufficient information to perform final storage
> precedes the stream in the request (auth headers, resource identification,
> target resource name, etc.).
>
> IIRC, the alternative for users at present is to write a custom servlet and
> mount it as an OSGi servlet.
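
As a rough illustration of the "request -> Sling -> target persistence" path above, here is a minimal sketch in plain Java that pumps an InputStream to an OutputStream in fixed-size blocks, so memory use is bounded by the block size and nothing is spooled to local disk. The names StreamPump and pump and the 8 KiB block size are placeholders of mine, not Sling or Commons FileUpload API. In a real servlet the source would be the request (or multipart part) stream and the sink whatever the target persistence exposes, which is an assumption here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: copies a stream block by block without a disk buffer. */
final class StreamPump {
    // Assumed block size; only this many bytes are held in memory at once.
    static final int BLOCK_SIZE = 8192;

    /** Pumps everything from in to out and returns the number of bytes copied. */
    static long pump(InputStream in, OutputStream out) throws IOException {
        byte[] block = new byte[BLOCK_SIZE];
        long total = 0;
        int n;
        // read() returns -1 at end of stream; otherwise the count of bytes read.
        while ((n = in.read(block)) != -1) {
            out.write(block, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for the request body and the target store.
        byte[] upload = new byte[100_000];
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        long copied = pump(new ByteArrayInputStream(upload), target);
        System.out.println("copied " + copied + " bytes");
    }
}
```

The point of the sketch is only that the copy loop itself needs no knowledge of the total length, which is why the information needed to pick the target has to arrive before the binary part.
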
>
> Best Regards
> Ian
>
>
> On 29 July 2016 at 18:56, Ian Boston <[email protected]> wrote:
>
> > Hi,
> >
> > BTW: There is IIRC a 32 bit problem (2GB files) with http via some proxies
> > that can be avoided by using Chunked transfer encoding as each transfer
> > doesn't need to be large, hence a PUT with Chunked encoding will stream.
> >
> > More inline.
> >
> > On 29 July 2016 at 18:16, Clint Goudie-Nice <[email protected]> wrote:
> >
> >> Hello all,
> >>
> >> Do binary uploads (assets, etc) get written to a temp location before
> >> being put into the repository, or are they streamed end-to-end for these 4
> >> transfer types:
> >>
> >> 1) Mime Multipart uploads / form uploads
> >>
> >
> > Multipart uploads larger than 256000 bytes are written to disk, using
> > commons file upload ServletFileUpload[1] in [2] which produces FileItems
> > which are then read. I think the InputStream from the FileItem is connected
> > to the OutputStream of a jcr:data and the data pumped between the 2 in
> > blocks.
> >
> > I can't find any evidence of Sling using the FileUpload streaming API for
> > multipart posts [3]
> >
> >>
> >> 2) Content-Transfer:Chunked uploads
> >>
> >
> > This is a lower level transfer encoding handled by jetty, chunked encoding
> > does not surface in the Servlet API (IIRC). When streaming it does allow
> > response output and upload output to stream without knowing the content
> > length, so Jetty uses it producing 1 chunk on every flush. I would expect a
> > modern browser to use chunked encoding for uploads.
> >
> >> 3) Plain binary uploads with a specified length header
> >>
> >
> > PUT operations are handled by the Jackrabbit WebDav bundle. I am not
> > familiar with the code but do remember sending large volumes of data
> > through it in 2007 and not seeing heap or local file IO. [4] backs that up.
> > I think
> >
> >>
> >> 4) Plain binary uploads with no specified length header
> >>
> >
> > If the content length is missing and it's not chunked encoding, jetty will
> > read until the socket is closed. There is no difference from a server point
> > of view in how the request is handled.
> >
> >>
> >> There are pros and cons to each approach. Obviously, if you stream it end
> >> to end, if the client is uploading a large stream of data, you have to
> >> maintain a session over a long period, possibly hours.
> >>
> >
> > I assume you mean JCR session not http session.
> > The request will be authenticated before streaming starts, so the session
> > will be validated at the start of the request and closed when the session is
> > logged out, ie at the end of the request. (IIRC).
> >
> >>
> >> If it is being streamed to a temporary location first, and then to the
> >> repository, you require an additional write and an additional read of IO,
> >> but potentially less session time.
> >>
> >
> > The session time is the same regardless, but the time taken to upload will
> > require more IO so the operation will take longer and there is no
> > interleaving between request and stream to the underlying DS. If the
> > networks are the same speed, then upload takes 2x the time. Since the
> > session is created before the upload starts and before commons file upload
> > processes the request, the session is open for the entire request.
> >
> > There is no load on the underlying repository from a file upload, other
> > than the metadata which is minimal. I mean in the sense that there won't be
> > 1000s of Oak Documents being created during the upload, only a pointer to
> > the DataStore and a handful of nodes. Since that's a small commit it won't
> > generate a branch.
> >
> > Obviously if you are using a MongoDB DS it will generate lots of blobs
> > which will impact replication and other things.
> > A S3 DS will not start sending the data until a second copy of the data is
> > made into the S3 Async upload cache (assuming that's enabled), otherwise I
> > think it will stream directly to the S3 API.
> > FS DS is, well, FS.
> >
> >>
> >> I would like to better understand the requirements on the system imposed
> >> by these different upload types.
> >>
> >> Clint
> >>
> >
> > HTH
> > Best Regards
> > Ian
> >
> > 1 https://commons.apache.org/proper/commons-fileupload/using.html
> > 2 org.apache.sling.engine.impl.parameters.ParameterSupport#parseMultiPartPost
> > 3 https://commons.apache.org/proper/commons-fileupload/streaming.html
> > 4 https://github.com/apache/jackrabbit/blob/b252e505e34b03638207e959aaafce7c480ebaaa/jackrabbit-webdav/src/main/java/org/apache/jackrabbit/webdav/server/AbstractWebdavServlet.java#L629
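
For anyone trying to picture the multipart behaviour described above (uploads over 256000 bytes end up on local disk, smaller ones stay in memory), here is a small stdlib-only Java sketch of that kind of threshold buffering. It is not the Commons FileUpload or Sling code. ThresholdBuffer and its methods are hypothetical names of mine, and the "spill once the total would exceed the threshold" rule is an assumption made for the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical buffer: keeps data in memory, moves to a temp file past a threshold. */
final class ThresholdBuffer extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream disk;   // null until the threshold is crossed
    private Path tempFile;       // null until the threshold is crossed
    private long written;

    ThresholdBuffer(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Assumed rule: spill as soon as this write would push the total past the threshold.
        if (disk == null && written + len > threshold) {
            spillToDisk();
        }
        if (disk != null) {
            disk.write(buf, off, len);
        } else {
            memory.write(buf, off, len);
        }
        written += len;
    }

    /** Copies what is held in memory to a temp file; later writes go to the file. */
    private void spillToDisk() throws IOException {
        tempFile = Files.createTempFile("upload-", ".tmp");
        disk = Files.newOutputStream(tempFile);
        memory.writeTo(disk);
        memory = null;
    }

    @Override
    public void close() throws IOException {
        if (disk != null) {
            disk.close();
        }
    }

    boolean isOnDisk() {
        return disk != null;
    }

    long size() {
        return written;
    }

    /** Closes the buffer and removes the temp file, if one was created. */
    void delete() throws IOException {
        close();
        if (tempFile != null) {
            Files.deleteIfExists(tempFile);
        }
    }

    public static void main(String[] args) throws IOException {
        ThresholdBuffer buffer = new ThresholdBuffer(256_000);
        buffer.write(new byte[256_000], 0, 256_000);
        System.out.println("on disk after 256000 bytes: " + buffer.isOnDisk()); // false
        buffer.write(1);
        System.out.println("on disk after 256001 bytes: " + buffer.isOnDisk()); // true
        buffer.delete();
    }
}
```

The extra write and read this implies for large uploads is the additional IO discussed in the thread. A streaming path would skip the temp file entirely.
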
