Hi,

BTW: there is, IIRC, a 32-bit problem (a 2 GB file limit) with HTTP via some proxies that can be avoided by using chunked transfer encoding. Since no individual chunk needs to be large, a PUT with chunked encoding will stream.
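For concreteness, here is a minimal client-side sketch (the class and method names are mine, not from any Sling/Jackrabbit code) using the JDK's HttpURLConnection. setChunkedStreamingMode forces Transfer-Encoding: chunked, so no Content-Length is ever computed or sent and the body never has to fit a 32-bit length value:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ChunkedPut {

    // Configure a PUT so the body is sent with chunked transfer encoding.
    // Each chunk carries its own (small) size, so no overall length is needed.
    static HttpURLConnection openChunkedPut(String target, int chunkSize) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(target).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        // Forces "Transfer-Encoding: chunked"; the JDK streams the body in
        // chunkSize-byte chunks instead of buffering it to compute a length.
        conn.setChunkedStreamingMode(chunkSize);
        return conn;
    }

    // Stream an arbitrarily large input straight onto the wire.
    static void upload(InputStream in, String target) throws Exception {
        HttpURLConnection conn = openChunkedPut(target, 8192);
        try (OutputStream out = conn.getOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n); // each write/flush can become one chunk
            }
        }
        // conn.getResponseCode() would complete the exchange here.
    }
}
```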
More inline.

On 29 July 2016 at 18:16, Clint Goudie-Nice <[email protected]> wrote:
> Hello all,
>
> Do binary uploads (assets, etc) get written to a temp location before
> being put into the repository, or are they streamed end-to-end for these 4
> transfer types:
>
> 1) Mime Multipart uploads / form uploads

Multipart uploads > 256000 bytes are written to disk, using Commons FileUpload's ServletFileUpload [1] in [2], which produces FileItems that are then read. I think the InputStream from the FileItem is connected to the OutputStream of a jcr:data property and the data is pumped between the two in blocks. I can't find any evidence of Sling using the FileUpload streaming API for multipart POSTs [3].

> 2) Content-Transfer:Chunked uploads

This is a lower-level transfer encoding handled by Jetty; chunked encoding does not surface in the Servlet API (IIRC). When streaming, it allows request and response bodies to stream without knowing the content length, so Jetty uses it, producing one chunk on every flush. I would expect a modern browser to use chunked encoding for uploads.

> 3) Plain binary uploads with a specified length header

PUT operations are handled by the Jackrabbit WebDAV bundle. I am not familiar with the code, but I do remember sending large volumes of data through it in 2007 and not seeing heap or local file IO. [4] backs that up, I think.

> 4) Plain binary uploads with no specified length header

If the content length is missing and it's not chunked encoding, Jetty will read until the socket is closed. There is no difference from a server point of view in how the request is handled.

> There are pros and cons to each approach. Obviously, if you stream it end
> to end, if the client is uploading a large stream of data, you have to
> maintain a session over a long period, possibly hours.

I assume you mean a JCR session, not an HTTP session.
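The "pumped between the two in blocks" pattern above can be sketched as follows (a minimal illustration; the names pump and blockSize are mine, not Sling's actual code). The point is that heap use is bounded by the buffer size, not the size of the binary:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamPump {

    // Copy from in to out in fixed-size blocks. Memory use is bounded by
    // blockSize, so a multi-gigabyte body never needs to fit on the heap.
    static long pump(InputStream in, OutputStream out, int blockSize) throws IOException {
        byte[] buf = new byte[blockSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }
}
```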
The request will be authenticated before streaming starts, so the session will be validated at the start of the request and closed when the session is logged out, i.e. at the end of the request (IIRC).

> If it is being streamed to a temporary location first, and then to the
> repository, you require an additional write and an additional read of IO,
> but potentially less session time.

The session time is the same regardless, but the extra IO means the operation takes longer, and there is no interleaving between the request and the stream to the underlying DS. If the networks are the same speed, the upload takes 2x the time: for example, staging a large body to disk first means writing it all out and reading it all back before the repository sees any of it, rather than the repository consuming it as it arrives. Since the session is created before the upload starts, and before Commons FileUpload processes the request, the session is open for the entire request.

There is no load on the underlying repository from a file upload other than the metadata, which is minimal. I mean that in the sense that there won't be 1000s of Oak Documents created during the upload, only a pointer to the DataStore and a handful of nodes. Since that's a small commit, it won't generate a branch. Obviously, if you are using a MongoDB DS, it will generate lots of blobs, which will impact replication and other things. An S3 DS will not start sending the data until a second copy of it is made into the S3 async upload cache (assuming that's enabled); otherwise I think it will stream directly to the S3 API. FS DS is, well, FS.

> I would like to better understand the requirements on the system imposed
> by these different upload types.
> Clint

HTH
Best Regards
Ian

1 https://commons.apache.org/proper/commons-fileupload/using.html
2 org.apache.sling.engine.impl.parameters.ParameterSupport#parseMultiPartPost
3 https://commons.apache.org/proper/commons-fileupload/streaming.html
4 https://github.com/apache/jackrabbit/blob/b252e505e34b03638207e959aaafce7c480ebaaa/jackrabbit-webdav/src/main/java/org/apache/jackrabbit/webdav/server/AbstractWebdavServlet.java#L629
