[
https://issues.apache.org/jira/browse/SLING-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ian Boston resolved SLING-5948.
-------------------------------
Resolution: Fixed
Fix Version/s: Engine 2.5.2
Servlets Post 2.3.14
Fixed in sequence ending https://svn.apache.org/repos/asf/sling/trunk@1756172
> Support Streaming uploads.
> --------------------------
>
> Key: SLING-5948
> URL: https://issues.apache.org/jira/browse/SLING-5948
> Project: Sling
> Issue Type: Bug
> Components: Engine, Servlets
> Affects Versions: Servlets Post 2.3.12, Engine 2.5.0
> Reporter: Ian Boston
> Assignee: Ian Boston
> Fix For: Servlets Post 2.3.14, Engine 2.5.2
>
> Attachments: SLING-5948-Proposal1-illustration.patch,
> SLING-5948-Proposal2v2.patch, SLING-5948-Proposal2v3.patch,
> TarMKDSNotStreamed.png, TarMKDSStreamed.png, TarMKStreamed.png
>
>
> Currently multipart POST request made to sling use the commons file upload
> component that parses the request fully before processing. If uploads are
> small they are stored in byte[], over a configurable limit they are sent to
> disk. This creates additional IO overhead, increases heap usage and increases
> upload time.
> Having searched the SLing jira, and sling-dev I have failed to find an issue
> relating to this area, although it has been discussed in the past.
> I have 2 proposals.
> The SlingMain Servlet processes all requests, identifying the request type
> and parsing the request body. If the body is multipart the Commons File
> Upload library is used to process the request body in full when the
> SlingServletRequest is created or the first parameter is requested. To enable
> streaming of a request this behaviour needs to be modified. Unfortunately,
> processing a streamed request requires that the ultimate processor requests
> multipar parts in a the correct order to avoid non streaming, so a streaming
> behaviour will not be suitable for most POST requests and can only be used if
> the ultimate Servlet has been written to process a stream rather than a map
> of parameters.
> Both proposals need to identify requests that should be processed as a
> stream. This identification must happen in the headers or URI as any
> identification later than the headers may be too late. Something like a
> custom header (x-uploadmode: stream) or a query string (?uploadmode=stream)
> or possibly a selector (/path/to/target.stream) would work and each have
> advantages and disadvantages.
> h1. Proposal 1
> When a POST request is identified as multipart and streaming, create a
> LazyParameterMap that uses the Commons File Upload Streaming API
> (https://commons.apache.org/proper/commons-fileupload/streaming.html) to
> process the request on demand as parameters are requested. If parameters are
> requested out of sequence, do something sensible attempting to maintain
> streaming behaviour, but if the code really breaks streaming, throw an
> exception to alert servlet developer early.
> h2. Pros
> * Follows a similar pattern to currently using the Servlet API.
> h2. Cons
> * [] params will be hard to support when the [] is out of order, and almost
> impossible if the [] is an upload body.
> * May not work when a request is routed incorrectly as getParameter requests
> will be out of streaming sequence.
> h2. Proposal 2
> When a POST request is identified as multipart and streaming, create a
> NullParameterMap that returns null for all parameter get operations. In
> addition set a request Attribute containing a Iterator<Part> that allows
> access to the request stream in a similar way to the Commons File Upload
> Streaming API. Servlets that process uploads streams will use the
> Iterator<Part> object retrieved from the request. Part is the Servlet 3 Part
> https://tomcat.apache.org/tomcat-7.0-doc/servletapi/javax/servlet/http/Part.html.
> IIUC This API is already used in the Sling Engine and exported by a bundle.
> h2. Pros
> * Won't get broken by existing getParameter calls, which all return null and
> do no harm to the stream.
> * Far simpler implementation as the Servlet implementation has to get the
> request data in streaming order.
> h2. Cons
> * Needs a custom Sling Upload Operation that understand how to process the
> Iterator<Part>
> * Can't use the adaptTo mechanism on the request, as
> request.adaptTo(Iterator.class) doesn't make sense being too generic. Would
> need a new API to make this work. request.adaptTo(PartsIterator.class), which
> PartsIterator extends Iterator.
> * Supporting the full breadth of the Sling Operation protocol in the Sling
> Upload Operation will require wide scale duplication of code from the
> ModifyOperation implementation as the ModifyOperation expects RequestProperty
> maps and wont work with a streamed part.
> * Forces the Sling Post bundle to depend on Servlet 3 to get the Part API,
> requiring some patches to the existing test classes.
> To support both methods a standard Servlet to handle streamed uploads would
> be needed, connecting the file request stream to the Resource output stream.
> In some cases (Oak S3 DS Async Uploads, Mongo DS) this wont entirely
> eliminate local disk IO, although in most cases the Resource output stream
> wrapps the final output stream. To maintain streaming a save operation may
> need to be performed for each upload to cause the request stream to be read.
> If this is a duplicate issue, please link.
> If you have input, please share.
> Have some patches in progress, would prefer Proposal 2, as Proposal 1 looks
> messy at the moment.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)