[ 
https://issues.apache.org/jira/browse/SLING-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Boston updated SLING-5948:
------------------------------
    Attachment: SLING-5948-Proposal2v3.patch

Added unit test coverage and tested successfully in the Sling launchpad.

Apply the patch, then build and run:

{code}
cd bundles/engine
mvn clean install
cd ../servlets/post
mvn clean install
cd ../../launchpad/builder
mvn clean install
java -jar target/org.apache.sling.launchpad-9-SNAPSHOT.jar &
tail -f sling/logs/error.log
{code}

Then test a normal upload:
{code}
curl -v -F file=@P1060839.jpg http://admin:admin@localhost:8080/content/test
{code}

Then test streamed upload
{code}
curl -v -F P1060839.jpg=@P1060839.jpg \
  "http://admin:admin@localhost:8080/content/test?uploadmode=stream"
{code}

or 
{code}
curl -v -H "X-uploadmode: stream" -F P1060839.jpg=@P1060839.jpg \
  http://admin:admin@localhost:8080/content/test
{code}

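For anyone reading the patch alongside these commands, the two request forms above boil down to a check along the following lines. This is only a sketch in plain Java, not the code from the attached patch: the class UploadModeDetector and its method are hypothetical names, and the headers are passed in as a simple Map rather than read from a real servlet request.

```java
import java.util.Map;

/** Hypothetical helper: decides whether a multipart POST asked for streaming. */
final class UploadModeDetector {

    private UploadModeDetector() {
    }

    /**
     * @param headers     request headers keyed by name (name case is ignored)
     * @param queryString raw query string without the leading '?', or null
     * @return true if either the header or the query string selects stream mode
     */
    static boolean isStreaming(Map<String, String> headers, String queryString) {
        // Header form, as in: curl -H "X-uploadmode: stream" ...
        for (Map.Entry<String, String> e : headers.entrySet()) {
            String value = e.getValue();
            if ("x-uploadmode".equalsIgnoreCase(e.getKey())
                    && value != null
                    && "stream".equalsIgnoreCase(value.trim())) {
                return true;
            }
        }
        // Query form, as in: .../content/test?uploadmode=stream
        if (queryString != null) {
            for (String pair : queryString.split("&")) {
                if (pair.equals("uploadmode=stream")) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Both inputs are available before the request body is read, which is why the decision can be made without touching the multipart stream.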


> Support Streaming uploads.
> --------------------------
>
>                 Key: SLING-5948
>                 URL: https://issues.apache.org/jira/browse/SLING-5948
>             Project: Sling
>          Issue Type: Bug
>          Components: Engine, Servlets
>    Affects Versions: Servlets Post 2.3.12, Engine 2.5.0
>            Reporter: Ian Boston
>            Assignee: Ian Boston
>         Attachments: SLING-5948-Proposal1-illustration.patch, 
> SLING-5948-Proposal2v2.patch, SLING-5948-Proposal2v3.patch
>
>
> Currently, multipart POST requests made to Sling use the Commons File Upload 
> component that parses the request fully before processing. If uploads are 
> small they are stored in byte[], over a configurable limit they are sent to 
> disk. This creates additional IO overhead, increases heap usage and increases 
> upload time.
> Having searched the Sling JIRA and sling-dev, I have failed to find an issue 
> relating to this area, although it has been discussed in the past.
> I have 2 proposals.
> The SlingMain Servlet processes all requests, identifying the request type 
> and parsing the request body. If the body is multipart the Commons File 
> Upload library is used to process the request body in full when the 
> SlingServletRequest is created or the first parameter is requested. To enable 
> streaming of a request this behaviour needs to be modified. Unfortunately, 
> processing a streamed request requires that the ultimate processor requests 
> multipart parts in the correct order to avoid non-streaming, so a streaming 
> behaviour will not be suitable for most POST requests and can only be used if 
> the ultimate Servlet has been written to process a stream rather than a map 
> of parameters.
> Both proposals need to identify requests that should be processed as a 
> stream. This identification must happen in the headers or URI as any 
> identification later than the headers may be too late. Something like a 
> custom header (x-uploadmode: stream) or a query string (?uploadmode=stream) 
> or possibly a selector (/path/to/target.stream) would work; each has 
> advantages and disadvantages.
> h1. Proposal 1
> When a POST request is identified as multipart and streaming, create a 
> LazyParameterMap that uses the Commons File Upload Streaming API 
> (https://commons.apache.org/proper/commons-fileupload/streaming.html) to 
> process the request on demand as parameters are requested. If parameters are 
> requested out of sequence, do something sensible attempting to maintain 
> streaming behaviour, but if the code really breaks streaming, throw an 
> exception to alert servlet developer early.
> h2. Pros
> * Follows a similar pattern to currently using the Servlet API.
> h2. Cons
> * [] params will be hard to support when the [] is out of order, and almost 
> impossible if the [] is an upload body.
> * May not work when a request is routed incorrectly as getParameter requests 
> will be out of streaming sequence.
> h1. Proposal 2
> When a POST request is identified as multipart and streaming, create a 
> NullParameterMap that returns null for all parameter get operations. In 
> addition, set a request Attribute containing an Iterator<Part> that allows 
> access to the request stream in a similar way to the Commons File Upload 
> Streaming API.  Servlets that process upload streams will use the 
> Iterator<Part> object retrieved from the request. Part is the Servlet 3 Part 
> https://tomcat.apache.org/tomcat-7.0-doc/servletapi/javax/servlet/http/Part.html.
> IIUC, this API is already used in the Sling Engine and exported by a bundle.
> h2. Pros
> * Won't get broken by existing getParameter calls, which all return null and 
> do no harm to the stream.
> * Far simpler implementation as the Servlet implementation has to get the 
> request data in streaming order.
> h2. Cons
> * Needs a custom Sling Upload Operation that understands how to process the 
> Iterator<Part>.
> * Can't use the adaptTo mechanism on the request, as 
> request.adaptTo(Iterator.class) doesn't make sense, being too generic. Would 
> need a new API to make this work, such as 
> request.adaptTo(PartsIterator.class), where PartsIterator extends Iterator.
> * Supporting the full breadth of the Sling Operation protocol in the Sling 
> Upload Operation will require wide scale duplication of code from the 
> ModifyOperation implementation as the ModifyOperation expects RequestProperty 
> maps and won't work with a streamed part.
> * Forces the Sling Post bundle to depend on Servlet 3 to get the Part API, 
> requiring some patches to the existing test classes.
> To support both methods a standard Servlet to handle streamed uploads would 
> be needed, connecting the file request stream to the Resource output stream. 
> In some cases (Oak S3 DS Async Uploads, Mongo DS) this won't entirely 
> eliminate local disk IO, although in most cases the Resource output stream 
> wraps the final output stream. To maintain streaming a save operation may 
> need to be performed for each upload to cause the request stream to be read.
> If this is a duplicate issue, please link.
> If you have input, please share.
> I have some patches in progress and would prefer Proposal 2, as Proposal 1 
> looks messy at the moment.
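
To make the out-of-sequence problem in Proposal 1 of the description concrete, here is a rough sketch of an on-demand parameter lookup. It is an illustration only, not the LazyParameterMap from any patch: LazyParameterSketch and Field are hypothetical types, fields are modelled as plain strings, and the Commons File Upload Streaming API is replaced by a simple Iterator over fields in wire order.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical sketch of on-demand parameter access over a streamed request. */
final class LazyParameterSketch {

    /** One multipart field in wire order; a made-up type for this sketch. */
    static final class Field {
        final String name;
        final String value;      // form value, or a placeholder for an upload body
        final boolean fileBody;  // true when the field carries an upload body

        Field(String name, String value, boolean fileBody) {
            this.name = name;
            this.value = value;
            this.fileBody = fileBody;
        }
    }

    private final Iterator<Field> wire;
    private final Map<String, String> seen = new HashMap<>();

    LazyParameterSketch(Iterator<Field> wire) {
        this.wire = wire;
    }

    /** Reads forward only as far as needed to answer the lookup. */
    String getParameter(String name) {
        if (seen.containsKey(name)) {
            return seen.get(name);
        }
        while (wire.hasNext()) {
            Field f = wire.next();
            if (f.name.equals(name)) {
                if (!f.fileBody) {
                    seen.put(f.name, f.value); // small values can be cached
                }
                return f.value;
            }
            if (f.fileBody) {
                // Skipping an upload body would force buffering it: fail early.
                throw new IllegalStateException(
                        "Parameter '" + name + "' requested before upload '" + f.name + "'");
            }
            seen.put(f.name, f.value); // "do something sensible" for small fields
        }
        return null;
    }
}
```

In this sketch, skipped form fields are cached so that later lookups still succeed, while a lookup that would have to jump over an upload body raises an exception straight away.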

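For Proposal 2 in the description, a servlet consuming the parts iterator might look roughly like the following. This is a sketch under stated assumptions rather than the patch itself: UploadPart is a hypothetical stand-in for javax.servlet.http.Part so that the example compiles without the Servlet API, and StreamingUploadSketch with its target map is invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for javax.servlet.http.Part, reduced to two methods. */
interface UploadPart {
    String getName();

    InputStream getInputStream() throws IOException;
}

final class StreamingUploadSketch {

    private StreamingUploadSketch() {
    }

    /**
     * Copies each part to its target in the order the parts arrive, so no part
     * has to be buffered while waiting for another one.
     *
     * @return bytes written per part name, in arrival order
     */
    static Map<String, Long> copyInArrivalOrder(Iterator<UploadPart> parts,
                                                Map<String, OutputStream> targets) {
        Map<String, Long> written = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        while (parts.hasNext()) {
            UploadPart part = parts.next();
            OutputStream out = targets.get(part.getName());
            if (out == null) {
                throw new IllegalArgumentException("No target for part " + part.getName());
            }
            long total = 0;
            try (InputStream in = part.getInputStream()) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                    total += n;
                }
            } catch (IOException e) {
                throw new UncheckedIOException("Copy failed for " + part.getName(), e);
            }
            written.put(part.getName(), total);
        }
        return written;
    }

    /** In-memory part, only for trying the sketch out. */
    static UploadPart partOf(String name, byte[] data) {
        return new UploadPart() {
            @Override
            public String getName() {
                return name;
            }

            @Override
            public InputStream getInputStream() {
                return new ByteArrayInputStream(data);
            }
        };
    }
}
```

The handler never asks for a part by name; it takes whatever comes next from the iterator, which is what keeps the request streaming.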


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
