[
https://issues.apache.org/jira/browse/SLING-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ian Boston updated SLING-5948:
------------------------------
Description:
Currently, multipart POST requests made to Sling use the Commons File Upload
component, which parses the request fully before processing. Small uploads are
stored in a byte[]; uploads over a configurable limit are written to disk.
This creates additional IO overhead, increases heap usage and increases upload
time.
Having searched the Sling JIRA and sling-dev, I have failed to find an issue
relating to this area, although it has been discussed in the past.
I have 2 proposals.
The SlingMain Servlet processes all requests, identifying the request type and
parsing the request body. If the body is multipart the Commons File Upload
library is used to process the request body in full when the
SlingServletRequest is created or the first parameter is requested. To enable
streaming of a request this behaviour needs to be modified. Unfortunately,
processing a streamed request requires that the ultimate processor requests
the multipart parts in the correct order to avoid breaking streaming, so
streaming behaviour will not be suitable for most POST requests and can only be
used if the ultimate Servlet has been written to process a stream rather than a
map of parameters.
Both proposals need to identify requests that should be processed as a stream.
This identification must happen in the headers or URI as any identification
later than the headers may be too late. Something like a custom header
(x-uploadmode: stream), a query string (?uploadmode=stream) or possibly a
selector (/path/to/target.stream) would work; each has advantages and
disadvantages.
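A minimal sketch of that identification step, using only values that are available before the body is read. The class and method names are hypothetical and not part of Sling; the header, query string and selector forms are the ones suggested above.

```java
/** Hypothetical helper: decides whether a request asked for streaming. */
final class StreamModeDetector {

    private StreamModeDetector() {
    }

    /**
     * @param headerValue value of the x-uploadmode header, or null if absent
     * @param queryString raw query string without the leading '?', or null
     * @param requestPath request path, e.g. /path/to/target.stream, or null
     */
    static boolean isStreaming(String headerValue, String queryString, String requestPath) {
        // 1. Custom header: x-uploadmode: stream
        if (headerValue != null && "stream".equalsIgnoreCase(headerValue.trim())) {
            return true;
        }
        // 2. Query string: ?uploadmode=stream
        if (queryString != null) {
            for (String pair : queryString.split("&")) {
                if (pair.equals("uploadmode=stream")) {
                    return true;
                }
            }
        }
        // 3. Selector: any dot-separated token after the name in the last segment
        if (requestPath != null) {
            String lastSegment = requestPath.substring(requestPath.lastIndexOf('/') + 1);
            String[] tokens = lastSegment.split("\\.");
            for (int i = 1; i < tokens.length; i++) {
                if (tokens[i].equals("stream")) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

None of the three checks touches the request body, so the decision can be made before any multipart parsing starts.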
h1. Proposal 1
When a POST request is identified as multipart and streaming, create a
LazyParameterMap that uses the Commons File Upload Streaming API
(https://commons.apache.org/proper/commons-fileupload/streaming.html) to
process the request on demand as parameters are requested. If parameters are
requested out of sequence, do something sensible to try to maintain
streaming behaviour, but if the code really breaks streaming, throw an
exception to alert the servlet developer early.
h2. Pros
* Follows a pattern similar to the current use of the Servlet API.
h2. Cons
* [] params will be hard to support when the [] is out of order, and almost
impossible if the [] is an upload body.
* May not work when a request is routed incorrectly as getParameter requests
will be out of streaming sequence.
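To make the ordering problem in Proposal 1 concrete, here is a rough sketch of a lazy lookup over a sequence of parts. It is not the proposed LazyParameterMap: FormPart is a hypothetical stand-in that holds a String where a real part would expose a stream, and "something sensible" is assumed to mean caching small form fields that are passed over while failing fast if an upload body would have to be skipped.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Sketch of on-demand parameter lookup over a streamed multipart body. */
final class LazyParameterSketch {

    /** Hypothetical stand-in for one multipart part. */
    static final class FormPart {
        final String name;
        final String value;
        final boolean fileUpload;

        FormPart(String name, String value, boolean fileUpload) {
            this.name = name;
            this.value = value;
            this.fileUpload = fileUpload;
        }
    }

    private final Iterator<FormPart> parts;
    // Small form fields already read from the stream, kept for later lookups.
    private final Map<String, String> seenFields = new HashMap<>();

    LazyParameterSketch(Iterator<FormPart> parts) {
        this.parts = parts;
    }

    /** Reads forward through the stream until the named parameter is found. */
    String get(String name) {
        String cached = seenFields.get(name);
        if (cached != null) {
            return cached;
        }
        while (parts.hasNext()) {
            FormPart part = parts.next();
            if (part.name.equals(name)) {
                if (!part.fileUpload) {
                    seenFields.put(part.name, part.value);
                }
                return part.value;
            }
            if (part.fileUpload) {
                // Skipping an upload body would force it to be buffered.
                throw new IllegalStateException("Requesting '" + name
                        + "' would skip upload '" + part.name + "' and break streaming");
            }
            seenFields.put(part.name, part.value);
        }
        return null;
    }
}
```

Asking for a field that arrives after an upload body throws immediately, which is the early alert to the servlet developer described above.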
h1. Proposal 2
When a POST request is identified as multipart and streaming, create a
NullParameterMap that returns null for all parameter get operations. In
addition, set a request attribute containing an Iterator<Part> that allows access
to the request stream in a similar way to the Commons File Upload Streaming
API. Servlets that process upload streams will use the Iterator<Part> object
retrieved from the request. Part is the Servlet 3 Part
(https://tomcat.apache.org/tomcat-7.0-doc/servletapi/javax/servlet/http/Part.html).
IIUC this API is already used in the Sling Engine and exported by a bundle.
h2. Pros
* Won't get broken by existing getParameter calls, which all return null and do
no harm to the stream.
* Far simpler implementation as the Servlet implementation has to get the
request data in streaming order.
h2. Cons
* Needs custom servlets that understand how to process the Iterator<Part>.
* Probably can't use the adaptTo mechanism on the request, as
request.adaptTo(Iterator.class) is too generic to make sense. Would need a new
API to make this work, e.g. request.adaptTo(PartsIterator.class), where
PartsIterator extends Iterator.
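A rough sketch of the shapes Proposal 2 implies, kept free of the Servlet API so it compiles on its own. UploadPart is a hypothetical stand-in for javax.servlet.http.Part with only two of its methods. PartsIterator and NullParameterMap are named in this issue, but their bodies here are assumptions, not the patch in progress.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.AbstractMap;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

final class StreamingRequestSketch {

    /** Stand-in for javax.servlet.http.Part (assumption: two methods suffice here). */
    interface UploadPart {
        String getName();

        InputStream getInputStream() throws IOException;
    }

    /** Specific type so that an adaptTo-style lookup is not asked for a bare Iterator. */
    interface PartsIterator extends Iterator<UploadPart> {
    }

    /** Reports no parameters, so get(...) returns null and never touches the stream. */
    static final class NullParameterMap extends AbstractMap<String, String[]> {
        @Override
        public Set<Map.Entry<String, String[]>> entrySet() {
            return Collections.emptySet();
        }
    }

    /** Convenience factory for an in-memory part, used only for illustration. */
    static UploadPart part(final String name, final byte[] data) {
        return new UploadPart() {
            @Override
            public String getName() {
                return name;
            }

            @Override
            public InputStream getInputStream() {
                return new ByteArrayInputStream(data);
            }
        };
    }

    /** Consumes the parts strictly in arrival order; returns the total bytes read. */
    static long drain(Iterator<UploadPart> parts) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        while (parts.hasNext()) {
            try (InputStream in = parts.next().getInputStream()) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    total += n;
                }
            }
        }
        return total;
    }
}
```

A streaming-aware servlet would take the iterator from the request attribute and process each part as it arrives, as drain does. Existing getParameter calls would see only the empty map.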
To support both methods a standard Servlet to handle streamed uploads would be
needed, connecting the file request stream to the Resource output stream. In
some cases (Oak S3 DS Async Uploads, Mongo DS) this won't entirely eliminate
local disk IO, although in most cases the Resource output stream wraps the
final output stream. To maintain streaming, a save operation may need to be
performed for each upload to cause the request stream to be read.
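The core of such a servlet is a bounded copy from the request stream to the output stream. A minimal sketch, assuming plain java.io streams on both sides; the class name and buffer size are illustrative, and how the Resource output stream is obtained is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopySketch {

    private StreamCopySketch() {
    }

    /**
     * Copies in to out through a fixed-size buffer, so heap use stays constant
     * regardless of upload size. Returns the number of bytes copied.
     */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

Because the loop only reads when it writes, the request body is pulled from the client at the rate the target stream accepts it. Nothing is spooled to heap or disk by the copy itself.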
If this is a duplicate issue, please link.
If you have input, please share.
I have some patches in progress and would prefer Proposal 2, as Proposal 1
looks messy at the moment.
was:
Currently, multipart POST requests made to Sling use the Commons File Upload
component, which parses the request fully before processing. Small uploads are
stored in a byte[]; uploads over a configurable limit are written to disk.
This creates additional IO overhead, increases heap usage and increases upload
time.
Having searched the Sling JIRA and sling-dev, I have failed to find an issue
relating to this area, although it has been discussed in the past.
I have 2 proposals.
The SlingMain Servlet processes all requests, identifying the request type and
parsing the request body. If the body is multipart the Commons File Upload
library is used to process the request body in full when the
SlingServletRequest is created or the first parameter is requested. To enable
streaming of a request this behaviour needs to be modified. Unfortunately,
processing a streamed request requires that the ultimate processor requests
the multipart parts in the correct order to avoid breaking streaming, so
streaming behaviour will not be suitable for most POST requests and can only be
used if the ultimate Servlet has been written to process a stream rather than a
map of parameters.
Both proposals need to identify requests that should be processed as a stream.
This identification must happen in the headers or URI as any identification
later than the headers may be too late. Something like a custom header
(x-uploadmode: stream), a query string (?uploadmode=stream) or possibly a
selector (/path/to/target.stream) would work; each has advantages and
disadvantages.
h1. Proposal 1
When a POST request is identified as multipart and streaming, create a
LazyParameterMap that uses the Commons File Upload Streaming API
(https://commons.apache.org/proper/commons-fileupload/streaming.html) to
process the request on demand as parameters are requested. If parameters are
requested out of sequence, do something sensible to try to maintain
streaming behaviour, but if the code really breaks streaming, throw an
exception to alert the servlet developer early.
h2. Pros
* Follows a pattern similar to the current use of the Servlet API.
h2. Cons
* [] params will be hard to support when the [] is out of order, and almost
impossible if the [] is an upload body.
* May not work when a request is routed incorrectly as getParameter requests
will be out of streaming sequence.
h1. Proposal 2
When a POST request is identified as multipart and streaming, create a
NullParameterMap that returns null for all parameter get operations. In
addition, set a request attribute containing a RequestStream API object that
allows access to the request stream in a similar way to the Commons File Upload
Streaming API. Servlets that process upload streams will use the
RequestStream API object retrieved from the request.
h2. Pros
* Won't get broken by existing getParameter calls, which all return null and do
no harm to the stream.
* Far simpler implementation as the Servlet implementation has to get the
request data in streaming order.
h2. Cons
* Requires new API Objects.
To support both methods a standard Servlet to handle streamed uploads would be
needed, connecting the file request stream to the Resource output stream. In
some cases (Oak S3 DS Async Uploads, Mongo DS) this won't entirely eliminate
local disk IO, although in most cases the Resource output stream wraps the
final output stream. To maintain streaming, a save operation may need to be
performed for each upload to cause the request stream to be read.
If this is a duplicate issue, please link.
If you have input, please share.
I have some patches in progress and would prefer Proposal 2, as Proposal 1
looks messy at the moment.
> Support Streaming uploads.
> --------------------------
>
> Key: SLING-5948
> URL: https://issues.apache.org/jira/browse/SLING-5948
> Project: Sling
> Issue Type: Bug
> Components: Engine
> Affects Versions: Engine 2.5.0
> Reporter: Ian Boston
> Assignee: Ian Boston
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)