[
https://issues.apache.org/jira/browse/SLING-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15448766#comment-15448766
]
Satya Deep Maheshwari commented on SLING-5948:
----------------------------------------------
[~ianeboston], while trying this out, I am seeing that the form fields that I
am sending are getting detected as file which is to be streamed. For e.g. ,
here's the curl command that I am using:
{code}
curl -o /dev/null -v -F key1=value1 -F [email protected] -w
%{time_connect}:%{time_starttransfer}:%{time_total}
http://admin:admin@localhost:8080/temp/file4?uploadmode=stream
{code}
While trying to debug this, I see that on the server, at (1), the part
containing {{key1=value1}} gets detected as a file. This seems to be because of
the way {{RequestPartsIterator.getSubmittedFileName}} is implemented at (2),
Should it be implemented such that it specifically searches for the
{{filename}} header for returning the submitted filename?The multipart rfc at
(3) indicates that {{filename}} is a legitimate header for the part containing
the file in a multipart request. Roughly, can we implement it as below?
{code}
//Taken from https://java.net/jira/browse/SERVLET_SPEC-57
public String getSubmittedFileName() {
String header = this.getHeader("content-disposition");
if(header == null)
return null;
for(String headerPart : header.split(";"))
{
if(headerPart.trim().startsWith("filename"))
{
return headerPart.substring(headerPart.indexOf('=') +
1).trim()
.replace("\"", "");
}
}
return null;
}
{code}
cc [[email protected]]
(1) -
https://github.com/apache/sling/blob/trunk/bundles/servlets/post/src/main/java/org/apache/sling/servlets/post/impl/operations/StreamedUploadOperation.java#L95
(2) -
https://github.com/apache/sling/blob/trunk/bundles/engine/src/main/java/org/apache/sling/engine/impl/parameters/RequestPartsIterator.java#L148
(3) - https://www.ietf.org/rfc/rfc2388.txt
> Support Streaming uploads.
> --------------------------
>
> Key: SLING-5948
> URL: https://issues.apache.org/jira/browse/SLING-5948
> Project: Sling
> Issue Type: Bug
> Components: Engine, Servlets
> Affects Versions: Servlets Post 2.3.12, Engine 2.5.0
> Reporter: Ian Boston
> Assignee: Ian Boston
> Labels: Sling-9-ReleaseNotes
> Fix For: Servlets Post 2.3.14, Engine 2.6.0
>
> Attachments: SLING-5948-Proposal1-illustration.patch,
> SLING-5948-Proposal2v2.patch, SLING-5948-Proposal2v3.patch,
> TarMKDSNotStreamed.png, TarMKDSStreamed.png, TarMKStreamed.png
>
>
> Currently multipart POST request made to sling use the commons file upload
> component that parses the request fully before processing. If uploads are
> small they are stored in byte[], over a configurable limit they are sent to
> disk. This creates additional IO overhead, increases heap usage and increases
> upload time.
> Having searched the SLing jira, and sling-dev I have failed to find an issue
> relating to this area, although it has been discussed in the past.
> I have 2 proposals.
> The SlingMain Servlet processes all requests, identifying the request type
> and parsing the request body. If the body is multipart the Commons File
> Upload library is used to process the request body in full when the
> SlingServletRequest is created or the first parameter is requested. To enable
> streaming of a request this behaviour needs to be modified. Unfortunately,
> processing a streamed request requires that the ultimate processor requests
> multipar parts in a the correct order to avoid non streaming, so a streaming
> behaviour will not be suitable for most POST requests and can only be used if
> the ultimate Servlet has been written to process a stream rather than a map
> of parameters.
> Both proposals need to identify requests that should be processed as a
> stream. This identification must happen in the headers or URI as any
> identification later than the headers may be too late. Something like a
> custom header (x-uploadmode: stream) or a query string (?uploadmode=stream)
> or possibly a selector (/path/to/target.stream) would work and each have
> advantages and disadvantages.
> h1. Proposal 1
> When a POST request is identified as multipart and streaming, create a
> LazyParameterMap that uses the Commons File Upload Streaming API
> (https://commons.apache.org/proper/commons-fileupload/streaming.html) to
> process the request on demand as parameters are requested. If parameters are
> requested out of sequence, do something sensible attempting to maintain
> streaming behaviour, but if the code really breaks streaming, throw an
> exception to alert servlet developer early.
> h2. Pros
> * Follows a similar pattern to currently using the Servlet API.
> h2. Cons
> * [] params will be hard to support when the [] is out of order, and almost
> impossible if the [] is an upload body.
> * May not work when a request is routed incorrectly as getParameter requests
> will be out of streaming sequence.
> h2. Proposal 2
> When a POST request is identified as multipart and streaming, create a
> NullParameterMap that returns null for all parameter get operations. In
> addition set a request Attribute containing a Iterator<Part> that allows
> access to the request stream in a similar way to the Commons File Upload
> Streaming API. Servlets that process uploads streams will use the
> Iterator<Part> object retrieved from the request. Part is the Servlet 3 Part
> https://tomcat.apache.org/tomcat-7.0-doc/servletapi/javax/servlet/http/Part.html.
> IIUC This API is already used in the Sling Engine and exported by a bundle.
> h2. Pros
> * Won't get broken by existing getParameter calls, which all return null and
> do no harm to the stream.
> * Far simpler implementation as the Servlet implementation has to get the
> request data in streaming order.
> h2. Cons
> * Needs a custom Sling Upload Operation that understand how to process the
> Iterator<Part>
> * Can't use the adaptTo mechanism on the request, as
> request.adaptTo(Iterator.class) doesn't make sense being too generic. Would
> need a new API to make this work. request.adaptTo(PartsIterator.class), which
> PartsIterator extends Iterator.
> * Supporting the full breadth of the Sling Operation protocol in the Sling
> Upload Operation will require wide scale duplication of code from the
> ModifyOperation implementation as the ModifyOperation expects RequestProperty
> maps and wont work with a streamed part.
> * Forces the Sling Post bundle to depend on Servlet 3 to get the Part API,
> requiring some patches to the existing test classes.
> To support both methods a standard Servlet to handle streamed uploads would
> be needed, connecting the file request stream to the Resource output stream.
> In some cases (Oak S3 DS Async Uploads, Mongo DS) this wont entirely
> eliminate local disk IO, although in most cases the Resource output stream
> wrapps the final output stream. To maintain streaming a save operation may
> need to be performed for each upload to cause the request stream to be read.
> If this is a duplicate issue, please link.
> If you have input, please share.
> Have some patches in progress, would prefer Proposal 2, as Proposal 1 looks
> messy at the moment.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)