[
https://issues.apache.org/jira/browse/NIFI-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620652#comment-16620652
]
ASF GitHub Bot commented on NIFI-3469:
--------------------------------------
Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2991#discussion_r218816915
--- Diff:
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpRequest.java
---
@@ -521,161 +553,221 @@ public void onTrigger(final ProcessContext context,
final ProcessSession session
final long start = System.nanoTime();
final HttpServletRequest request = container.getRequest();
- FlowFile flowFile = session.create();
- try (OutputStream flowFileOut = session.write(flowFile)) {
- StreamUtils.copy(request.getInputStream(), flowFileOut);
- } catch (final IOException e) {
- // There may be many reasons which can produce an IOException
on the HTTP stream and in some of them, eg.
- // bad requests, the connection to the client is not closed.
In order to address also these cases, we try
- // and answer with a BAD_REQUEST, which lets the client know
that the request has not been correctly
- // processed and makes it aware that the connection can be
closed.
- getLogger().error("Failed to receive content from HTTP Request
from {} due to {}",
- new Object[]{request.getRemoteAddr(), e});
- session.remove(flowFile);
- try {
- HttpServletResponse response = container.getResponse();
- response.sendError(Status.BAD_REQUEST.getStatusCode());
- response.flushBuffer();
- container.getContext().complete();
- } catch (final IOException ioe) {
- getLogger().warn("Failed to send HTTP response to {} due
to {}",
- new Object[]{request.getRemoteAddr(), ioe});
+ if (!Strings.isNullOrEmpty(request.getContentType()) &&
request.getContentType().contains(MIME_TYPE__MULTIPART_FORM_DATA)) {
+ final long maxRequestSize =
context.getProperty(MAX_REQUEST_SIZE).asLong();
+ request.setAttribute(Request.__MULTIPART_CONFIG_ELEMENT, new
MultipartConfigElement("/tmp", maxRequestSize, maxRequestSize, 0));
--- End diff --
As-is, this is going to buffer the entire request into memory and then
throw an Exception if the request reaches some configured threshold.
Unfortunately, I think this is going to be too limiting for many use cases. We
should not request that the entire file upload fit into java's heap.
I think the best approach (longer term) would be to create our own MIME
Parser (or find one that already exists?) that is capable of processing the
data in a streaming fashion. I presume that Jetty doesn't allow that because
they want to allow you to obtain all Part's and then access them individually.
However, we're not doing that and so their API is rather limiting.
For the shorter term, though, I think it makes more sense to expose each of
these options - a temporary directory to write files to (we cannot use "/tmp"
because it may not exist on many operating systems. Should perhaps default to
the value of System.getProperty("java.io.tmpdir")), max request size (explain
that this is used to prevent Denial of Service attacks that would fill disk
space), and a threshold at which point we would stream the data to the
temporary file and then re-read it (what jetty refers to as the
"fileSizeThreshold", i.e., the last parameter).
While streaming the data to a temporary file on disk is expensive and less
than ideal, I think it's much better to give the user the option of doing that
than to simply reject the request if it is too large.
> Add multipart request support to HandleHttpRequest Processor
> ------------------------------------------------------------
>
> Key: NIFI-3469
> URL: https://issues.apache.org/jira/browse/NIFI-3469
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Koji Kawamura
> Assignee: Endre Kovacs
> Priority: Major
>
> Currently, HandleHttpRequest outputs a single FlowFile containing all
> multipart values as following:
> {code}
> --------------------------ef07e8bf36c274d3
> Content-Disposition: form-data; name="p1"
> v1
> --------------------------ef07e8bf36c274d3
> Content-Disposition: form-data; name="p2"
> v2
> --------------------------ef07e8bf36c274d3--
> {code}
> Many users requested adding upload files support to NiFi.
> In order for HandleHttpRequest to support multipart data we need to add
> followings (this is based on a brief researching and can be more complex or
> simple):
> We need to use HttpServletRequest#getParts() as written in this stackoverflow
> thread:
> http://stackoverflow.com/questions/3337056/convenient-way-to-parse-incoming-multipart-form-data-parameters-in-a-servlet
> Also, we probably need a custom MultiPartInputStreamParser implementation.
> Because Jetty's default implementation writes input data to temporary
> directory on file system, instead, we'd like NiFi to write those into output
> FlowFiles content in streaming fashion.
> And we need request size validation checks, threshold for those validation
> should be passed via javax.servlet.MultipartConfigElement.
> Finally, we have to do something with HandleHttpResponse processor.
> Once HandleHttpRequest processor start splitting incoming request into
> multiple output FlowFiles, we need to wait for every fragment to be
> processed, then execute HandleHttpRequest.
> I think Wait/Notify processors (available from next version) will be helpful
> here.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)