[
https://issues.apache.org/jira/browse/NIFI-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621825#comment-16621825
]
ASF GitHub Bot commented on NIFI-3469:
--------------------------------------
Github user ekovacs commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2991#discussion_r219117891
--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpRequest.java ---
@@ -521,161 +553,221 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
         final long start = System.nanoTime();
         final HttpServletRequest request = container.getRequest();
-        FlowFile flowFile = session.create();
-        try (OutputStream flowFileOut = session.write(flowFile)) {
-            StreamUtils.copy(request.getInputStream(), flowFileOut);
-        } catch (final IOException e) {
-            // There may be many reasons which can produce an IOException on the HTTP stream and in some of them, eg.
-            // bad requests, the connection to the client is not closed. In order to address also these cases, we try
-            // and answer with a BAD_REQUEST, which lets the client know that the request has not been correctly
-            // processed and makes it aware that the connection can be closed.
-            getLogger().error("Failed to receive content from HTTP Request from {} due to {}",
-                    new Object[]{request.getRemoteAddr(), e});
-            session.remove(flowFile);
-            try {
-                HttpServletResponse response = container.getResponse();
-                response.sendError(Status.BAD_REQUEST.getStatusCode());
-                response.flushBuffer();
-                container.getContext().complete();
-            } catch (final IOException ioe) {
-                getLogger().warn("Failed to send HTTP response to {} due to {}",
-                        new Object[]{request.getRemoteAddr(), ioe});
+        if (!Strings.isNullOrEmpty(request.getContentType()) && request.getContentType().contains(MIME_TYPE__MULTIPART_FORM_DATA)) {
+            final long maxRequestSize = context.getProperty(MAX_REQUEST_SIZE).asLong();
+            request.setAttribute(Request.__MULTIPART_CONFIG_ELEMENT, new MultipartConfigElement("/tmp", maxRequestSize, maxRequestSize, 0));
--- End diff ---
Hi @markap14
Thank you for reviewing my changes.
Please see my responses inline.
Best regards,
Endre
> As-is, this is going to buffer the entire request into memory and then
> throw an Exception if the request reaches some configured threshold.
> Unfortunately, I think this is going to be too limiting for many use cases.
> We should not require that the entire file upload fit into Java's heap.
The original ticket (https://issues.apache.org/jira/browse/NIFI-3469)
explicitly mentioned avoiding writes to local disk, so I took this approach.
This was achieved by passing 0 as the last parameter to `new
MultipartConfigElement("/tmp", maxRequestSize, maxRequestSize, 0)`.
> I think the best approach (longer term) would be to create our own MIME
> Parser (or find one that already exists?) that is capable of processing the
> data in a streaming fashion. I presume that Jetty doesn't allow that because
> they want to allow you to obtain all Part's and then access them
> individually. However, we're not doing that and so their API is rather
> limiting.
Prior to implementing, I read a lot of advice and tips on the web, e.g. this
Stack Overflow answer:
https://stackoverflow.com/questions/2422468/how-to-upload-files-to-server-using-jsp-servlet/2424824#2424824
As the post warns:
* _Don't manually parse it_
* _You shouldn't try to do this on your own or copypaste some homegrown library-less code found elsewhere on the Internet_
* _You should rather use a real library which is used (and implicitly tested!) by millions of users for years. Such a library has proven its robustness._
I believe the return on investment would not be optimal, as it would open up a
world of bugs.
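For illustration only, a naive splitter for the text-only case supports the warning above: the happy path is deceptively short, while binary payloads, content-transfer encodings, quoted boundary characters, and streaming limits are exactly where such hand-rolled code breaks. All names below are hypothetical, not NiFi or Jetty API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy multipart/form-data splitter for the simple text-only case.
// NOT production code: no binary handling, no encoding support, no size limits.
class NaiveMultipartParser {

    static Map<String, String> parseFormData(String body, String boundary) {
        Map<String, String> fields = new LinkedHashMap<>();
        Pattern namePattern = Pattern.compile("name=\"([^\"]+)\"");
        // Each part is delimited by "--" + boundary per RFC 7578.
        for (String segment : body.split(Pattern.quote("--" + boundary))) {
            if (segment.startsWith("\r\n")) {
                segment = segment.substring(2);
            }
            if (segment.isEmpty() || segment.startsWith("--")) {
                continue; // preamble or the closing "--" marker
            }
            int headerEnd = segment.indexOf("\r\n\r\n");
            if (headerEnd < 0) {
                continue; // malformed part: no blank line after headers
            }
            String headers = segment.substring(0, headerEnd);
            String value = segment.substring(headerEnd + 4);
            if (value.endsWith("\r\n")) {
                value = value.substring(0, value.length() - 2);
            }
            Matcher m = namePattern.matcher(headers);
            if (m.find()) {
                fields.put(m.group(1), value);
            }
        }
        return fields;
    }
}
```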
> For the shorter term, though, I think it makes more sense to expose each of
> these options - a temporary directory to write files to (we cannot use
> "/tmp" because it may not exist on many operating systems; it should perhaps
> default to the value of System.getProperty("java.io.tmpdir")), max request
> size (explain that this is used to prevent Denial of Service attacks that
> would fill disk space), and a threshold at which point we would stream the
> data to the temporary file and then re-read it (what Jetty refers to as the
> "fileSizeThreshold", i.e., the last parameter).
I agree.
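A minimal sketch of the temp-directory fallback the review suggests, with the method name and null/empty handling being my own hypothetical choices, not NiFi's actual implementation:

```java
import java.io.File;

// Sketch: resolve a user-configured multipart temp directory, falling back to
// the JVM's java.io.tmpdir instead of a hard-coded "/tmp", which may not exist
// on every operating system.
class MultipartTempDir {

    static File resolveTempDir(String configured) {
        String path = (configured == null || configured.trim().isEmpty())
                ? System.getProperty("java.io.tmpdir") // portable default
                : configured;
        File dir = new File(path);
        if (!dir.isDirectory()) {
            throw new IllegalArgumentException(
                    "Multipart temp directory does not exist: " + path);
        }
        return dir;
    }
}
```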
> While streaming the data to a temporary file on disk is expensive and less
> than ideal, I think it's much better to give the user the option of doing
> that than to simply reject the request if it is too large.
Indeed it makes more sense. I'll make the necessary changes for this.
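The "fileSizeThreshold" spill behaviour being discussed can be sketched roughly as follows; this is an illustrative stand-in mirroring the idea (buffer in memory until the threshold, then stream the remainder to a temp file), not Jetty's or NiFi's actual code:

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of spill-to-disk buffering: small parts stay on the heap, and once
// the configured threshold is crossed the buffered bytes are flushed to a
// temporary file and all further writes go to disk.
class SpillBuffer implements Closeable {
    private final int threshold;
    private final Path tempDir;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream fileOut;
    private Path file;

    SpillBuffer(int threshold, Path tempDir) {
        this.threshold = threshold;
        this.tempDir = tempDir;
    }

    void write(byte[] data) throws IOException {
        if (fileOut == null && memory.size() + data.length > threshold) {
            // Crossed the threshold: spill buffered bytes and switch streams.
            file = Files.createTempFile(tempDir, "part", ".tmp");
            fileOut = Files.newOutputStream(file);
            memory.writeTo(fileOut);
            memory = null;
        }
        if (fileOut != null) {
            fileOut.write(data);
        } else {
            memory.write(data);
        }
    }

    boolean isOnDisk() {
        return file != null;
    }

    @Override
    public void close() throws IOException {
        if (fileOut != null) {
            fileOut.close();
        }
    }
}
```

With a threshold of 0 every non-empty part spills immediately, while a very large threshold keeps everything in memory, which matches the two extremes discussed above.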
> Add multipart request support to HandleHttpRequest Processor
> ------------------------------------------------------------
>
> Key: NIFI-3469
> URL: https://issues.apache.org/jira/browse/NIFI-3469
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Koji Kawamura
> Assignee: Endre Kovacs
> Priority: Major
>
> Currently, HandleHttpRequest outputs a single FlowFile containing all
> multipart values as following:
> {code}
> --------------------------ef07e8bf36c274d3
> Content-Disposition: form-data; name="p1"
> v1
> --------------------------ef07e8bf36c274d3
> Content-Disposition: form-data; name="p2"
> v2
> --------------------------ef07e8bf36c274d3--
> {code}
> Many users have requested adding file upload support to NiFi.
> In order for HandleHttpRequest to support multipart data we need to add the
> following (this is based on brief research and may turn out to be more
> complex or simpler):
> We need to use HttpServletRequest#getParts() as written in this stackoverflow
> thread:
> http://stackoverflow.com/questions/3337056/convenient-way-to-parse-incoming-multipart-form-data-parameters-in-a-servlet
> Also, we probably need a custom MultiPartInputStreamParser implementation,
> because Jetty's default implementation writes input data to a temporary
> directory on the file system; instead, we'd like NiFi to write the data into
> output FlowFiles' content in a streaming fashion.
> We also need request size validation checks; the thresholds for those
> validations should be passed via javax.servlet.MultipartConfigElement.
> Finally, we have to do something with HandleHttpResponse processor.
> Once the HandleHttpRequest processor starts splitting an incoming request
> into multiple output FlowFiles, we need to wait for every fragment to be
> processed, then execute HandleHttpResponse.
> I think Wait/Notify processors (available from next version) will be helpful
> here.
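The wait-for-all-fragments idea from the ticket can be illustrated with a toy in-process gate; NiFi's actual Wait/Notify processors coordinate through a distributed map cache, so everything below (class and method names included) is a hypothetical single-JVM sketch:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Toy sketch: release a response only after every fragment of a split
// request has been processed, keyed by a request identifier.
class FragmentGate {
    private final ConcurrentMap<String, CountDownLatch> latches =
            new ConcurrentHashMap<>();

    // Register how many fragments a request was split into.
    void expect(String requestId, int fragmentCount) {
        latches.put(requestId, new CountDownLatch(fragmentCount));
    }

    // Called once per processed fragment.
    void fragmentDone(String requestId) {
        latches.get(requestId).countDown();
    }

    // True if all fragments finished within the timeout.
    boolean awaitAll(String requestId, long timeoutMs) throws InterruptedException {
        return latches.get(requestId).await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```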
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)