Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2991#discussion_r218816915
--- Diff:
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpRequest.java
---
@@ -521,161 +553,221 @@ public void onTrigger(final ProcessContext context,
final ProcessSession session
final long start = System.nanoTime();
final HttpServletRequest request = container.getRequest();
- FlowFile flowFile = session.create();
- try (OutputStream flowFileOut = session.write(flowFile)) {
- StreamUtils.copy(request.getInputStream(), flowFileOut);
- } catch (final IOException e) {
- // There may be many reasons which can produce an IOException
on the HTTP stream and in some of them, eg.
- // bad requests, the connection to the client is not closed.
In order to address also these cases, we try
- // and answer with a BAD_REQUEST, which lets the client know
that the request has not been correctly
- // processed and makes it aware that the connection can be
closed.
- getLogger().error("Failed to receive content from HTTP Request
from {} due to {}",
- new Object[]{request.getRemoteAddr(), e});
- session.remove(flowFile);
- try {
- HttpServletResponse response = container.getResponse();
- response.sendError(Status.BAD_REQUEST.getStatusCode());
- response.flushBuffer();
- container.getContext().complete();
- } catch (final IOException ioe) {
- getLogger().warn("Failed to send HTTP response to {} due
to {}",
- new Object[]{request.getRemoteAddr(), ioe});
+ if (!Strings.isNullOrEmpty(request.getContentType()) &&
request.getContentType().contains(MIME_TYPE__MULTIPART_FORM_DATA)) {
+ final long maxRequestSize =
context.getProperty(MAX_REQUEST_SIZE).asLong();
+ request.setAttribute(Request.__MULTIPART_CONFIG_ELEMENT, new
MultipartConfigElement("/tmp", maxRequestSize, maxRequestSize, 0));
--- End diff --
As-is, this is going to buffer the entire request into memory and then
throw an Exception if the request reaches some configured threshold.
Unfortunately, I think this is going to be too limiting for many use cases. We
should not request that the entire file upload fit into java's heap.
I think the best approach (longer term) would be to create our own MIME
Parser (or find one that already exists?) that is capable of processing the
data in a streaming fashion. I presume that Jetty doesn't allow that because
they want to allow you to obtain all Part's and then access them individually.
However, we're not doing that and so their API is rather limiting.
For the shorter term, though, I think it makes more sense to expose each of
these options - a temporary directory to write files to (we cannot use "/tmp"
because it may not exist on many operating systems. Should perhaps default to
the value of System.getProperty("java.io.tmpdir")), max request size (explain
that this is used to prevent Denial of Service attacks that would fill disk
space), and a threshold at which point we would stream the data to the
temporary file and then re-read it (what jetty refers to as the
"fileSizeThreshold", i.e., the last parameter).
While streaming the data to a temporary file on disk is expensive and less
than ideal, I think it's much better to give the user the option of doing that
than to simply reject the request if it is too large.
---