RAVINARAYAN SINGH created NIFI-15025:
----------------------------------------

             Summary: LookupFailureException with large HTTP responses due to BufferedInputStream mark/reset limitation
                 Key: NIFI-15025
                 URL: https://issues.apache.org/jira/browse/NIFI-15025
             Project: Apache NiFi
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: RAVINARAYAN SINGH
            Assignee: RAVINARAYAN SINGH
             Fix For: 2.7.0


When the HTTP response body stream is wrapped in a BufferedInputStream, NiFi record readers can fail on large responses with the following error:

{code:java}
org.apache.nifi.lookup.LookupFailureException: java.io.IOException: Resetting to invalid mark
Caused by: java.io.IOException: Resetting to invalid mark
{code}
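This happens because BufferedInputStream only guarantees reset() for the readlimit passed to mark(). Its internal buffer is 8 KB (8192 bytes) by default and is grown at most to that readlimit, so once the reader consumes more bytes past the mark than the buffer can hold, the mark is discarded and reset() fails with the exception above.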

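The JDK behavior can be reproduced without NiFi. The sketch below uses plain JDK classes only; the class and method names (MarkResetDemo, markReadReset) are illustrative. It marks a default-sized BufferedInputStream at offset 0, consumes 16,000 bytes and then calls reset(). With a readlimit smaller than the bytes consumed, the mark is dropped as soon as the 8 KB buffer fills; with a readlimit that covers them, the buffer is grown and reset() succeeds.

{code:java}
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkResetDemo {

    // Wraps the payload in a default-sized (8192-byte) BufferedInputStream, marks at offset 0,
    // reads bytesToRead single bytes, then attempts reset().
    // Returns "reset ok" or the IOException message.
    static String markReadReset(byte[] payload, int readLimit, int bytesToRead) {
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(payload))) {
            in.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read();
            }
            in.reset();
            return "reset ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[20_000];
        // readlimit smaller than the bytes consumed: the mark is dropped when the 8 KB buffer fills
        System.out.println(markReadReset(payload, 1024, 16_000));   // Resetting to invalid mark
        // readlimit covering the bytes consumed: the buffer grows up to readlimit and reset works
        System.out.println(markReadReset(payload, 16_000, 16_000)); // reset ok
    }
}
{code}
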
Code from [RestLookupService.java|https://github.com/apache/nifi/blob/60330769f668abf963f8a32202841cafa10f1885/nifi-extension-bundles/nifi-standard-services/nifi-lookup-services-bundle/nifi-lookup-services/src/main/java/org/apache/nifi/lookup/RestLookupService.java#L383-L388]:

{code:java}
final Record record;
try (final InputStream is = responseBody.byteStream();
     final InputStream bufferedIn = new BufferedInputStream(is)) {
    record = handleResponse(bufferedIn, responseBody.contentLength(), context);
} {code}
h3. *Proposed Fix / Solutions*
 # *Remove BufferedInputStream*

Use responseBody.byteStream() directly if mark/reset is not required by the reader.

 # *Configurable Buffer Size*

Introduce a NiFi property to configure the buffer size (the size argument of the BufferedInputStream(InputStream, int) constructor) for streams that require buffering. The default could remain 8 KB, but users could increase it for larger payloads.

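A minimal sketch of option 2, assuming the configured value is passed straight to the BufferedInputStream(InputStream, int) constructor. The property itself, the DEFAULT_BUFFER_SIZE constant and the wrapResponse() helper are hypothetical, and the record reader is simulated by a loop that marks with a small readlimit (1024) and then drains the stream. With the 8 KB default the mark is lost; with a buffer large enough to hold the whole 20,000-byte payload, reset() succeeds.

{code:java}
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ConfigurableBufferDemo {

    // Hypothetical default for a new buffer-size property (same as BufferedInputStream's default).
    static final int DEFAULT_BUFFER_SIZE = 8 * 1024;

    // Hypothetical helper: wrap the raw response stream using the configured buffer size.
    static InputStream wrapResponse(InputStream raw, int bufferSize) {
        return new BufferedInputStream(raw, bufferSize);
    }

    // Marks with a small readlimit, drains the stream, and reports whether reset() still works.
    static boolean canResetAfterFullRead(byte[] payload, int bufferSize) {
        try (InputStream in = wrapResponse(new ByteArrayInputStream(payload), bufferSize)) {
            in.mark(1024);
            while (in.read() != -1) {
                // consume the entire payload, as a reader scanning the response would
            }
            in.reset();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[20_000];
        System.out.println(canResetAfterFullRead(payload, DEFAULT_BUFFER_SIZE)); // false
        System.out.println(canResetAfterFullRead(payload, 64 * 1024));           // true
    }
}
{code}

This only moves the limit: any payload larger than the configured size still invalidates a small-readlimit mark, and a buffer of the configured size is allocated on the heap for every response.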
 # *Spooling InputStream Wrapper (Recommended)*

Provide a robust InputStream wrapper that preserves streaming while supporting mark/reset over an unbounded replay window by spooling consumed bytes to disk:

 ** mark() records the current absolute position.
 ** reset() replays bytes starting from the marked position.
 ** Bytes beyond the spooled replay window are read from the underlying stream and spooled transparently.
 ** The temporary spool file is automatically deleted when the stream is closed.

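A minimal sketch of the recommended wrapper, under stated assumptions: SpoolingInputStream is a hypothetical name (not an existing NiFi or JDK class), the spool is a plain temporary file accessed through RandomAccessFile, and the readlimit passed to mark() is ignored because spooled bytes never expire. It follows the four bullets above: mark() stores an absolute offset, reset() rewinds to it, reads past the spooled data come from the source and are appended to the spool, and close() deletes the file.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical SpoolingInputStream: every byte pulled from the source is also appended
// to a temporary file, so reset() can replay from any marked absolute position.
class SpoolingInputStream extends InputStream {
    private final InputStream source;
    private final Path spoolPath;
    private final RandomAccessFile spool;
    private long position;          // absolute offset of the next byte returned to the caller
    private long spooled;           // number of source bytes already written to the spool file
    private long markPosition = -1; // absolute offset recorded by mark(), or -1 if never marked

    SpoolingInputStream(InputStream source) throws IOException {
        this.source = source;
        this.spoolPath = Files.createTempFile("spool-", ".tmp");
        this.spool = new RandomAccessFile(spoolPath.toFile(), "rw");
    }

    Path spoolFile() { return spoolPath; }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (position < spooled) {
            // Replay: serve bytes that were spooled before the last reset().
            spool.seek(position);
            int n = spool.read(b, off, (int) Math.min(len, spooled - position));
            position += n;
            return n;
        }
        // Streaming: pull fresh bytes from the source and append them to the spool.
        int n = source.read(b, off, len);
        if (n == -1) {
            return -1;
        }
        spool.seek(spooled);
        spool.write(b, off, n);
        spooled += n;
        position += n;
        return n;
    }

    @Override
    public boolean markSupported() { return true; }

    @Override
    public void mark(int readlimit) {
        markPosition = position; // readlimit is ignored: spooled bytes never expire
    }

    @Override
    public void reset() throws IOException {
        if (markPosition < 0) {
            throw new IOException("Stream not marked");
        }
        position = markPosition;
    }

    @Override
    public void close() throws IOException {
        try {
            source.close();
        } finally {
            try {
                spool.close();
            } finally {
                Files.deleteIfExists(spoolPath); // remove the temporary spool file
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[100_000];
        for (int i = 0; i < payload.length; i++) payload[i] = (byte) (i % 251);
        try (SpoolingInputStream in = new SpoolingInputStream(new ByteArrayInputStream(payload))) {
            in.mark(16); // a readlimit this small would invalidate a default BufferedInputStream mark
            byte[] first = in.readAllBytes();
            in.reset();
            byte[] second = in.readAllBytes();
            System.out.println(Arrays.equals(first, payload) && Arrays.equals(second, payload)); // true
        }
    }
}
{code}

This sketch spools every byte even when no mark is active, which keeps the offset bookkeeping trivial at the cost of writing the whole response to disk. A production version might spool only while a mark is set and cap or monitor the spool file size. In RestLookupService it would take the place of new BufferedInputStream(is) in the excerpt above.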

This would let NiFi components such as RestLookupService handle large HTTP payloads correctly without running into mark/reset limits or holding the whole payload on the heap.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)