[jira] [Created] (NIFI-6710) Large queues get stuck with loss of data

Evert Bevernage (Jira) Wed, 25 Sep 2019 01:13:45 -0700

Evert Bevernage created NIFI-6710:
-------------------------------------

             Summary: Large queues get stuck with loss of data
                 Key: NIFI-6710
                 URL: https://issues.apache.org/jira/browse/NIFI-6710
             Project: Apache NiFi
          Issue Type: Bug
    Affects Versions: 1.9.2
         Environment: Windows 10, Windows Server, Single nifi node
            Reporter: Evert Bevernage
         Attachments: 5 mins after processor stops taking in files.PNG, after 
emptyqueue attempt 1.PNG, after emptyqueue attempt 2.png, 
after-restart-nifi.PNG, failing-queue.xml, nifi-app.log, putfile-destination 
with missing files only 20k of 50kplus.PNG, when processor stops taking in 
files.PNG


Hi,

We've observed the following behavior repeatedly:

When a queue is filled with a large number of small flowfiles (more than 
10000), the processor consuming from the queue stops taking in new files at 
some point (not at a fixed number) and ignores the remaining files in the 
queue. There is no way to recover the stuck files in NiFi.

Right now we avoid building up large queues, but they do happen from time to 
time (especially when an information source recovers from connection issues).

As I can now consistently reproduce the issue, please advise if additional 
testing or information is required.

To reproduce (on a Windows system): 
 # Use the attached template (failing-queue.xml) 
 # Start the PutFile processor
 # Start the GenerateFlowFile processor to rapidly fill the queue to the maximum
 # Stop the GenerateFlowFile processor when the queue is applying back-pressure
 # PutFile will start writing files to disk and stop after a variable number of 
files.
 # Observe that only a fraction of the generated files is written to disk
 # Observe that emptying the queue does not have the expected effect
 # Observe that restarting NiFi will clear the queue.
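To put a number on "only a fraction" in the steps above, a minimal sketch 
like the following could count what actually landed on disk. The destination 
path and the expected count are assumptions to be filled in from your own 
PutFile configuration and from the number of flowfiles generated; neither is 
taken from the template.

```python
from pathlib import Path


def count_written_files(destination: Path) -> int:
    """Count regular files under the PutFile destination directory (recursively)."""
    return sum(1 for entry in destination.rglob("*") if entry.is_file())


def report(destination: Path, expected: int) -> str:
    """Summarise how many of the expected flowfiles were written to disk."""
    written = count_written_files(destination)
    missing = max(expected - written, 0)
    return f"{written} of {expected} flowfiles written, {missing} missing"


# Example usage (hypothetical path and count, adjust to your setup):
# print(report(Path(r"C:\temp\putfile-destination"), expected=50000))
```

Running it once when PutFile stops and again a few minutes later shows 
whether the processor is still making progress or is really stuck.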

Observations:
 * Waiting for recovery doesn't work. The queue remains filled with files.
 * Emptying the queue generally doesn't work. Most of the time I receive the 
message that the queue is empty, or sometimes only a fraction of the files is 
removed.
 * Restarting NiFi will consistently clear the queue. The flowfiles will, 
however, be lost (also when the queue was not emptied first).
 * I have had this issue with files that were processed by processors not 
directly reading from or writing to disk (i.e. I don't think it is related to 
PutFile, ListFile/FetchFile or GetFile).
 * I'm not 100% sure, but I think that in all observations the problem occurs 
when there are 10000+ files in the queue.

I've attached the template file, screenshots of the operation in NiFi, and the 
app.log.

For the interpretation of app.log:
 * GenerateFlowFile id: 016d1000-1a9e-12b8-4bb4-cb34ad6229fe
 * Queue id: 016d1000-202e-12b9-9fa6-4b38cadf8463
 * PutFile id: 016d1000-408d-12b8-15d9-fea8289e0017
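
To make the log easier to follow, a small sketch like this could list every 
log line that mentions one of the component ids above, tagged with the 
component name. The ids are the ones listed above; the function names and the 
log path in the usage comment are hypothetical, and the script only does plain 
substring matching, so it assumes nothing about the log line format.

```python
# Component ids as listed above, mapped to a readable label.
COMPONENTS = {
    "016d1000-1a9e-12b8-4bb4-cb34ad6229fe": "GenerateFlowFile",
    "016d1000-202e-12b9-9fa6-4b38cadf8463": "Queue",
    "016d1000-408d-12b8-15d9-fea8289e0017": "PutFile",
}


def tag_lines(lines):
    """Yield (line_number, label, text) for each line mentioning a known id."""
    for number, text in enumerate(lines, start=1):
        for component_id, label in COMPONENTS.items():
            if component_id in text:
                yield number, label, text.rstrip("\n")


def scan_log(path):
    """Read a log file and return all tagged lines as a list."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(tag_lines(handle))


# Example usage (hypothetical path to the attached log):
# for number, label, text in scan_log("nifi-app.log"):
#     print(f"{number:>6} [{label}] {text}")
```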

What I've found in the log:
 * I started the experiment with GenerateFlowFile on 25/9/2019 at 9:24:34 
(line 1073)
 * Stopped GenerateFlowFile on line 1079
 * Dropped 5000 flowfiles on lines 1115 & 1116
 * Tried to drop the remaining files on lines 1121 & 1123
 * Relaunched NiFi on line 1138
 * Lines 2152 and 2153: unknown files in the content_repository

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
