Evert Bevernage created NIFI-6710:
-------------------------------------
Summary: Large queues get stuck with loss of data
Key: NIFI-6710
URL: https://issues.apache.org/jira/browse/NIFI-6710
Project: Apache NiFi
Issue Type: Bug
Affects Versions: 1.9.2
Environment: Windows 10, Windows Server, Single nifi node
Reporter: Evert Bevernage
Attachments: 5 mins after processor stops taking in files.PNG, after
emptyqueue attempt 1.PNG, after emptyqueue attempt 2.png,
after-restart-nifi.PNG, failing-queue.xml, nifi-app.log, putfile-destination
with missing files only 20k of 50kplus.PNG, when processor stops taking in
files.PNG
Hi,
We've observed the following behavior repeatedly:
When a queue is filled with a large number of small flowfiles (more than
10000), the processor consuming from the queue stops taking in new files
at some point (not at a fixed number) and ignores the remaining files
in the queue. There is no way to recover the stuck files in NiFi.
Right now we are avoiding a buildup of large queues, but they do happen from
time to time
(especially when an information source recovers from connection issues).
As I can now consistently reproduce the issue, please advise if additional
testing or information is required.
To reproduce (on a Windows system):
# Use the attached template (failing-queue.xml)
# Start the PutFile processor
# Start the GenerateFlowFile processor to rapidly fill the queue to the maximum
# Stop the GenerateFlowFile processor when the queue is applying back-pressure
# PutFile will start writing files to disk and stop after a variable number of
files.
# Observe that only a fraction of the generated files is written to disk
# Observe that emptying the queue does not have the expected effect
# Observe that restarting NiFi clears the queue.
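To put a number on the "fraction of the generated files" in the steps above, the PutFile destination directory can be counted after the processor stalls. The following is only a minimal sketch and not part of the attached template: the function names and the example path are hypothetical, and it assumes every flowfile is written under a unique filename so that one file on disk corresponds to one flowfile.

```python
from pathlib import Path


def count_written_files(destination: str) -> int:
    """Count regular files below the PutFile destination directory."""
    return sum(1 for entry in Path(destination).rglob("*") if entry.is_file())


def missing_files(generated: int, destination: str) -> int:
    """Generated flowfiles that never reached the destination directory.

    `generated` is the number of flowfiles reported by the generating
    processor; it has to be read from the NiFi UI by hand.
    """
    return generated - count_written_files(destination)
```

For example, `missing_files(50000, r"C:\temp\putfile-out")` (a hypothetical path) returns how many of 50000 generated flowfiles are absent from that directory.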
Observations:
* Waiting for recovery doesn't work. The queue remains filled with files.
* Emptying the queue generally doesn't work. Most of the time I receive the
message that the queue is empty, or sometimes only a fraction of the files is
removed.
* Restarting NiFi consistently clears the queue. The flowfiles are lost,
however (also when I have not tried to empty the queue first).
* I have also had this issue with flowfiles handled by processors that do not
read from or write to disk directly (i.e. I don't think it is related to
PutFile, ListFile/FetchFile or GetFile).
* I'm not 100% sure, but I think that in all observations the problem occurred
when there were 10000+ files in the queue.
I've attached the template file, screenshots of the operation in NiFi, and the
app log (nifi-app.log).
For the interpretation of app.log:
* GenerateFlowFile id: 016d1000-1a9e-12b8-4bb4-cb34ad6229fe
* Queue id: 016d1000-202e-12b9-9fa6-4b38cadf8463
* PutFile id: 016d1000-408d-12b8-15d9-fea8289e0017
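To follow these three components through the log, the lines that mention one of the ids listed above can be filtered out. This is a rough sketch only: the helper names `tag_log_lines` and `scan_log` are made up for illustration, and it assumes that a log line about a component contains that component's id verbatim.

```python
# Component ids as listed above.
COMPONENT_IDS = {
    "GenerateFlowFile": "016d1000-1a9e-12b8-4bb4-cb34ad6229fe",
    "Queue": "016d1000-202e-12b9-9fa6-4b38cadf8463",
    "PutFile": "016d1000-408d-12b8-15d9-fea8289e0017",
}


def tag_log_lines(lines):
    """Yield (line_number, component_names, text) for lines mentioning a known id."""
    for number, text in enumerate(lines, start=1):
        names = [name for name, cid in COMPONENT_IDS.items() if cid in text]
        if names:
            yield number, names, text.rstrip("\n")


def scan_log(path):
    """Print every line of the log file at `path` that mentions a known id."""
    with open(path, encoding="utf-8", errors="replace") as log:
        for number, names, text in tag_log_lines(log):
            print(f"{number}: [{', '.join(names)}] {text}")
```

Calling `scan_log("nifi-app.log")` prints the matching lines prefixed with their 1-based line numbers, which can then be compared with the line numbers mentioned in this report.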
What I've found in the log:
* I started the experiment with GenerateFlowFile on 25/9/2019 at 9:24:34
(line 1073)
* stopped GenerateFlowFile on line 1079
* dropped 5000 flowfiles on lines 1115 & 1116
* tried to drop the remaining files on lines 1121 & 1123
* relaunched NiFi on line 1138
* lines 2152 and 2153: unknown files in the content_repository