Kefevs Pirkibo created NIFI-2489:
------------------------------------
Summary: GetFile inability to remove source file results in
duplicate files (PutFile) and dataloss (Site2Site)
Key: NIFI-2489
URL: https://issues.apache.org/jira/browse/NIFI-2489
Project: Apache NiFi
Issue Type: Bug
Components: Core Framework
Affects Versions: 0.6.1, 0.7.0
Environment: Tested with CentOS 6 and 7.
Cifs-utils 4.8.1-20.el6 (CentOS 6) and Cifs-utils 6.2-7.el7 (CentOS 7)
Windows Server 2003 and Windows Server 2008 as CIFS sources.
Reporter: Kefevs Pirkibo
If GetFile is unable to remove a source file from the Windows CIFS mount
(the file is locked by another application), it also fails to remove the other
files from the same batch (reason unknown). On the next run it sources those
same files into NiFi again, and again fails to remove them. If the destination
is PutFile with Conflict Resolution Strategy set to 'fail', the failure queue
builds up at an alarming rate.
With 0.6.1 and 0.7.0 on CentOS 6, if the destination is not a PutFile but a
Site2Site output port, the files can be dropped due to missing content.
Example log extract: http://pastebin.com/dJ8UibwR
Environment
I have since replicated the GetFile behaviour on both CentOS 6 and 7 with CIFS
mounts from a couple of different Windows servers. In the original case
GetFile ran on CRON to source files from the Windows server folders. Timer
driven GetFile is worse, since it builds up duplicates even faster.
Trying to remove the file manually with rm in bash gives:
rm: cannot remove 'Filename': Text file busy
The most troubling part is that one locked file affects many others, depending
on batch size; it is not only the locked file that is affected. The first
occurrence of this with Site2Site dropped 9 files out of 10 because the last of
those 10 had a lock.
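
To make the expectation concrete, here is a minimal sketch of per-file
deletion in which a failure on one file does not stop the rest of the batch
from being removed. This is not the GetFile implementation; the class and
method names (BatchDelete, deleteEach) are hypothetical and only illustrate
the behaviour I would have expected.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of NiFi.
public class BatchDelete {

    /**
     * Attempts to delete every path in the batch independently.
     * Returns the paths that could not be deleted.
     */
    public static List<Path> deleteEach(List<Path> batch) {
        List<Path> failed = new ArrayList<>();
        for (Path p : batch) {
            try {
                Files.delete(p);
            } catch (IOException e) {
                // A locked source file (for example "Text file busy") ends up
                // here. Record it and continue with the remaining files.
                failed.add(p);
            }
        }
        return failed;
    }
}
```

With this shape, a batch of 10 files where the last one is locked would leave
only that one file behind, and the caller could decide what to do with it.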
1) What is the expected behavior of GetFile in an edge case where it is
unable to remove a source file? (Revert? Remember in state that the file has
been read?)
2) Why does a delete lock on one file prevent other files in the same batch from
being deleted? (They are loaded into NiFi as flowfiles, but not deleted either.)
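
As an illustration of the "remember in state" option from question 1, here is
a rough sketch that tracks which source files have already been picked up, so
a file that could not be removed is not sourced again on the next run. It is
an assumption about one possible approach, not how GetFile works; the names
(SeenFiles, selectNew) are hypothetical, the state is kept in memory only, and
a file is identified by its path plus last-modified time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of NiFi.
public class SeenFiles {

    // Keys of files that have already been handed out for ingestion.
    private final Set<String> seen = new HashSet<>();

    // Path plus last-modified time, so a rewritten file counts as new.
    private static String key(Path p) throws IOException {
        return p.toAbsolutePath() + "@" + Files.getLastModifiedTime(p).toMillis();
    }

    /** Returns only the files in the listing that were not returned before. */
    public List<Path> selectNew(List<Path> listing) throws IOException {
        List<Path> fresh = new ArrayList<>();
        for (Path p : listing) {
            // Set.add returns false when the key is already present.
            if (seen.add(key(p))) {
                fresh.add(p);
            }
        }
        return fresh;
    }
}
```

A locked file that stays in the source folder would then be picked up once
rather than on every run, which would avoid the duplicates described above.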