Kefevs Pirkibo created NIFI-2489:
------------------------------------

             Summary: GetFile inability to remove source file results in 
duplicate files (PutFile) and dataloss (Site2Site)
                 Key: NIFI-2489
                 URL: https://issues.apache.org/jira/browse/NIFI-2489
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
    Affects Versions: 0.6.1, 0.7.0
         Environment: Tested with CentOS 6 and 7.
Cifs-utils 4.8.1-20.el6 (CentOS 6) and Cifs-utils 6.2-7.el7 (CentOS 7)
Windows Server 2003 and Windows Server 2008 as CIFS sources.
            Reporter: Kefevs Pirkibo


If GetFile is unable to remove a source file from the Windows CIFS mount 
(because the file is locked by another application), it also fails to remove 
the other files from the same batch (reason unknown). On the next run it 
sources those same files into NiFi again, and again fails to remove them. If 
the destination is PutFile with Conflict Resolution Strategy set to 'fail', 
the failure queue builds up at an alarming rate. 

On 0.6.1 and 0.7.0 (CentOS 6), if the destination is not a PutFile but a 
Site2Site output port, the files can be dropped due to missing content.
Example log extract: http://pastebin.com/dJ8UibwR

Environment
I have since replicated the GetFile behaviour on both CentOS 6 and 7 with CIFS 
mounts from a couple of different Windows servers. In the original case 
GetFile was CRON driven, sourcing files from the Windows server folders. Timer 
driven GetFile is even worse, since it builds up duplicates even faster. 

Trying to remove the file manually with rm in bash gives:
rm: cannot remove 'Filename': Text file busy

The most troubling part is that one locked file affects many others, depending 
on batch size; it is not only the locked file that is affected. The first 
occurrence of this with Site2Site dropped 9 files out of 10, because the last 
of those 10 was locked.
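
To illustrate the isolation I would have expected, here is a minimal sketch in 
Python. It is not NiFi code and says nothing about how GetFile is actually 
implemented; ingest_batch and its remove parameter are hypothetical names made 
up for this example. Each file is deleted on its own, so a failed delete is 
recorded for that one file and the rest of the batch carries on.

```python
import os
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def ingest_batch(
    paths: Iterable[str],
    remove: Callable[[str], None] = os.remove,
) -> Tuple[List[Tuple[str, bytes]], List[Tuple[str, OSError]]]:
    """Read each source file, then try to delete it.

    Hypothetical helper, not NiFi code. A file counts as ingested only if
    its source was deleted; a failed delete is recorded and the loop moves on.
    """
    ingested: List[Tuple[str, bytes]] = []
    skipped: List[Tuple[str, OSError]] = []
    for path in paths:
        data = Path(path).read_bytes()
        try:
            remove(path)
        except OSError as exc:
            # For example errno.ETXTBSY ("Text file busy") on a locked file.
            skipped.append((path, exc))
            continue  # the failure stays local to this one file
        ingested.append((path, data))
    return ingested, skipped
```

With this shape, a batch of 10 files where the last one is locked would yield 
9 ingested files and 1 skipped file, rather than 10 files left in place.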

1) What is the expected behavior of GetFile in an edge case where it is 
unable to remove a source file? (Revert? Remember in state that the file has 
been read?)
2) Why does a delete lock on one file prevent the other files in the same 
batch from being deleted? (They are loaded into NiFi as flowfiles, but not 
deleted either.)
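
As an illustration of the "remember in state" option from question 1, here is 
a rough sketch in Python. It is only an assumption about what such state could 
look like, not a description of any existing NiFi mechanism; SeenFiles and 
file_key are hypothetical names, and the state is kept in memory only. A file 
that could not be removed is remembered by path, size and modification time, 
so it is not picked up again unless it changes.

```python
import os
from typing import Set, Tuple

# (absolute path, size in bytes, modification time in nanoseconds)
FileKey = Tuple[str, int, int]


def file_key(path: str) -> FileKey:
    """Identify one version of a file; a changed file gets a new key."""
    st = os.stat(path)
    return (os.path.abspath(path), st.st_size, st.st_mtime_ns)


class SeenFiles:
    """Hypothetical in-memory record of files already read but not removed."""

    def __init__(self) -> None:
        self._seen: Set[FileKey] = set()

    def should_ingest(self, path: str) -> bool:
        # Skip a file whose current version was already read earlier.
        return file_key(path) not in self._seen

    def remember(self, path: str) -> None:
        # Call this when the file was read but its delete failed.
        self._seen.add(file_key(path))
```

A real implementation would also have to persist this state across restarts 
and drop entries for files that eventually disappear; both are left out here.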



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
