Joe Skora created NIFI-3332:
-------------------------------
Summary: Bug in ListXXX causes matching timestamps to be ignored
on later runs
Key: NIFI-3332
URL: https://issues.apache.org/jira/browse/NIFI-3332
Project: Apache NiFi
Issue Type: Bug
Components: Core Framework
Affects Versions: 1.1.1, 0.7.1
Reporter: Joe Skora
Priority: Critical
Attachments: Test-showing-ListFile-timestamp-bug.log,
Test-showing-ListFile-timestamp-bug.patch
The new state implementation for the ListXXX processors based on
AbstractListProcessor creates a race condition when processor runs occur while
a batch of files is being written with the same timestamp.
The changes to state management dropped tracking of the files processed for a
given timestamp. Without the record of files processed, the remainder of the
batch is ignored on the next processor run, since its timestamp is not greater
than the single timestamp stored in processor state. With file tracking, it
was possible to process files that matched the timestamp exactly while
excluding the previously processed files.
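The file-tracking behavior described above can be sketched roughly as follows. This is a hypothetical illustration only, not the AbstractListProcessor code: the class name `ListingState`, its fields, and the reduction of a file to a name plus a last-modified timestamp are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: state that keeps the newest timestamp AND the names of
// the files already listed at that timestamp.
class ListingState {
    private long latestTimestamp = -1L;
    private final Set<String> listedAtLatest = new HashSet<>();

    // 'files' maps file name -> last-modified time. Returns the names to list on this run.
    List<String> run(Map<String, Long> files) {
        List<String> selected = new ArrayList<>();
        long newLatest = latestTimestamp;
        for (Map.Entry<String, Long> e : files.entrySet()) {
            long ts = e.getValue();
            boolean newer = ts > latestTimestamp;
            // A matching timestamp is accepted only if the file was not listed before.
            boolean sameButUnseen = ts == latestTimestamp && !listedAtLatest.contains(e.getKey());
            if (newer || sameButUnseen) {
                selected.add(e.getKey());
                newLatest = Math.max(newLatest, ts);
            }
        }
        if (newLatest > latestTimestamp) {
            // The newest timestamp moved forward, so the old per-timestamp record is obsolete.
            latestTimestamp = newLatest;
            listedAtLatest.clear();
        }
        for (String name : selected) {
            if (files.get(name) == latestTimestamp) {
                listedAtLatest.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        ListingState state = new ListingState();
        Map<String, Long> dir = new LinkedHashMap<>();
        dir.put("first", 100L);
        System.out.println(state.run(dir));   // [first]
        dir.put("second", 100L);              // same timestamp, written after the first run
        System.out.println(state.run(dir));   // [second]
    }
}
```

In this sketch the set of names only has to cover the single newest timestamp, so the extra state stays small.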
A basic timeline goes as follows.
T0 - system creates or receives batch of files with Tx timestamp, where Tx is
later than the current timestamp in processor state.
T1 - system writes 1st half of Tx batch to the ListFile source directory.
T2 - ListFile runs picking up 1st half of Tx batch and stores Tx timestamp in
processor state.
T3 - system writes 2nd half of Tx batch to ListFile source directory.
T4 - ListFile runs ignoring any files with T <= Tx, eliminating 2nd half of
Tx batch.
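The timeline above can be reproduced with a small simulation. This sketch assumes the simplified rule described in this report (a run lists only files whose timestamp is strictly greater than the stored one); `TimestampOnlyLister` and the file names are hypothetical and are not taken from NiFi.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a lister whose state is only the newest timestamp seen.
class TimestampOnlyLister {
    private long storedTimestamp = -1L;

    // Lists files strictly newer than the stored timestamp, then advances it.
    List<String> run(Map<String, Long> directory) {
        List<String> listed = new ArrayList<>();
        long newest = storedTimestamp;
        for (Map.Entry<String, Long> file : directory.entrySet()) {
            if (file.getValue() > storedTimestamp) {
                listed.add(file.getKey());
                newest = Math.max(newest, file.getValue());
            }
        }
        storedTimestamp = newest;
        return listed;
    }

    public static void main(String[] args) {
        final long tx = 1_000L;                        // T0: batch timestamp Tx
        Map<String, Long> directory = new LinkedHashMap<>();
        TimestampOnlyLister lister = new TimestampOnlyLister();

        directory.put("tx-1", tx);                     // T1: 1st half of batch written
        directory.put("tx-2", tx);
        System.out.println("T2 listed: " + lister.run(directory)); // T2 listed: [tx-1, tx-2]

        directory.put("tx-3", tx);                     // T3: 2nd half of batch written
        directory.put("tx-4", tx);
        System.out.println("T4 listed: " + lister.run(directory)); // T4 listed: []
    }
}
```

The second run returns nothing because `tx-3` and `tx-4` carry a timestamp equal to, not greater than, the stored value.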
I've attached a patch[1] for TestListFile.java that adds an instrumented unit
test that demonstrates the problem, and a log[2] of the output from one such run.
The test writes 3 files each in two batches with processor runs after each
batch. Batch 2 writes files with timestamps older than, equal to, and newer
than the timestamp stored when batch 1 was processed, but only the newer file
is picked up. The older file is correctly ignored, but the file with the
matching timestamp should have been processed.
[1] Test-showing-ListFile-timestamp-bug.patch
[2] Test-showing-ListFile-timestamp-bug.log
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)