[
https://issues.apache.org/jira/browse/NIFI-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877439#comment-15877439
]
Joe Skora edited comment on NIFI-3332 at 2/22/17 4:15 AM:
----------------------------------------------------------
[~ijokarumawak] Admittedly, discussing time orientation and time references for
these tests can get confusing. I realize now that t3 is older than t2 and so
on; I had mixed it up earlier today.
Table 1 looks mostly right to me. Since it is based on the old logic, the t3
run should output the batch1-age3.txt, batch1-age4.txt, and batch1-age5.txt
files and store a state timestamp of t3 and a processed files list containing
just the batch1-age3.txt file since the batch1-age4.txt and batch1-age5.txt
files are older than the state timestamp. The subsequent t2 run looks correct
to me, outputting the batch2-age3.txt and batch2-age2.txt files and storing
state with the t2 timestamp and the batch2-age2.txt file. So the Table 1
output information is correct, except that the state file list for the t3 run
contains files that do not match the state timestamp.
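To make the expected state concrete, here is a rough sketch of the old logic as
I describe it above: the state keeps the newest timestamp plus only the files
listed at exactly that timestamp. The class and method names
(TrackedListingState, list, latestTimestamp, filesAtLatest) are hypothetical
and for illustration only; this is not the actual AbstractListProcessor code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of timestamp-plus-file-list state; not NiFi's real code. */
class TrackedListingState {
    private long latestTimestamp = Long.MIN_VALUE;              // state timestamp
    private final Set<String> filesAtLatest = new HashSet<>();  // files listed at that timestamp

    /** Returns the file names to emit for this run and updates the state. */
    public List<String> list(Map<String, Long> directory) {
        List<String> emitted = new ArrayList<>();
        long newLatest = latestTimestamp;
        // TreeMap gives a deterministic, name-ordered result.
        for (Map.Entry<String, Long> e : new TreeMap<>(directory).entrySet()) {
            long ts = e.getValue();
            boolean newer = ts > latestTimestamp;
            boolean sameButUnseen = ts == latestTimestamp
                    && !filesAtLatest.contains(e.getKey());
            if (newer || sameButUnseen) {
                emitted.add(e.getKey());
                newLatest = Math.max(newLatest, ts);
            }
        }
        // When the state timestamp advances, the old file list no longer applies.
        if (newLatest > latestTimestamp) {
            filesAtLatest.clear();
        }
        // Record only the emitted files whose timestamp equals the state timestamp.
        for (String name : emitted) {
            if (directory.get(name) == newLatest) {
                filesAtLatest.add(name);
            }
        }
        latestTimestamp = newLatest;
        return emitted;
    }

    public long latestTimestamp() {
        return latestTimestamp;
    }

    public Set<String> filesAtLatest() {
        return filesAtLatest;
    }
}
```

With batch 1 in the directory, a first call to list() would emit all three
batch1 files and leave only batch1-age3.txt in the stored file list. After
batch 2 is added, a second call would emit batch2-age3.txt and batch2-age2.txt
and leave only batch2-age2.txt in the list.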
I'm not sure I understand your "_SUCCESS" algorithm at this point, so I can't
comment on Table 2 until I get a chance to work through it tomorrow afternoon.
> Bug in ListXXX causes matching timestamps to be ignored on later runs
> ---------------------------------------------------------------------
>
> Key: NIFI-3332
> URL: https://issues.apache.org/jira/browse/NIFI-3332
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 0.7.1, 1.1.1
> Reporter: Joe Skora
> Assignee: Koji Kawamura
> Priority: Critical
> Attachments: Test-showing-ListFile-timestamp-bug.log,
> Test-showing-ListFile-timestamp-bug.patch
>
>
> The new state implementation for the ListXXX processors based on
> AbstractListProcessor creates a race condition when processor runs occur
> while a batch of files is being written with the same timestamp.
> The changes to state management dropped tracking of the files processed for a
> given timestamp. Without the record of files processed, the remainder of the
> batch is ignored on the next processor run since their timestamp is not
> greater than the one timestamp stored in processor state. With the file
> tracking it was possible to process files that matched the timestamp exactly
> and exclude the previously processed files.
> A basic timeline goes as follows.
> T0 - system creates or receives batch of files with Tx timestamp where Tx
> is more than the current timestamp in processor state.
> T1 - system writes 1st half of Tx batch to the ListFile source directory.
> T2 - ListFile runs picking up 1st half of Tx batch and stores Tx timestamp
> in processor state.
> T3 - system writes 2nd half of Tx batch to ListFile source directory.
> T4 - ListFile runs ignoring any files with T <= Tx, eliminating 2nd half Tx
> timestamp batch.
> I've attached a patch[1] for TestListFile.java that adds an instrumented unit
> test that demonstrates the problem and a log[2] of the output from one such run.
> The test writes 3 files each in two batches with processor runs after each
> batch. Batch 2 writes files with timestamps older than, equal to, and newer
> than the timestamp stored when batch 1 was processed, but only the newer file
> is picked up. The older file is correctly ignored, but the file with the
> matching timestamp should have been processed.
> [1] Test-showing-ListFile-timestamp-bug.patch
> [2] Test-showing-ListFile-timestamp-bug.log
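The T0 to T4 timeline in the quoted description can be reproduced with a small
sketch in which the state holds a single timestamp and nothing else. The class
name TimestampOnlyListing and its list method are hypothetical and only
illustrate the filtering rule described in the issue; they are not taken from
the NiFi source or from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of timestamp-only state; illustrates the reported bug. */
class TimestampOnlyListing {
    private long latestTimestamp = Long.MIN_VALUE; // the only thing kept in state

    /** Emits files strictly newer than the stored timestamp, then updates it. */
    public List<String> list(Map<String, Long> directory) {
        List<String> emitted = new ArrayList<>();
        long newLatest = latestTimestamp;
        // TreeMap gives a deterministic, name-ordered result.
        for (Map.Entry<String, Long> e : new TreeMap<>(directory).entrySet()) {
            // Files with a timestamp <= the stored one are skipped, including
            // files written later that share the stored timestamp (the T4 step).
            if (e.getValue() > latestTimestamp) {
                emitted.add(e.getKey());
                newLatest = Math.max(newLatest, e.getValue());
            }
        }
        latestTimestamp = newLatest;
        return emitted;
    }
}
```

If the first half of a batch is listed at T2 and the second half arrives with
the same Tx timestamp at T3, the second call to list() returns an empty list,
which is the behavior the issue reports.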