[
https://issues.apache.org/jira/browse/NIFI-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909133#comment-16909133
]
Mark Payne commented on NIFI-6559:
----------------------------------
[~patricker] I don't think this is a change that we want to make. If a
FlowFile's queue is missing, then the FlowFile is dropped from the system.
However, if an Overflow file is missing, it would not drop the FlowFile from
the system - it would simply lose some updates to the FlowFile. Now if this
were the last update to the FlowFile, it may not be a big deal. However, if
it's not the last update to the FlowFile, this can be a very big deal. For
example, consider the following set of updates to the FlowFile Repository, all
for FlowFile A:
* FlowFile A created. Placed in Queue 123.
* Information is extracted from FlowFile and added as attributes. Placed in
Queue 456.
* Processor X removes any PII from the FlowFile attributes. Placed in Queue
789. This update is in an overflow file that gets lost.
* FlowFile routing done and placed on Queue 222.
Now, consider that NiFi is restarted. Upon restart, we have lost the update to
the attributes that removed the PII, but we have still placed the FlowFile, with
its PII, on Queue 222. We have now loaded this FlowFile into a queue that it
should never have been able to reach without first passing through Processor X.
Essentially, we have caused the FlowFile to skip a processor in the flow. This
could result in data loss, data leakage, corruption, or several other problems.
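To make the failure mode concrete, here is a minimal Python sketch that replays the four updates above. It assumes, purely for illustration, that each journal record stores only the changes made by that step, which is what lets a lost intermediate update leak into the final state in this example. The `Update` and `replay` names and the `ssn` attribute are hypothetical; this is not NiFi's actual repository record format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Update:
    """One journaled change to a FlowFile (hypothetical layout: deltas only)."""
    queue: str
    set_attrs: Dict[str, str] = field(default_factory=dict)
    removed_attrs: Set[str] = field(default_factory=set)


@dataclass
class FlowFileState:
    queue: Optional[str] = None
    attributes: Dict[str, str] = field(default_factory=dict)


def replay(updates: List[Update]) -> FlowFileState:
    """Rebuild a FlowFile by applying its journaled updates in order."""
    state = FlowFileState()
    for u in updates:
        state.attributes.update(u.set_attrs)
        for name in u.removed_attrs:
            state.attributes.pop(name, None)
        state.queue = u.queue
    return state


journal = [
    Update(queue="123"),                                    # FlowFile A created
    Update(queue="456", set_attrs={"ssn": "123-45-6789"}),  # attributes extracted
    Update(queue="789", removed_attrs={"ssn"}),             # Processor X strips PII
    Update(queue="222"),                                    # routing
]

intact = replay(journal)
# Drop the third update, as if the overflow file holding it were missing.
damaged = replay(journal[:2] + journal[3:])

print(intact.queue, intact.attributes)    # 222 {}
print(damaged.queue, damaged.attributes)  # 222 {'ssn': '123-45-6789'}
```

Both replays end on Queue 222, but only the damaged one still carries the PII attribute, i.e. the FlowFile has effectively skipped Processor X.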
While it would be acceptable to lose the last update to a FlowFile, which would
result in essentially replaying the FlowFile from some point in the flow, it is
imperative that the FlowFile repository never allow an intermediate update to a
FlowFile to be lost.
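The distinction in the previous paragraph (losing a FlowFile's last update is tolerable, losing an intermediate one is not) can be expressed as a check over the journal. This hypothetical Python sketch assumes recovery still knows which FlowFile each lost record belonged to; if that information lives only in the missing overflow file, recovery cannot tell a harmless loss from a harmful one. `check_recoverable` and `UnsafeRecoveryError` are illustrative names, not NiFi APIs.

```python
from typing import Dict, List, Tuple


class UnsafeRecoveryError(Exception):
    """Raised when a lost update is followed by a surviving update to the same FlowFile."""


def check_recoverable(records: List[Tuple[str, bool]]) -> None:
    """Validate journal records given as (flowfile_id, present) pairs in journal order.

    Hypothetical rule: a FlowFile may lose only its trailing updates (it is then
    replayed from an earlier point in the flow). If any surviving update comes
    after a lost one for the same FlowFile, an intermediate step was skipped.
    """
    first_lost: Dict[str, int] = {}
    for index, (flowfile_id, present) in enumerate(records):
        if not present:
            # Remember the earliest lost update for this FlowFile.
            first_lost.setdefault(flowfile_id, index)
        elif flowfile_id in first_lost:
            raise UnsafeRecoveryError(
                f"FlowFile {flowfile_id}: record {first_lost[flowfile_id]} is lost "
                f"but later record {index} survives"
            )


# Losing the last update (routing to Queue 222) leaves A on Queue 789: acceptable.
check_recoverable([("A", True), ("A", True), ("A", True), ("A", False)])

# Losing the PII-removal update while the routing update survives is rejected.
try:
    check_recoverable([("A", True), ("A", True), ("A", False), ("A", True)])
except UnsafeRecoveryError as err:
    print(err)
```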
> FlowFile Repo Journal Recovery Should not Fail if External Overflow Files are
> Missing
> -------------------------------------------------------------------------------------
>
> Key: NIFI-6559
> URL: https://issues.apache.org/jira/browse/NIFI-6559
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Peter Wicks
> Assignee: Peter Wicks
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When NiFi is journaling FlowFile repository changes to disk, it sometimes
> writes Overflow files if the data being journaled exceeds a certain memory
> threshold. These files are tracked inside the *.journal files as External File
> References. If the file behind one of these external references is deleted or
> lost, recovery of the entire journal fails.
> Instead, I feel this should work more like FlowFiles that lose their queue,
> or content in the Content Repository that has lost its FlowFile: log it,
> and move on.
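The mechanism described in the issue can be sketched as follows: a transaction too large to keep inline is spilled to a separate overflow file, and the journal records only a reference to it. This is a simplified, hypothetical Python model (a record-count threshold stands in for NiFi's memory threshold, and JSON stands in for its on-disk journal format); it reproduces the current behavior the issue describes, where one missing overflow file aborts recovery of the whole journal.

```python
import json
import os
import tempfile
from typing import List


def write_transaction(journal: List[dict], updates: List[dict],
                      max_inline: int, overflow_dir: str) -> None:
    """Append a transaction inline, or spill it to an overflow file if too large."""
    if len(updates) <= max_inline:
        journal.append({"type": "inline", "updates": updates})
        return
    fd, path = tempfile.mkstemp(suffix=".overflow", dir=overflow_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(updates, f)
    # The journal keeps only an external reference to the spilled data.
    journal.append({"type": "external", "path": path})


def recover(journal: List[dict]) -> List[dict]:
    """Replay every transaction; a missing overflow file raises FileNotFoundError."""
    updates: List[dict] = []
    for entry in journal:
        if entry["type"] == "inline":
            updates.extend(entry["updates"])
        else:
            with open(entry["path"]) as f:
                updates.extend(json.load(f))
    return updates


with tempfile.TemporaryDirectory() as overflow_dir:
    journal: List[dict] = []
    write_transaction(journal, [{"id": "A", "queue": "123"}], 2, overflow_dir)
    write_transaction(journal, [{"id": "A", "queue": "456"},
                                {"id": "A", "queue": "789"},
                                {"id": "B", "queue": "222"}], 2, overflow_dir)
    recovered_count = len(recover(journal))
    os.remove(journal[1]["path"])  # simulate a lost overflow file
    try:
        recover(journal)
        failed = False
    except FileNotFoundError:
        failed = True

print(recovered_count, failed)  # 4 True
```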
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)