[ 
https://issues.apache.org/jira/browse/NIFI-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937140#comment-15937140
 ] 

Mark Payne commented on NIFI-3273:
----------------------------------

The long-term solution for this, I believe, is to catch Exception when we read 
an update from a 'journal' file. We should then read the stream and see if the 
rest of the stream consists solely of NUL bytes. If so, we should throw 
EOFException instead of the Exception that we caught, because this indicates 
that the update was only partially written when the OS died. As a result, we 
should treat it the same as if NiFi were suddenly killed while writing the 
update. This way, the repo will discard this update and it will be equivalent 
to having rolled back the Process Session.
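
A rough sketch of that idea, not the actual NiFi code: the class name, the UpdateReader interface, and both method names below are hypothetical stand-ins for whatever MinimalLockingWriteAheadLog uses to deserialize a journal entry. It only illustrates the "catch, scan the remainder for NUL bytes, rethrow as EOFException" flow described above.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class JournalTailCheck {

    /** Hypothetical stand-in for the code that deserializes one update from a journal. */
    interface UpdateReader<T> {
        T read(DataInputStream in) throws Exception;
    }

    /** Consumes the rest of the stream; returns true only if every remaining byte is NUL. */
    public static boolean restIsAllNul(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] != 0) {
                    return false;
                }
            }
        }
        return true;
    }

    /**
     * Reads one update. If deserialization fails and the remainder of the journal
     * is nothing but NUL bytes, the failure is reported as an EOFException so the
     * caller can treat it like a partially written update.
     */
    public static <T> T readUpdate(DataInputStream in, UpdateReader<T> reader) throws Exception {
        try {
            return reader.read(in);
        } catch (EOFException e) {
            throw e; // already signals a truncated update
        } catch (Exception e) {
            if (restIsAllNul(in)) {
                EOFException eof = new EOFException(
                        "Journal ends in NUL bytes; treating update as partially written");
                eof.initCause(e);
                throw eof;
            }
            throw e; // real data follows the bad update, so this is genuine corruption
        }
    }
}
```

Note that the scan consumes the stream, so in this sketch a journal with non-NUL data after the bad update cannot be read any further. Whether recovery should continue past that point is a separate decision.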

> MinimalLockingWriteAheadLog doesn't properly handle corrupted journals 
> -----------------------------------------------------------------------
>
>                 Key: NIFI-3273
>                 URL: https://issues.apache.org/jira/browse/NIFI-3273
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Joseph Percivall
>            Assignee: Mark Payne
>            Priority: Critical
>
> If the system dies abruptly (e.g. sudden power loss) while NiFi is running 
> and before writes have been flushed, anything that was being written to disk 
> can become corrupted. A ticket for the provenance repository already exists[1]. 
> The content repo handles this automatically since the content claim won't be 
> valid if it hasn't been written out yet. The database repo is just a cache 
> and is rebuilt anyway. The logs are handled by logback. The flow.xml.gz can 
> be rolled back to the last archive (manually).
> This ticket is for the MinimalLockingWriteAheadLog which backs the FlowFile 
> repo and local state. Originally brought up here[2] for MiNiFi, it will also 
> affect NiFi.
> One possible solution is to restore transactions up until the corrupted id 
> and then ignore the rest. This could cause state to become out of sync with 
> the processed flowfiles (if FF repo is restored but local state cannot be 
> fully restored) but given the rarity of the event I think it is an 
> appropriate risk to accept.
> The workaround for the FF repo is to set 
> "nifi.flowfile.repository.always.sync" to true, but currently there is no way 
> to set "always sync" for the local state provider.
> [1] https://issues.apache.org/jira/browse/NIFI-2890
> [2] 
> https://community.hortonworks.com/questions/75280/why-does-my-minifi-flow-fail-to-run-when-turning-o.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
