[ 
https://issues.apache.org/jira/browse/NIFI-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245597#comment-17245597
 ] 

Tarun Lalu Hasija commented on NIFI-3273:
-----------------------------------------

[~markap14] We are seeing this issue on one of our production NiFi nodes.
Below is the FlowFile repository configuration:

nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository

nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.SequentialAccessWriteAheadLog

 

 
{code:java}
2020-12-07 13:12:06,826 INFO [main] o.a.n.wali.SequentialAccessWriteAheadLog Recovering records from Write-Ahead Log at /var/lib/nifi/flowfile_repository
2020-12-07 13:12:08,145 INFO [main] org.apache.nifi.wali.HashMapSnapshot org.apache.nifi.wali.HashMapSnapshot@6508161b restored 73574 Records and 11 Swap Files from Snapshot, ending with Transaction ID 23742735983
2020-12-07 13:12:08,147 INFO [main] o.a.n.wali.SequentialAccessWriteAheadLog Successfully recovered 73574 records and 11 swap files from Snapshot at /var/lib/nifi/flowfile_repository/checkpoint with Max Transaction ID of 23742735983 in 1319 milliseconds. Now recovering records from 1 journal files
2020-12-07 13:12:08,159 INFO [main] o.a.nifi.wali.LengthDelimitedJournal Recovering records from journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal
2020-12-07 13:12:09,005 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 6.83% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 62730 updates
2020-12-07 13:12:09,971 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 13.68% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 135556 updates
2020-12-07 13:12:10,864 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 20.60% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 207179 updates
2020-12-07 13:12:11,933 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 27.45% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 276497 updates
2020-12-07 13:12:12,171 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 34.35% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 284820 updates
2020-12-07 13:12:12,445 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 41.45% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 292318 updates
2020-12-07 13:12:14,111 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 55.18% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 365112 updates
2020-12-07 13:12:14,512 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 62.44% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 390673 updates
2020-12-07 13:12:14,960 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 69.30% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 422410 updates
2020-12-07 13:12:15,585 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 76.20% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 461458 updates
2020-12-07 13:12:16,000 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 83.11% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 483167 updates
2020-12-07 13:12:16,854 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 90.05% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 542462 updates
2020-12-07 13:12:17,613 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 96.92% of the way finished recovering journal /var/lib/nifi/flowfile_repository/journals/23742735984.journal, having recovered 593333 updates
2020-12-07 13:12:18,038 ERROR [main] o.a.nifi.controller.StandardFlowService Failed to load flow from cluster due to: org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.io.IOException: Expected to read a Sentinel Byte of '1' but got a value of '64' instead
org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.io.IOException: Expected to read a Sentinel Byte of '1' but got a value of '64' instead
{code}
 

 

When we run the NiFi Toolkit flowfile-repo repair tool, we get the message
below. It seems the partition-* directories it looks for are missing from the
flowfile repository.

 

 
{code:java}
java -cp nifi-toolkit-flowfile-repo-1.9.2.jar:/usr/hdf/current/nifi/lib/:/usr/hdf/current/nifi/ext/:nifi-utils-1.9.2.jar org.apache.nifi.toolkit.repos.flowfile.RepairCorruptedFileEndings /flowfile_repository_backup/flowfile_repository/journals/ /flowfile_repository_backup/repaired_flowfile_repository/
Found no partitions within input Repository Directory /flowfile_repository_backup/flowfile_repository/journals

{code}
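
The mismatch described above can be checked before running a repair tool. The following is only a rough sketch, under the assumption that partition-* subdirectories indicate the older partitioned write-ahead-log layout, while a journals/ subdirectory (as seen in the recovery log) indicates the layout written by SequentialAccessWriteAheadLog. The function name and the returned labels are hypothetical and not part of NiFi or the toolkit.

```python
from pathlib import Path


def guess_wal_layout(repo_dir):
    """Guess which write-ahead-log layout a repository directory uses.

    Assumptions (not taken from NiFi source code):
      - "partition-*" subdirectories mean the older partitioned layout
      - a "journals" subdirectory means the sequential-access layout
    Returns one of the hypothetical labels:
    "missing", "partitioned", "sequential", "unknown".
    """
    repo = Path(repo_dir)
    if not repo.is_dir():
        return "missing"
    # Partitioned layout takes precedence if any partition-* directory exists.
    if any(p.is_dir() and p.name.startswith("partition-") for p in repo.iterdir()):
        return "partitioned"
    if (repo / "journals").is_dir():
        return "sequential"
    return "unknown"
```

For example, `guess_wal_layout("/flowfile_repository_backup/flowfile_repository")` could be called on the backup copy; a result other than "partitioned" would be consistent with the "Found no partitions" message.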
 

 

> MinimalLockingWriteAheadLog doesn't properly handle corrupted journals 
> -----------------------------------------------------------------------
>
>                 Key: NIFI-3273
>                 URL: https://issues.apache.org/jira/browse/NIFI-3273
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Joe Percivall
>            Assignee: Joe Witt
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> If the system dies abruptly (sudden power loss) while NiFi is running, 
> without flushing writes, then anything that was being written to disk can 
> become corrupted. A ticket for the provenance repository is already created here[1]. 
> The content repo handles this automatically since the content claim won't be 
> valid if it hasn't been written out yet. The database repo is just a cache 
> and is rebuilt anyway. The logs are handled by logback. The flow.xml.gz can 
> be rolled back to one of the last archives (manually).
> This ticket is for the MinimalLockingWriteAheadLog which backs the FlowFile 
> repo and local state. Originally brought up here[2] for MiNiFi, it will also 
> affect NiFi.
> One possible solution is to restore transactions up until the corrupted id 
> and then ignore the rest. This could cause state to become out of sync with 
> the processed flowfiles (if FF repo is restored but local state cannot be 
> fully restored) but given the rarity of the event I think it is an 
> appropriate risk to accept.
> The workaround for the FF repo is to set 
> "nifi.flowfile.repository.always.sync" to true, but currently there is no way 
> to set "always sync" for the local state provider.
> [1] https://issues.apache.org/jira/browse/NIFI-2890
> [2] 
> https://community.hortonworks.com/questions/75280/why-does-my-minifi-flow-fail-to-run-when-turning-o.html
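
The "restore transactions up until the corrupted id and then ignore the rest" idea from the issue description, together with the sentinel-byte error in the log, can be illustrated with a small sketch. It assumes a simplified, hypothetical journal format (one sentinel byte of value 1, a 4-byte big-endian length, then the payload); this is not NiFi's actual on-disk format, and the function names are made up for illustration.

```python
import io
import struct

# Hypothetical sentinel value, mirroring the '1' in the error message.
TRANSACTION_FOLLOWS = 1


def encode(payloads):
    """Write each payload as: sentinel byte, 4-byte big-endian length, payload."""
    out = bytearray()
    for payload in payloads:
        out.append(TRANSACTION_FOLLOWS)
        out += struct.pack(">I", len(payload))
        out += payload
    return bytes(out)


def recover(data):
    """Read records until the end of data or the first sign of corruption.

    Returns (records, error). Records read before the corruption are kept;
    error is None when the whole journal was read cleanly.
    """
    stream = io.BytesIO(data)
    records = []
    while True:
        marker = stream.read(1)
        if not marker:
            return records, None  # clean end of journal
        if marker[0] != TRANSACTION_FOLLOWS:
            return records, (
                f"Expected sentinel byte {TRANSACTION_FOLLOWS} but got {marker[0]}"
            )
        header = stream.read(4)
        if len(header) < 4:
            return records, "truncated length header"
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) < length:
            return records, "truncated payload"
        records.append(payload)
```

In this sketch a corrupted byte where a sentinel is expected stops recovery but does not discard the records already read, which is the trade-off the description proposes: later updates are lost, earlier ones survive.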



