Hey Joe,

Awesome! Glad that you were able to address the issue. I think a contribution for that would be great, if you don't mind. I would be happy to review and merge it.

Thanks
-Mark

> On Sep 6, 2017, at 3:00 PM, Joe Gresock <[email protected]> wrote:
>
> Mark,
>
> I took the second approach, since the nifi-toolkit-flowfile-repo project
> doesn't appear to exist at version 1.1.0. I added a line to attempt to get
> the next recoverable transaction ID as you suggested, and it started up
> successfully! Thanks for your help.
>
> Is this something that should be contributed, or is it moot with the latest
> version?
>
> Joe
>
> On Wed, Sep 6, 2017 at 5:18 PM, Joe Gresock <[email protected]> wrote:
>
>> Thanks Mark, that's the kind of thing I was looking for; this gives me a
>> good starting point.
>>
>> Joe
>>
>> On Wed, Sep 6, 2017 at 5:09 PM, Mark Payne <[email protected]> wrote:
>>
>>> Joe,
>>>
>>> If you wanted to go the route of truncating it, I would recommend
>>> starting with the nifi-toolkit-flowfile-repo module and updating that.
>>> It has the dependencies all already in place to read the repository and
>>> update it. You would want to just read each transaction from a partition
>>> and write it to a new file until you hit the EOFException, and then just
>>> discard that transaction.
>>>
>>> The other option (not assuming that EOFException implies out of data)
>>> would mean updating MinimalLockingWriteAheadLog (in the
>>> nifi-commons/nifi-write-ahead-log module) and then, around lines 472-479,
>>> updating the logic so that if an Exception is caught there, we call
>>> nextPartition.getNextRecoverableTransactionId() again if the partition
>>> does actually have more data (this may require adding some sort of
>>> isRecoveryDataAvailable() method or something like that on the Partition
>>> class).
>>>
>>> Does this help?
>>>
>>> Thanks
>>> -Mark
>>>
>>>
>>> On Sep 6, 2017, at 1:01 PM, Joe Gresock <[email protected]> wrote:
>>>
>>> Sorry, 144 was a typo.. there are 14 files.
>>>
>>> Yes, it appears to have run out of disk space, so that's probably the root
>>> cause.
>>> Can you give me any ideas on how to carry out your two ideas? How
>>> would I look for the end of a record, so as to truncate it?
>>>
>>> On Wed, Sep 6, 2017 at 4:55 PM, Mark Payne <[email protected]> wrote:
>>>
>>> Hmmm ok interesting... once it hits an EOFException it is assuming that
>>> there is no more data in the partition. Clearly, there is, because it then
>>> fails when calling endRecovery(). Did you perhaps run out of disk space on
>>> your FlowFile Repo while it was running, or hit an OutOfMemoryError?
>>> Perhaps that would cause an EOFException and then continue writing.
>>>
>>> The fact that there are 144 files in that directory is also very odd...
>>> there are generally only 1-2 files in that directory. Do all of your
>>> partitions have that many files? Were there any errors before the restart
>>> about not being able to checkpoint the FlowFile Repo?
>>>
>>> At this point, I'm not entirely sure what can be done, other than to
>>> perhaps try to manually truncate that last record in the Partition
>>> that is causing the EOFException. Or perhaps the
>>> MinimalLockingWriteAheadLog could be updated to not assume that
>>> EOFException implies that the partition no longer has data in it.
>>> Unfortunately, though, I'm not seeing any easy workaround.
>>>
>>> On Sep 6, 2017, at 12:37 PM, Joe Gresock <[email protected]> wrote:
>>>
>>> Yes, I do see:
>>> ERROR [main] org.wali.MinimalLockingWriteAheadLog
>>> org.wali.MinimalLockingWriteAheadLog@1e620fe7 unexpectedly reached
>>> End-of-File when reading from Partition-214 for Transaction ID
>>> 1918212626; assuming crash and ignoring this transaction.
>>>
>>> In that directory, I see 144 files, totalling ~120MB. The first two files
>>> are multi-megabyte files, and the other 12 are all either 7K or 4K.
>>>
>>> On Wed, Sep 6, 2017 at 4:30 PM, Mark Payne <[email protected]> wrote:
>>>
>>> Joe,
>>>
>>> Any other errors in the logs? Specifically, I'm looking for errors that
>>> contain the text:
>>> unexpectedly reached End-of-File when reading from
>>>
>>> or:
>>> unexpectedly found End-of-File when reading from
>>>
>>> This is not something that I've ever run into personally, but I'm looking
>>> through the code, trying to understand what may cause this.
>>>
>>> Also, if you look at the files in
>>> /data/nifi/flowfile_repository/partition-8, how many files are there,
>>> and how large are they?
>>>
>>> Thanks
>>> -Mark
>>>
>>>
>>> On Sep 6, 2017, at 12:22 PM, Joe Gresock <[email protected]> wrote:
>>>
>>> 1.1.0, it's not on a system I can copy/paste from, but here's part of the
>>> stack trace:
>>>
>>> at org.wali.MinimalLockingWriteAheadLog$Partition.endRecovery(MinimalLockingWriteAheadLog.java:1047) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
>>> at org.wali.MinimalLockingWriteAheadLog.recoverFromEdits(MinimalLockingWriteAheadLog.java:487) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
>>> at org.wali.MinimalLockingWriteAheadLog.recoverRecords(MinimalLockingWriteAheadLog.java:301) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
>>>
>>> On Wed, Sep 6, 2017 at 4:13 PM, Mark Payne <[email protected]> wrote:
>>>
>>> Joe,
>>>
>>> What version of NiFi are you running? Do you have a stack trace?
>>>
>>> Thanks
>>> -Mark
>>>
>>>
>>> On Sep 6, 2017, at 11:59 AM, Joe Gresock <[email protected]> wrote:
>>>
>>> I'm wondering if there is a way to recover from this scenario:
>>>
>>> ERROR [main] o.a.nifi.controller.StandardFlowService Failed to load flow
>>> from cluster due to: org.apache.nifi.cluster.ConnectionException: Failed to
>>> connect node to cluster due to: java.lang.IllegalStateException: Signaled
>>> end to recovery, but there are more recovery files for Partition in
>>> directory /data/nifi/flowfile_repository/partition-8
>>>
>>> I have nearly a TB of files in my content_repository, so I'd really like to
>>> be able to salvage this node, but I'm not sure how to proceed, as the node
>>> won't start up.
>>>
>>> --
>>> I know what it is to be in need, and I know what it is to have plenty. I
>>> have learned the secret of being content in any and every situation,
>>> whether well fed or hungry, whether living in plenty or in want. I can do
>>> all this through him who gives me strength. *-Philippians 4:12-13*
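
For anyone finding this thread later: the truncation route Mark describes (copy each complete transaction to a new file, and drop the one that ends in an EOFException) might look roughly like the sketch below. It is only an illustration under stated assumptions. The record layout used here (a 4-byte length followed by that many payload bytes) is hypothetical and is not the actual on-disk format written by MinimalLockingWriteAheadLog, and the names TruncatePartialRecord, copyCompleteRecords and writeRecord are made up for the example. A real tool would need to read records with the repository's own serialization code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class TruncatePartialRecord {

    /**
     * Copies every complete record from 'in' to 'out' and returns the number copied.
     * ASSUMPTION (hypothetical layout): each record is an int length followed by
     * exactly that many payload bytes.
     */
    public static int copyCompleteRecords(InputStream in, OutputStream out) throws IOException {
        DataInputStream dataIn = new DataInputStream(in);
        DataOutputStream dataOut = new DataOutputStream(out);
        int copied = 0;
        while (true) {
            try {
                int length = dataIn.readInt();   // throws EOFException at end of data
                if (length < 0) {
                    throw new IOException("Negative record length: " + length);
                }
                byte[] payload = new byte[length];
                dataIn.readFully(payload);       // throws EOFException on a partial record
                // Write only after the whole record was read, so a partial
                // record never reaches the output.
                dataOut.writeInt(length);
                dataOut.write(payload);
                copied++;
            } catch (EOFException e) {
                // Either a clean end of file or a truncated last record: stop here
                // and discard whatever was partially read.
                break;
            }
        }
        dataOut.flush();
        return copied;
    }

    /** Helper for the demo: writes one record in the assumed layout. */
    private static void writeRecord(DataOutputStream out, String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    public static void main(String[] args) throws IOException {
        // Build a "damaged" log in memory: two complete records, then a third
        // whose header claims 100 bytes but only 3 were written.
        ByteArrayOutputStream damaged = new ByteArrayOutputStream();
        DataOutputStream writer = new DataOutputStream(damaged);
        writeRecord(writer, "first");
        writeRecord(writer, "second");
        writer.writeInt(100);
        writer.write(new byte[] {1, 2, 3});
        writer.flush();

        ByteArrayOutputStream repaired = new ByteArrayOutputStream();
        int copied = copyCompleteRecords(
                new ByteArrayInputStream(damaged.toByteArray()), repaired);
        // Prints: Copied 2 complete records, 19 bytes
        System.out.println("Copied " + copied + " complete records, " + repaired.size() + " bytes");
    }
}
```

Note that this sketch treats EOFException as the end of the data, which is exactly the assumption that failed in this thread once the partition had further data after the partial record. It also allocates whatever length the header claims, so a corrupted length field would need a sanity check before use on real files.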
