Mark, Joe, thanks a lot for investigating, reproducing and explaining what's going on. I couldn't have done that myself.
I built the 0.5.0 RC3 version from the git commit mentioned and haven't changed a single configuration property. I hadn't even heard of archiving so far. I just looked at the properties to be safe: archiving is enabled and the threshold is set to 50%. My local disk is definitely more than 50% utilised.

Had the archive worked, would it have been a manual process to recover the data?

It looks like you've got it covered, but if there's anything else I can do to help, let me know.

Lars

On Wed, Feb 17, 2016 at 8:16 PM, Mark Payne <[email protected]> wrote:

> Lars,
>
> Joe is correct - I felt a need for coffee before trying to identify the
> exact case that will cause this issue to occur :)
>
> I created a ticket for this issue here:
> https://issues.apache.org/jira/browse/NIFI-1527
>
> I share your opinion that the funnel is not related. Rather, I believe
> the issue has to do with swapping of data. NiFi has a mechanism for
> swapping out FlowFiles to disk, rather than leaving them all in memory
> (here we are talking about the FlowFile objects, which contain things
> like attributes. The content is already written to disk and not kept in
> memory).
>
> With the default settings, this happens if at least 30,000 FlowFiles
> exist in the same queue. In this case, upon restart, if all of the
> FlowFiles that reference a specific "content file" on disk are swapped
> out, the Content Repository will end up removing or archiving that
> data. If the archive is disabled or full, the repository will end up
> removing it. I believe this is what you saw. I believe the content was
> removed on restart because all FlowFiles that referenced it were
> swapped out.
>
> Please confirm what Joe asked below: that you either disabled archiving
> or that your disk is at least 50% full (or that you changed that
> configuration parameter in conf/nifi.properties).
>
> We will certainly be addressing this issue very promptly.
> A huge thank you for noticing the weirdness and bringing it to our
> attention and for providing such great details!
>
> -Mark
>
> > On Feb 17, 2016, at 1:48 PM, Joe Witt <[email protected]> wrote:
> >
> > Lars,
> >
> > First of all thank you very much for reporting this and providing the
> > detail you did.
> >
> > Mark Payne just replicated the problem, it sounds like, then rather
> > than emailing he decided to go get coffee :-).
> >
> > We will be working this up for very prompt resolution and it warrants
> > a release in my view.
> >
> > Can you confirm that you are either not using archiving or you are
> > using it and you have more than 50% of space on the partition nifi is
> > running used up?
> >
> > I'll let Mark share the details.
> >
> > Thanks
> > Joe
> >
> > On Wed, Feb 17, 2016 at 4:06 AM, Lars Francke <[email protected]> wrote:
> >> It seems as if the Funnel thing wasn't actually the problem.
> >>
> >> Here's my new timeline:
> >>
> >> 18:14... - Stop Processors
> >> 18:15:40 - Shutdown NiFi (graceful and successful)
> >> 18:28:03 - Starting NiFi which seemingly deletes content
> >> 18:31++ - Add Funnel etc. and start Processors again (so only now do
> >> I see the problem occurring even though it probably would have
> >> happened without it as well)
> >>
> >> I've uploaded the relevant part of the log here
> >> <http://pastebin.com/6XWP5SVF>
> >>
> >> All processors involved are custom processors but they don't do
> >> anything special and have been running for days and survived multiple
> >> restarts already. I can't share code now but if it becomes important
> >> I can strip them to a bare minimum and share.
> >>
> >> So when the failure happened it was even easier:
> >> CustomSourceProcessor was connected to CustomDestinationProcessor via
> >> a normal connection.
> >>
> >> Thanks yet again for helping out everyone!
> >>
> >> On Wed, Feb 17, 2016 at 5:04 AM, Aldrin Piri <[email protected]> wrote:
> >>
> >>> Lars,
> >>>
> >>> Are you able to share your flow or a template of it so we can try to
> >>> recreate?
> >>>
> >>> If not, could you give some information as to what it is doing and
> >>> what processors/components are involved? Are there any custom
> >>> components?
> >>>
> >>> Thanks!
> >>>
> >>> On Tue, Feb 16, 2016 at 10:18 PM, Joe Witt <[email protected]> wrote:
> >>>
> >>>> 'that deletes the original file'
> >>>>
> >>>> True but even then that refers to the original source data and not
> >>>> what is in the content repository itself. The content repository
> >>>> error that was emitted about missing flow file exception/content
> >>>> not found is for the purpose of signaling data was removed by some
> >>>> process outside of NiFi.
> >>>>
> >>>> Mark Payne: Any ideas?
> >>>>
> >>>> On Tue, Feb 16, 2016 at 10:15 PM, Thad Guidry <[email protected]> wrote:
> >>>>> There's a checkbox option in the FetchFile that deletes the
> >>>>> original file.
> >>>>>
> >>>>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/FetchFile.java#L62
> >>>>>
> >>>>> static final AllowableValue COMPLETION_DELETE = new
> >>>>> AllowableValue("Delete File", "Delete File", "Deletes the original
> >>>>> file from the file system");
> >>>>>
> >>>>> Perhaps it's something along those lines, maybe in his other
> >>>>> processors? He mentioned "I also added another processor feeding
> >>>>> that same funnel" ... which processor was that exactly?
> >>>>>
> >>>>> Thad
> >>>>> +ThadGuidry <https://www.google.com/+ThadGuidry>
> >>>>>
> >>>>> On Tue, Feb 16, 2016 at 4:35 PM, Lars Francke <[email protected]> wrote:
> >>>>>
> >>>>>> Thanks for the explanation.
> >>>>>>
> >>>>>> I tried to reproduce but I can't.
> >>>>>> I also looked through my bash history and I can't find anything
> >>>>>> suspicious. I'm pretty sure that nothing deleted files in the
> >>>>>> content_repository that's not NiFi itself. Everything else (logs
> >>>>>> etc.) is untouched and some content files have survived as well.
> >>>>>> A few FlowFiles are being processed successfully and I just
> >>>>>> checked the creation date of all files in content_repository.
> >>>>>> Most of them are "old".
> >>>>>>
> >>>>>> On Tue, Feb 16, 2016 at 11:12 PM, Joe Witt <[email protected]> wrote:
> >>>>>>
> >>>>>>> Lars,
> >>>>>>>
> >>>>>>> The information you're providing from the logs is a pretty
> >>>>>>> important bit of debug data.
> >>>>>>>
> >>>>>>> This concept of 'CONTENTMISSING' being recorded into the Flow
> >>>>>>> File Repository is NiFi's way of saying "Hey I knew about this
> >>>>>>> flow file but when I tried to access the content it was no
> >>>>>>> longer in the content repository". What I'm suggesting is
> >>>>>>> something outside of NiFi itself removed the content. By
> >>>>>>> default, even when you remove content using the NiFi API it
> >>>>>>> isn't actually deleting the content until it has to and it is
> >>>>>>> asynchronous. Even if you had restarted NiFi during this I don't
> >>>>>>> see how this could occur.
> >>>>>>>
> >>>>>>> Even if you have some bugs in the custom processor
> >>>>>>> implementations the issue you're showing here should not be
> >>>>>>> possible.
> >>>>>>>
> >>>>>>> The only explanation that makes sense to me so far is that the
> >>>>>>> content was actually deleted from within the content repository
> >>>>>>> by something other than NiFi.
> >>>>>>>
> >>>>>>> Can you reproduce the issue?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Joe
> >>>>>>>
> >>>>>>> On Tue, Feb 16, 2016 at 4:58 PM, Lars Francke <[email protected]> wrote:
> >>>>>>>> Any ideas on how to debug this further?
> >>>>>>>>
> >>>>>>>> I know very little about the internals of NiFi but there are
> >>>>>>>> obviously still references to that content and it shouldn't
> >>>>>>>> have been deleted. Can you think of a way I could have done
> >>>>>>>> this by accident?
> >>>>>>>>
> >>>>>>>> On Tue, Feb 16, 2016 at 10:35 PM, Joe Witt <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Interesting. What that suggests is the content has been
> >>>>>>>>> removed from the content repo itself.
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> Joe
> >>>>>>>>>
> >>>>>>>>> On Tue, Feb 16, 2016 at 4:15 PM, Lars Francke <[email protected]> wrote:
> >>>>>>>>>> I attached a debugger and checked a few of those FlowFiles
> >>>>>>>>>> that failed and searched the logs for those. This is what I
> >>>>>>>>>> found:
> >>>>>>>>>>
> >>>>>>>>>> 2016-02-16 18:28:35,953 INFO [main]
> >>>>>>>>>> o.a.n.c.repository.FileSystemRepository Found unknown file
> >>>>>>>>>> /Users/lars/Downloads/nifi-0.5.0/content_repository/103/1455636839847-103
> >>>>>>>>>> (1058303 bytes) in File System Repository; archiving file
> >>>>>>>>>>
> >>>>>>>>>> 2016-02-16 18:42:54,840 WARN [Timer-Driven Process Thread-9]
> >>>>>>>>>> o.a.n.c.r.WriteAheadFlowFileRepository Repository Record
> >>>>>>>>>> StandardRepositoryRecord[UpdateType=CONTENTMISSING,Record=StandardFlowFileRecord[uuid=af69ca83-fc03-41f0-91e1-e3d65da54840,claim=StandardContentClaim
> >>>>>>>>>> [resourceClaim=StandardResourceClaim[id=1455636632024-102,
> >>>>>>>>>> container=default, section=102], offset=661978,
> >>>>>>>>>> length=10],offset=0,name=69321836993544,size=10]] is marked
> >>>>>>>>>> to be aborted; it will be persisted in the FlowFileRepository
> >>>>>>>>>> as a DELETE record
> >>>>>>>>>>
> >>>>>>>>>> Now I can't remember having done this but it's entirely
> >>>>>>>>>> possible that I restarted
> >>>>>>>>>> NiFi prior to my experiment described above.
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Feb 16, 2016 at 9:16 PM, Joe Witt <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Lars,
> >>>>>>>>>>>
> >>>>>>>>>>> Definitely look forward to understanding the mechanics here
> >>>>>>>>>>> a bit better of what you're seeing and if you can provide
> >>>>>>>>>>> something reproducible. Even if you have a custom processor
> >>>>>>>>>>> the API/Process Session construct should protect from many
> >>>>>>>>>>> of the things that can go wrong there. Now the content repo
> >>>>>>>>>>> will likely be largely empty as the data represents only
> >>>>>>>>>>> 888KB of data and it is probably in a relatively small
> >>>>>>>>>>> number of files on disk.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>> joe
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Feb 16, 2016 at 2:57 PM, Lars Francke <[email protected]> wrote:
> >>>>>>>>>>>> Hi Matt,
> >>>>>>>>>>>>
> >>>>>>>>>>>> thanks for the quick response. It's late here so I'll try
> >>>>>>>>>>>> reproducing tomorrow.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Source and destination processors are custom processors.
> >>>>>>>>>>>> This is NiFi 0.5.0 RC3
> >>>>>>>>>>>>
> >>>>>>>>>>>> NiFi thinks all FlowFiles are still there:
> >>>>>>>>>>>> <http://imgur.com/isDlRk4>
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm looking at the logs now; no ERRORs or WARNs that seem
> >>>>>>>>>>>> suspicious so far
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Feb 16, 2016 at 8:46 PM, Matthew Clarke <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Lars,
> >>>>>>>>>>>>> What version of NiFi are you running?
> >>>>>>>>>>>>> What type of processor was your source processor?
> >>>>>>>>>>>>> What type of processor was the destination processor?
> >>>>>>>>>>>>> I tried reproducing using a GenerateFlowFile to produce
> >>>>>>>>>>>>> ~100k FlowFiles on a connection to an UpdateAttribute
> >>>>>>>>>>>>> processor. I then stopped the GenerateFlowFile processor,
> >>>>>>>>>>>>> added a funnel, and moved the connection. I also added
> >>>>>>>>>>>>> another processor feeding that same funnel and routed the
> >>>>>>>>>>>>> connection from the funnel back to the UpdateAttribute
> >>>>>>>>>>>>> processor. The files moved as expected through the funnel.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Can you reproduce? Any other errors in your app log from
> >>>>>>>>>>>>> prior to completing the connection?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Matt
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Feb 16, 2016 at 1:15 PM, Lars Francke <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm trying to understand what happened and how I can
> >>>>>>>>>>>>>> prevent this in the future.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The outcome seems to be that all my FlowFiles which were
> >>>>>>>>>>>>>> sitting in a connection have been deleted from disk.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I had a flow with two processors connected via a single
> >>>>>>>>>>>>>> connection.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What I did:
> >>>>>>>>>>>>>> * Stop all Processors
> >>>>>>>>>>>>>> * Add a Funnel
> >>>>>>>>>>>>>> * Add a Processor
> >>>>>>>>>>>>>> * Move destination end of existing connection to funnel
> >>>>>>>>>>>>>>   (with the existing FlowFiles)
> >>>>>>>>>>>>>> * Connect new Processor to Funnel
> >>>>>>>>>>>>>> * Connect Funnel to old destination Processor
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The connection between the Funnel and the Destination
> >>>>>>>>>>>>>> processor still shows all 90k FlowFiles but the Processor
> >>>>>>>>>>>>>> fails on session.read with a MissingFlowFileException.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Sure enough my content_repository is mostly empty too.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Now this isn't so bad because it's only a dev environment
> >>>>>>>>>>>>>> but I'd like to understand how this could happen. Did I
> >>>>>>>>>>>>>> do something wrong?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Any hints on what to search for in the logs or which
> >>>>>>>>>>>>>> place in the source code to look?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>> Lars
