Lars,

First of all, thank you very much for reporting this and providing the detail you did.

It sounds like Mark Payne just replicated the problem, then rather than emailing he decided to go get coffee :-). We will be working this up for very prompt resolution, and it warrants a release in my view.

Can you confirm that you are either not using archiving, or that you are using it and more than 50% of the space on the partition NiFi is running on is used up?

I'll let Mark share the details.

Thanks
Joe

On Wed, Feb 17, 2016 at 4:06 AM, Lars Francke <[email protected]> wrote:
> It seems as if the Funnel thing wasn't actually the problem.
>
> Here's my new timeline:
>
> 18:14... - Stop Processors
> 18:15:40 - Shutdown NiFi (graceful and successful)
> 18:28:03 - Starting NiFi which seemingly deletes content
> 18:31++ - Add Funnel etc. and start Processors again (so only now do I see
> the problem occurring even though it probably would have happened without
> it as well)
>
> I've uploaded the relevant part of the log here <
> http://pastebin.com/6XWP5SVF>
>
> All processors involved are custom processors but they don't do anything
> special and have been running for days and survived multiple restarts
> already. I can't share code now but if it becomes important I can strip
> them to a bare minimum and share.
>
> So when the failure happened it was even easier: CustomSourceProcessor was
> connected to CustomDestinationProcessor via a normal connection.
>
> Thanks yet again for helping out everyone!
>
> On Wed, Feb 17, 2016 at 5:04 AM, Aldrin Piri <[email protected]> wrote:
>
>> Lars,
>>
>> Are you able to share your flow or a template of it so we can try to
>> recreate?
>>
>> If not, could you give some information as to what it is doing and what
>> processors/components are involved. Are there any custom components?
>>
>> Thanks!
>>
>> On Tue, Feb 16, 2016 at 10:18 PM, Joe Witt <[email protected]> wrote:
>>
>> > 'that deletes the original file'
>> >
>> > True but even then that refers to the original source data and not
>> > what it is in the content repository itself.
The content repository
>> > error that was emitted about missing flow file exception/content not
>> > found is for the purpose of signaling data was removed by some process
>> > outside of NiFi.
>> >
>> > Mark Payne: Any ideas?
>> >
>> > On Tue, Feb 16, 2016 at 10:15 PM, Thad Guidry <[email protected]>
>> > wrote:
>> > > There's a checkbox option in the FetchFile that deletes the original
>> > file.
>> > >
>> > >
>> > >> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/FetchFile.java#L62
>> > >
>> > > static final AllowableValue COMPLETION_DELETE = new
>> > AllowableValue("Delete
>> > > File", "Delete File", "Deletes the original file from the file
>> system");
>> > >
>> > >
>> > > Perhaps it's something along those lines, maybe in his other processors
>> ?
>> > > He mentioned "I also added another processor feeding that same funnel"
>> > ...
>> > > which processor was that exactly ?
>> > >
>> > >
>> > > Thad
>> > > +ThadGuidry <https://www.google.com/+ThadGuidry>
>> > >
>> > > On Tue, Feb 16, 2016 at 4:35 PM, Lars Francke <[email protected]>
>> > > wrote:
>> > >
>> > >> Thanks for the explanation.
>> > >>
>> > >> I tried to reproduce but I can't. I also looked through my bash
>> history
>> > and
>> > >> I can't find anything suspicious. I'm pretty sure that nothing deleted
>> > >> files in the content_repository that's not NiFi itself. Everything
>> else
>> > >> (logs etc.) are all untouched and some content files have survived as
>> > well.
>> > >> A few FlowFiles are being processed successfully and I just checked
>> the
>> > >> creation date of all files in content_repository. Most of them are
>> > "old".
>> > >>
>> > >> On Tue, Feb 16, 2016 at 11:12 PM, Joe Witt <[email protected]>
>> wrote:
>> > >>
>> > >> > Lars,
>> > >> >
>> > >> > The information you're providing from the logs is a pretty important
>> > >> > bit of debug data.
>> > >> >
>> > >> > This concept of 'CONTENTMISSING' being recorded into the Flow File
>> > >> > Repository is NiFi's way of saying "Hey I knew about this flow file
>> > >> > but when I tried to access the content it was no longer in the
>> content
>> > >> > repository". What I'm suggesting is something outside of NiFi
>> itself
>> > >> > removed the content. By default, even when you remove content using
>> > >> > the NiFi API it isn't actually deleting the content until it has to
>> > >> > and it is asynchronous. Even if you had restarted NiFi during this
>> I
>> > >> > don't see how this could occur.
>> > >> >
>> > >> > Even if you have some bugs in the custom processor implementations
>> the
>> > >> > issue you're showing here should not be possible.
>> > >> >
>> > >> > The only explanation that makes sense to me so far is that the
>> content
>> > >> > was actually deleted from within the content repository by something
>> > >> > other than NiFi.
>> > >> >
>> > >> > Can you reproduce the issue?
>> > >> >
>> > >> > Thanks
>> > >> > Joe
>> > >> >
>> > >> > On Tue, Feb 16, 2016 at 4:58 PM, Lars Francke <
>> [email protected]
>> > >
>> > >> > wrote:
>> > >> > > Any ideas on how to debug this further?
>> > >> > >
>> > >> > > I know very little about the internals of NiFi but there are
>> > obviously
>> > >> > > still references to that content and it shouldn't have been
>> deleted.
>> > >> Can
>> > >> > > you think of a way I could have done this by accident?
>> > >> > >
>> > >> > > On Tue, Feb 16, 2016 at 10:35 PM, Joe Witt <[email protected]>
>> > wrote:
>> > >> > >
>> > >> > >> Interesting. What that suggests is the content has been removed
>> > from
>> > >> > >> the content repo itself.
>> > >> > >>
>> > >> > >> Thanks
>> > >> > >> Joe
>> > >> > >>
>> > >> > >> On Tue, Feb 16, 2016 at 4:15 PM, Lars Francke <
>> > [email protected]
>> > >> >
>> > >> > >> wrote:
>> > >> > >> > I attached a debugger and checked a few of those FlowFiles that
>> > >> failed
>> > >> > >> and
>> > >> > >> > searched the logs for those. This is what I found:
>> > >> > >> >
>> > >> > >> > 2016-02-16 18:28:35,953 INFO [main]
>> > >> > >> o.a.n.c.repository.FileSystemRepository
>> > >> > >> > Found unknown file
>> > >> > >> >
>> > /Users/lars/Downloads/nifi-0.5.0/content_repository/103/14556368398
>> > >> > >> > 47-103 (1058303 bytes) in File System Repository; archiving
>> file
>> > >> > >> >
>> > >> > >> > 2016-02-16 18:42:54,840 WARN [Timer-Driven Process Thread-9]
>> > >> > >> > o.a.n.c.r.WriteAheadFlowFileRepository Repository Record
>> > >> > >> >
>> > >> StandardRepositoryRecord[UpdateType=CONTENTMISSING,Record=StandardFlowFileRecord[uuid=af69ca83-fc03-41f0-91e1-e3d65da54840,claim=StandardContentClaim
>> > >> > >> > [resourceClaim=StandardResourceClaim[id=1455636632024-102,
>> > >> > >> > container=default, section=102], offset=661978,
>> > >> > >> > length=10],offset=0,name=69321836993544,size=10]] is marked to
>> be
>> > >> > >> aborted;
>> > >> > >> > it will be persisted in the FlowFileRepository as a DELETE
>> record
>> > >> > >> >
>> > >> > >> > Now I can't remember having done this but it's entirely
>> possible
>> > >> that
>> > >> > I
>> > >> > >> > restarted NiFi prior to my experiment described above.
>> > >> > >> >
>> > >> > >> >
>> > >> > >> > On Tue, Feb 16, 2016 at 9:16 PM, Joe Witt <[email protected]>
>> > >> wrote:
>> > >> > >> >
>> > >> > >> >> Lars,
>> > >> > >> >>
>> > >> > >> >> Definitely look forward to understanding the mechanics here a
>> > bit
>> > >> > >> >> better of what you're seeing and if you can provide something
>> > >> > >> >> reproducible.
Even if you have a custom processor the
>> > API/Process
>> > >> > >> >> Session construct should protect from many of the things that
>> > can
>> > >> go
>> > >> > >> >> wrong there. Now the content repo will likely be largely empty
>> as
>> > >> the
>> > >> > >> >> data represents only 888KB of data and it is probably in a
>> > relatively
>> > >> > >> >> small number of files on disk.
>> > >> > >> >>
>> > >> > >> >> Thanks
>> > >> > >> >> joe
>> > >> > >> >>
>> > >> > >> >> On Tue, Feb 16, 2016 at 2:57 PM, Lars Francke <
>> > >> > [email protected]>
>> > >> > >> >> wrote:
>> > >> > >> >> > Hi Matt,
>> > >> > >> >> >
>> > >> > >> >> > thanks for the quick response. It's late here so I'll try
>> > >> > reproducing
>> > >> > >> >> > tomorrow.
>> > >> > >> >> >
>> > >> > >> >> > Source and destination processors are custom processors.
>> > >> > >> >> > This is NiFi 0.5.0 RC3
>> > >> > >> >> >
>> > >> > >> >> > NiFi thinks all FlowFiles are still there: <
>> > >> > http://imgur.com/isDlRk4>
>> > >> > >> >> >
>> > >> > >> >> > I'm looking at logs now no ERRORs or WARN that seem
>> > suspicious so
>> > >> > far
>> > >> > >> >> >
>> > >> > >> >> > On Tue, Feb 16, 2016 at 8:46 PM, Matthew Clarke <
>> > >> > >> >> [email protected]>
>> > >> > >> >> > wrote:
>> > >> > >> >> >
>> > >> > >> >> >> Lars,
>> > >> > >> >> >> What version of NiFi are you running?
>> > >> > >> >> >> What type of processor was your source processor?
>> > >> > >> >> >> What type of processor was the destination processor?
>> > >> > >> >> >> I tried reproducing using a GenerateFlowFile to
>> produce
>> > >> > ~100k
>> > >> > >> >> >> Flowfiles on a connection to an UpdateAttribute processor.
>> I
>> > >> then
>> > >> > >> >> stopped
>> > >> > >> >> >> the GenerateFlowFile processor, added a funnel, and moved
>> > the
>> > >> > >> >> connection.
>> > >> > >> >> >> I also added another processor feeding that same funnel and
>> > >> routed
>> > >> > >> the
>> > >> > >> >> >> connection from the funnel back to the UpdateAttribute
>> > >> processor.
>> > >> > >> The
>> > >> > >> >> >> files moved as expected through the funnel.
>> > >> > >> >> >>
>> > >> > >> >> >> Can you reproduce? Any other errors in your app log
>> > from
>> > >> > >> prior
>> > >> > >> >> to
>> > >> > >> >> >> completing the connection?
>> > >> > >> >> >>
>> > >> > >> >> >> Thanks,
>> > >> > >> >> >> Matt
>> > >> > >> >> >>
>> > >> > >> >> >> On Tue, Feb 16, 2016 at 1:15 PM, Lars Francke <
>> > >> > >> [email protected]>
>> > >> > >> >> >> wrote:
>> > >> > >> >> >>
>> > >> > >> >> >> > Hi,
>> > >> > >> >> >> >
>> > >> > >> >> >> > I'm trying to understand what happened and how I can
>> > prevent
>> > >> > this
>> > >> > >> in
>> > >> > >> >> the
>> > >> > >> >> >> > future.
>> > >> > >> >> >> >
>> > >> > >> >> >> > The outcome seems to be that all my FlowFiles which were
>> > >> sitting
>> > >> > >> in a
>> > >> > >> >> >> > connection have been deleted from disk.
>> > >> > >> >> >> >
>> > >> > >> >> >> > I had a flow with two processors connected via a single
>> > >> > connection.
>> > >> > >> >> >> >
>> > >> > >> >> >> > What I did:
>> > >> > >> >> >> > * Stop all Processors
>> > >> > >> >> >> > * Add a Funnel
>> > >> > >> >> >> > * Add a Processor
>> > >> > >> >> >> > * Move destination end of existing connection to funnel
>> > (with
>> > >> > the
>> > >> > >> >> >> existing
>> > >> > >> >> >> > FlowFiles)
>> > >> > >> >> >> > * Connect new Processor to Funnel
>> > >> > >> >> >> > * Connect Funnel to old destination Processor
>> > >> > >> >> >> >
>> > >> > >> >> >> > The connection between the Funnel and the Destination
>> > >> processor
>> > >> > >> still
>> > >> > >> >> >> shows
>> > >> > >> >> >> > all 90k FlowFiles but the Processor fails on session.read
>> > >> with a
>> > >> > >> >> >> > MissingFlowFileException.
>> > >> > >> >> >> >
>> > >> > >> >> >> > Sure enough my content_repository is mostly empty too.
>> > >> > >> >> >> >
>> > >> > >> >> >> > Now this isn't so bad because it's only a dev environment
>> > but
>> > >> > I'd
>> > >> > >> >> like to
>> > >> > >> >> >> > understand how this could happen. Did I do something
>> wrong?
>> > >> > >> >> >> >
>> > >> > >> >> >> > Any hints on what to search for in the logs or which
>> place
>> > in
>> > >> > the
>> > >> > >> >> source
>> > >> > >> >> >> > code to look?
>> > >> > >> >> >> >
>> > >> > >> >> >> > Cheers,
>> > >> > >> >> >> > Lars
