Mark, Joe,

thanks a lot for investigating, reproducing and explaining what's going on.
I couldn't have done that myself.

I built the 0.5.0 RC3 version from the git commit mentioned and haven't
changed a single configuration property. I hadn't even heard of archiving
until now. I just looked at the properties to be safe: archiving is enabled
and the threshold is set to 50%. My local disk is definitely more than 50%
utilised.
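
In case it helps anyone checking the same condition, here is a minimal
Python sketch of how partition usage can be compared against a threshold.
The 50% default mirrors the value in my conf/nifi.properties; the function
names and the path argument are made up for illustration, and this is not
how NiFi itself performs the check.

```python
import shutil


def usage_percent(path):
    """Return the percentage of the partition holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def archive_over_threshold(path, threshold_percent=50.0):
    """Return True if the partition holding `path` is at or above the threshold.

    `threshold_percent` is an assumption standing in for the archive
    threshold configured in conf/nifi.properties.
    """
    return usage_percent(path) >= threshold_percent
```

Calling archive_over_threshold() with the directory that holds the
content_repository would report whether that partition is past the
threshold.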

Had the archive worked, would recovering the data have been a manual
process?

It looks like you've got it covered, but if there's anything else I can do
to help, let me know.

Lars

On Wed, Feb 17, 2016 at 8:16 PM, Mark Payne <[email protected]> wrote:

> Lars,
>
> Joe is correct - I felt a need for coffee before trying to identify the
> exact case that will cause this issue to occur :)
>
> I created a ticket for this issue here:
> https://issues.apache.org/jira/browse/NIFI-1527
>
> I share your opinion that the funnel is not related. Rather, I believe the
> issue has to do with swapping of data. NiFi has a mechanism for swapping
> out FlowFiles to disk, rather than leaving them all in memory (here we are
> talking about the FlowFile objects, which contain things like attributes.
> The content is already written to disk and not kept in memory).
>
> With the default settings, this happens if at least 30,000 FlowFiles exist
> in the same queue. In this case, upon restart, if all of the FlowFiles that
> reference a specific "content file" on disk are swapped out, the Content
> Repository will end up removing or archiving that data. If the archive is
> disabled or full, the repository will end up removing it. I believe this is
> what you saw. I believe the content was removed on restart because all
> FlowFiles that referenced it were swapped out.
>
> Please confirm what Joe asked below: that you either disabled archiving or
> that your disk is at least 50% full (or that you changed that configuration
> parameter in conf/nifi.properties).
>
> We will certainly be addressing this issue very promptly. A huge thank you
> for noticing the weirdness and bringing it to our attention and for
> providing such great details!
>
> -Mark
>
>
> > On Feb 17, 2016, at 1:48 PM, Joe Witt <[email protected]> wrote:
> >
> > Lars,
> >
> > First of all thank you very much for reporting this and providing the
> > detail you did.
> >
> > It sounds like Mark Payne just replicated the problem, and then rather
> > than emailing he decided to go get coffee :-).
> >
> > We will be working this up for very prompt resolution and it warrants
> > a release in my view.
> >
> > Can you confirm that you are either not using archiving, or that you
> > are using it and more than 50% of the space on the partition NiFi is
> > running on is used up?
> >
> > I'll let Mark share the details.
> >
> > Thanks
> > Joe
> >
> > On Wed, Feb 17, 2016 at 4:06 AM, Lars Francke <[email protected]> wrote:
> >> It seems as if the Funnel thing wasn't actually the problem.
> >>
> >> Here's my new timeline:
> >>
> >> 18:14... - Stop Processors
> >> 18:15:40 - Shutdown NiFi (graceful and successful)
> >> 18:28:03 - Starting NiFi which seemingly deletes content
> >> 18:31++ - Add Funnel etc. and start Processors again (so only now do I
> >> see the problem occurring, even though it probably would have happened
> >> without it as well)
> >>
> >> I've uploaded the relevant part of the log here:
> >> <http://pastebin.com/6XWP5SVF>
> >>
> >> All processors involved are custom processors but they don't do anything
> >> special and have been running for days and survived multiple restarts
> >> already. I can't share code now but if it becomes important I can strip
> >> them to a bare minimum and share.
> >>
> >> So when the failure happened the setup was even simpler:
> >> CustomSourceProcessor was connected to CustomDestinationProcessor via a
> >> normal connection.
> >>
> >> Thanks yet again for helping out everyone!
> >>
> >> On Wed, Feb 17, 2016 at 5:04 AM, Aldrin Piri <[email protected]> wrote:
> >>
> >>> Lars,
> >>>
> >>> Are you able to share your flow or a template of it so we can try to
> >>> recreate?
> >>>
> >>> If not, could you give some information as to what it is doing and what
> >>> processors/components are involved.  Are there any custom components?
> >>>
> >>> Thanks!
> >>>
> >>> On Tue, Feb 16, 2016 at 10:18 PM, Joe Witt <[email protected]> wrote:
> >>>
> >>>> 'that deletes the original file'
> >>>>
> >>>> True, but even then that refers to the original source data and not
> >>>> what is in the content repository itself.  The content repository
> >>>> error that was emitted about missing flow file exception/content not
> >>>> found is for the purpose of signaling data was removed by some process
> >>>> outside of NiFi.
> >>>>
> >>>> Mark Payne: Any ideas?
> >>>>
> >>>> On Tue, Feb 16, 2016 at 10:15 PM, Thad Guidry <[email protected]>
> >>>> wrote:
> >>>>> There's a checkbox option in the FetchFile that deletes the original
> >>>>> file.
> >>>>>
> >>>>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/FetchFile.java#L62
> >>>>>
> >>>>> static final AllowableValue COMPLETION_DELETE = new AllowableValue("Delete File", "Delete File", "Deletes the original file from the file system");
> >>>>>
> >>>>>
> >>>>> Perhaps it's something along those lines, maybe in his other
> >>>>> processors? He mentioned "I also added another processor feeding that
> >>>>> same funnel" ... which processor was that exactly?
> >>>>>
> >>>>>
> >>>>> Thad
> >>>>> +ThadGuidry <https://www.google.com/+ThadGuidry>
> >>>>>
> >>>>> On Tue, Feb 16, 2016 at 4:35 PM, Lars Francke <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> Thanks for the explanation.
> >>>>>>
> >>>>>> I tried to reproduce but I can't. I also looked through my bash
> >>>>>> history and I can't find anything suspicious. I'm pretty sure that
> >>>>>> nothing other than NiFi itself deleted files in the
> >>>>>> content_repository. Everything else (logs etc.) is untouched and some
> >>>>>> content files have survived as well. A few FlowFiles are being
> >>>>>> processed successfully and I just checked the creation date of all
> >>>>>> files in content_repository. Most of them are "old".
> >>>>>>
> >>>>>> On Tue, Feb 16, 2016 at 11:12 PM, Joe Witt <[email protected]> wrote:
> >>>>>>
> >>>>>>> Lars,
> >>>>>>>
> >>>>>>> The information you're providing from the logs is a pretty important
> >>>>>>> bit of debug data.
> >>>>>>>
> >>>>>>> This concept of 'CONTENTMISSING' being recorded into the Flow File
> >>>>>>> Repository is NiFi's way of saying "Hey I knew about this flow file
> >>>>>>> but when I tried to access the content it was no longer in the
> >>>>>>> content repository".  What I'm suggesting is something outside of
> >>>>>>> NiFi itself removed the content.  By default, even when you remove
> >>>>>>> content using the NiFi API it isn't actually deleting the content
> >>>>>>> until it has to and it is asynchronous.  Even if you had restarted
> >>>>>>> NiFi during this I don't see how this could occur.
> >>>>>>>
> >>>>>>> Even if you have some bugs in the custom processor implementations
> >>>>>>> the issue you're showing here should not be possible.
> >>>>>>>
> >>>>>>> The only explanation that makes sense to me so far is that the
> >>>>>>> content was actually deleted from within the content repository by
> >>>>>>> something other than NiFi.
> >>>>>>>
> >>>>>>> Can you reproduce the issue?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Joe
> >>>>>>>
> >>>>>>> On Tue, Feb 16, 2016 at 4:58 PM, Lars Francke <[email protected]>
> >>>>>>> wrote:
> >>>>>>>> Any ideas on how to debug this further?
> >>>>>>>>
> >>>>>>>> I know very little about the internals of NiFi but there are
> >>>>>>>> obviously still references to that content and it shouldn't have
> >>>>>>>> been deleted. Can you think of a way I could have done this by
> >>>>>>>> accident?
> >>>>>>>>
> >>>>>>>> On Tue, Feb 16, 2016 at 10:35 PM, Joe Witt <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Interesting.  What that suggests is the content has been removed
> >>>>>>>>> from the content repo itself.
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> Joe
> >>>>>>>>>
> >>>>>>>>> On Tue, Feb 16, 2016 at 4:15 PM, Lars Francke <[email protected]>
> >>>>>>>>> wrote:
> >>>>>>>>>> I attached a debugger and checked a few of those FlowFiles that
> >>>>>>>>>> failed and searched the logs for those. This is what I found:
> >>>>>>>>>>
> >>>>>>>>>> 2016-02-16 18:28:35,953 INFO [main] o.a.n.c.repository.FileSystemRepository Found unknown file /Users/lars/Downloads/nifi-0.5.0/content_repository/103/1455636839847-103 (1058303 bytes) in File System Repository; archiving file
> >>>>>>>>>>
> >>>>>>>>>> 2016-02-16 18:42:54,840 WARN [Timer-Driven Process Thread-9] o.a.n.c.r.WriteAheadFlowFileRepository Repository Record StandardRepositoryRecord[UpdateType=CONTENTMISSING,Record=StandardFlowFileRecord[uuid=af69ca83-fc03-41f0-91e1-e3d65da54840,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1455636632024-102, container=default, section=102], offset=661978, length=10],offset=0,name=69321836993544,size=10]] is marked to be aborted; it will be persisted in the FlowFileRepository as a DELETE record
> >>>>>>>>>>
> >>>>>>>>>> Now I can't remember having done this but it's entirely possible
> >>>>>>>>>> that I restarted NiFi prior to my experiment described above.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Feb 16, 2016 at 9:16 PM, Joe Witt <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Lars,
> >>>>>>>>>>>
> >>>>>>>>>>> I definitely look forward to better understanding the mechanics
> >>>>>>>>>>> of what you're seeing, and to seeing whether you can provide
> >>>>>>>>>>> something reproducible.  Even if you have a custom processor, the
> >>>>>>>>>>> API/Process Session construct should protect from many of the
> >>>>>>>>>>> things that can go wrong there.  Now the content repo will likely
> >>>>>>>>>>> be largely empty, as the data represents only 888KB of data and it
> >>>>>>>>>>> is probably in a relatively small number of files on disk.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>> joe
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Feb 16, 2016 at 2:57 PM, Lars Francke <[email protected]>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>> Hi Matt,
> >>>>>>>>>>>>
> >>>>>>>>>>>> thanks for the quick response. It's late here so I'll try
> >>>>>>>>>>>> reproducing tomorrow.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Source and destination processors are custom processors.
> >>>>>>>>>>>> This is NiFi 0.5.0 RC3
> >>>>>>>>>>>>
> >>>>>>>>>>>> NiFi thinks all FlowFiles are still there: <http://imgur.com/isDlRk4>
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm looking at the logs now; no ERRORs or WARNs that seem
> >>>>>>>>>>>> suspicious so far
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Feb 16, 2016 at 8:46 PM, Matthew Clarke <[email protected]>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Lars,
> >>>>>>>>>>>>>      What version of NiFi are you running?
> >>>>>>>>>>>>>      What type of processor was your source processor?
> >>>>>>>>>>>>>      What type of processor was the destination processor?
> >>>>>>>>>>>>>      I tried reproducing using a GenerateFlowFile to produce ~100k
> >>>>>>>>>>>>> FlowFiles on a connection to an UpdateAttribute processor. I then
> >>>>>>>>>>>>> stopped the GenerateFlowFile processor, added a funnel, and moved
> >>>>>>>>>>>>> the connection. I also added another processor feeding that same
> >>>>>>>>>>>>> funnel and routed the connection from the funnel back to the
> >>>>>>>>>>>>> UpdateAttribute processor. The files moved as expected through the
> >>>>>>>>>>>>> funnel.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>      Can you reproduce?   Any other errors in your app log from
> >>>>>>>>>>>>> prior to completing the connection?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Matt
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Feb 16, 2016 at 1:15 PM, Lars Francke <[email protected]>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm trying to understand what happened and how I can prevent
> >>>>>>>>>>>>>> this in the future.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The outcome seems to be that all my FlowFiles which were sitting
> >>>>>>>>>>>>>> in a connection have been deleted from disk.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I had a flow with two processors connected via a single
> >>>>>>>>>>>>>> connection.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What I did:
> >>>>>>>>>>>>>> * Stop all Processors
> >>>>>>>>>>>>>> * Add a Funnel
> >>>>>>>>>>>>>> * Add a Processor
> >>>>>>>>>>>>>> * Move destination end of existing connection to funnel (with the
> >>>>>>>>>>>>>>   existing FlowFiles)
> >>>>>>>>>>>>>> * Connect new Processor to Funnel
> >>>>>>>>>>>>>> * Connect Funnel to old destination Processor
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The connection between the Funnel and the Destination processor
> >>>>>>>>>>>>>> still shows all 90k FlowFiles but the Processor fails on
> >>>>>>>>>>>>>> session.read with a MissingFlowFileException.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Sure enough my content_repository is mostly empty too.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Now this isn't so bad because it's only a dev environment but I'd
> >>>>>>>>>>>>>> like to understand how this could happen. Did I do something
> >>>>>>>>>>>>>> wrong?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Any hints on what to search for in the logs or which place in the
> >>>>>>>>>>>>>> source code to look?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>> Lars
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>
>
>
