Hi, thank you both for your responses. I understand that rerouting back to the processor would produce an infinite provenance history. It can also cause an infinite loop when the destination system is offline, so I am not using that approach in this case.
Generally, I have a problem identifying the last processor which routed the FlowFile to failure before it entered failure handling. And yes, I was thinking of attaching an UpdateAttribute right after each failure connection I need to handle and distinguish, but this would be really messy, which is why I suspected I was doing something wrong in general. My thought was: if I can identify where the FlowFile escaped standard execution through a failure relationship, I can save the FlowFile somewhere (e.g. HDFS) together with its metadata (attributes), keep it for future inspection, and, most importantly, manually re-enter the flow from the point of failure. Is this a bad approach? How do you design flows, then? Is it possible to programmatically inspect a FlowFile to find the last processor in the chain that touched it (even if that processor did not emit any provenance event at all)? If so, tell me; I can afford coding my own processor to accomplish this task.

Thanks,
Michal

On Thu, Nov 10, 2016 at 7:50 PM, Andy LoPresto <[email protected]> wrote:
> Michael,
>
> A temporary solution would be to insert an UpdateAttribute processor between the source processor (where the failure occurred) and your general failure handling flow. This processor could add an attribute noting the location of the failure and you could quickly determine that when debugging.
>
> If this seems cumbersome, you could also put a single ExecuteScript processor at the beginning of your failure handling flow and query the provenance events for the incoming flowfile, detect the last event that occurred, and then write out an additional, arbitrary provenance event indicating the failure.
>
> Neither are excellent solutions, and Mark is right that there should be a better option for diagnosing this. Please submit a Jira capturing your thoughts and we'll see what is possible.
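For reference, a minimal sketch of the UpdateAttribute approach Andy describes above. The attribute names and values are purely illustrative (there is no established convention); each failure connection gets its own UpdateAttribute with a static processor name:

```
UpdateAttribute  (wired to the failure relationship of each processor of interest)
    failure.processor = PostHTTP                                  (static per connection)
    failure.group     = delivery                                  (optional: flow area)
    failure.timestamp = ${now():format("yyyy-MM-dd'T'HH:mm:ss")}  (Expression Language)
```

The attributes then travel with the FlowFile into the failure-handling group and can be included in the notification email or written alongside the content when archiving to HDFS.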
>
> Andy LoPresto
> [email protected]
> [email protected]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
> On Nov 7, 2016, at 6:10 AM, Mark Payne <[email protected]> wrote:
>
>> Michal,
>>
>> Currently, the guidance that we give is for processors not to emit any sort of ROUTE event for routing a FlowFile to a 'failure' relationship. While this may seem counter-intuitive, we do this because most of the time when a FlowFile is routed to 'failure', the failure relationship is not pointing to some sort of 'failure' flow like you describe here but rather the failure relationship is a self-loop so that the Processor tries again.
>>
>> In the scenario described above, if PostHTTP were to route a FlowFile to failure and failure looped back to PostHTTP, we may see that the FlowFile was routed to failure hundreds (or more) of times. As a result, the Provenance lineage would not really be very easy to follow because it would be filled with a huge number of 'ROUTE' events.
>>
>> That being said, there are things that we could do to be smart about this at the framework level. For instance, we could notice that the ROUTE event indicates that the FlowFile is being routed back to the same queue that it came from, so we could just discard the ROUTE event.
>>
>> Unfortunately, this doesn't always solve the problem, because we also often see scenarios where there is perhaps a DistributeLoad processor that load balances between 5 different PostHTTP processors, for instance. If a PostHTTP fails, it routes back to the DistributeLoad. So we'd need to keep track of the fact that it's been to this connection before, even though it wasn't the last connection, and so on.
>>
>> So that was a really long-winded way to say: we intentionally do not emit ROUTE events for 'failure' because it can create some very complicated, hard-to-follow lineages. But we can - and should - do better.
>> If this is something that you are interested in digging into, in the codebase, the community would be more than happy to help guide you along the way!
>>
>> Also, if you have other feedback about how you think we can handle these cases better, please feel free to elaborate on the thread.
>>
>> Thanks
>> -Mark
>>
>> On Nov 7, 2016, at 5:46 AM, Michal Klempa <[email protected]> wrote:
>>
>>> Hi,
>>> I am maintaining several dataflows and I am facing this issue in practice. Let's say I have several points of possible failure within the dataflow (nearly every processor has a failure output). I route all of these into my general failure-handler subgroup, which basically does some filtering and formatting before issuing a notification by email.
>>>
>>> From my email notifications, I get the FlowFile UUID, and in case I am curious about what happened, I go into NiFi and search the provenance events for this particular FlowFile. And here comes the point: sometimes I find it hard to tell which processor was the first one that sent the file into the 'failure path'.
>>>
>>> Shouldn't the processor which does the 'failure' routing send a provenance event of type ProvenanceEventType.ROUTE to the FlowFile history, so the dataflow manager knows when this unfortunate event happened? Is this a guideline which processors do not obey?
>>>
>>> Or maybe I do something wrong when searching the events/history of the FlowFile.
>>>
>>> To get to a concrete example, let me point out that the PostHTTP processor never issues any provenance event regarding the failure (nor does it fill any execution details into attributes, like ExecuteStreamCommand does, for example, where you have execution.error containing the stderr). So locating the error in PostHTTP is just a heuristic on my side, and I cannot find any verbose HTTP output (like curl -v, for example) with headers, the response from the server, or at least 'connection timeout' if that is the case...
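To make Mark's self-loop point from earlier in the thread concrete, this is the retry pattern he refers to (a generic sketch, not a flow taken from this thread):

```
GetFile ──→ PostHTTP ── success ──→ downstream
               ▲   │
               └───┘ failure (self-loop: the FlowFile is typically penalized
                     and retried; emitting one ROUTE event per attempt would
                     bury the lineage under hundreds of identical events)
```

This is why the current guidance is to stay silent on failure routing: the common wiring is a retry loop, not a dedicated failure-handling path like the one described in this thread.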
>>> Thanks for suggestions and opinions.
>>> Michal Klempa
