Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

James Srinivasan Tue, 30 Oct 2018 13:59:57 -0700

Apologies if I've missed this in the discussion so far - we use the
InvokeHTTP processor a lot, and the invokehttp.java.exception.message
attribute is really handy for diving into why things have failed without
having to match up logs with flow files (from a system with hundreds
of processors making thousands of requests). We also route on
invokehttp.status.code (e.g. to retry 403s due to a race hazard in an
external system) but I don't imagine we'd route on
invokehttp.java.exception.* since (as others have mentioned) it looks
pretty fragile.


-- 
James
On Tue, 30 Oct 2018 at 16:44, Peter Wicks (pwicks) <[email protected]> wrote:
>
> Sorry for the delayed response, I've been traveling.
>
> Responses in order:
>
> Matt,
> Right now our workaround is to keep retrying errors, usually with a penalty 
> or control rate processor. The problem is that we don't know why it failed, 
> and thus don't know if retry is the correct option. I have not found a way, 
> without code change, to be able to determine if retrying is the correct 
> option or not.
>
> Koji,
> Detailed error handling would indeed be a good workaround to the problems 
> raised by myself and Matt. I have not seen this on other processors, but that 
> does not mean we can't do it of course.  I agree that having some kind of 
> hierarchy system for errors would be a much better solution.
>
> Pierre,
> My primary use case is as you described, a user friendly way to see what 
> actually happened without going through the log files. But while I know 
> it's fragile, routing on exception text stored in an attribute still feels 
> like a very legitimate use case. I know in many systems there are good 
> exception types that can be used to route FlowFiles to appropriate failure 
> relationships, but as Matt mentioned, JDBC has just a handful of exception 
> types for a very large number of possible error types.
>
> I think this is probably the same rationale that was used to justify 
> ExecuteStreamCommand's inclusion of this feature in the past: 
> too many possible failure conditions to handle with just a few failure 
> relationships.
>
> Uwe,
> That is a fair question, but it doesn't feel like such a bad fit to me. It's 
> like extra metadata on the lineage, "We followed this path through the flow 
> because we had exception " .... " which caused the FlowFile to follow the 
> failure route".
>
> But I still prefer the attribute; it could be another option for Detailed 
> error handling; instead of, or in addition to, additional relationships for 
> failures, the exception text could be included in an attribute.
>
> Thanks,
>   Peter
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Saturday, October 27, 2018 10:46 AM
> To: [email protected]
> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused 
> failure in an attribute
>
> Do you really want to mix provenance and data lineage with logging/error 
> information?
>
> Writing exception information/logging information within an attribute is not 
> a bad idea in my opinion.
> If a user wants to use this for routing, why not ... or whatever the user 
> wants to do.
>
> I could imagine that this can be switched on and off by a property via 
> config. E.g. on in development and off in production.
>
> Regards,
> Uwe
>
> > On 26.10.2018 at 09:26, Pierre Villard <[email protected]> wrote:
> >
> > Adding another option to the list.
> >
> > Peter - if I understand correctly and based on my own experience, the
> > idea is not to have an 'exception' attribute to perform custom routing
> > after the failure relationship but rather have a more user friendly
> > way to see what happened without going through all the logs for a given 
> > flow file.
> >
> > If that's correct, then could we add this information somehow to the
> > provenance event generated by the processor? Ideally adding a new
> > field to a provenance event or using the existing 'details' field?
> >
> > Pierre
> >
> >
> > On Fri, 26 Oct 2018 at 08:40, Koji Kawamura <[email protected]>
> > wrote:
> >
> >> Hi all,
> >>
> >> I'd like to add another option to Matt's list of solutions:
> >>
> >> 4) Add a processor property, 'Enable detailed error handling'
> >> (defaults to false), which toggles the available list of relationships.
> >> This way, existing flows such as Peter's don't have to change, while
> >> he can opt in to the new relationships. RouteOnAttribute can be a reference
> >> implementation.
> >>
> >> I like the idea of thinking of relationships as potential exceptions. It
> >> would be better if relationships had a hierarchy.
> >> Some users need more granular relationships while others don't.
> >> For NiFi 2.0 or later, supporting a relationship hierarchy at the framework
> >> level could avoid the need for an additional property on each processor.
> >>
> >> Thanks,
> >> Koji
> >> On Fri, Oct 26, 2018 at 11:49 AM Matt Burgess <[email protected]>
> >> wrote:
> >>>
> >>> Peter,
> >>>
> >>> Totally agree, RDBMS/JDBC is in a weird class as always, there is a
> >>> teaspoon of exception types for an ocean of causes. For NiFi 1.x, it
> >>> seems like we need to pick from a set of less-than-ideal solutions:
> >>>
> >>> 1) Add new relationships, but then your (possibly hundreds of)
> >>> processors are invalid
> >>> 2) Add new auto-terminated relationships, but then your
> >>> previously-handled errors are "lost"
> >>> 3) Add an attribute, but then each NiFi instance/release/flow is
> >>> responsible for parsing the error and handling it as desired.
> >>>
> >>> We could mitigate 1-2 with a tool that updates your flow/template by
> >>> sending all new failure relationships to the same target as the
> >>> existing one, but then the tool itself suffers from maintainability
> >>> issues (as does option #3). If we could recognize that the new
> >>> relationships are self-terminated and then send the errors out to
> >>> the original failure relationship, that could be quite confusing to
> >>> the user, especially as time goes on (how to suppress the "new"
> >>> errors, e.g.).
> >>>
> >>> IMHO I think we're between a rock and a hard place here, I guess
> >>> with great entropy comes great responsibility :P
> >>>
> >>> P.S. For your use case, is the workaround to just keep retrying? Or
> >>> are there other constraints at play?
> >>>
> >>> Regards,
> >>> Matt
> >>>
> >>> On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks)
> >>> <[email protected]>
> >> wrote:
> >>>>
> >>>> Matt,
> >>>>
> >>>> If I were to split an existing failure relationship into several
> >> relationships, I do not think I would want to auto-terminate in most cases.
> >> Specifically, I'm interested in a failure relationship for a database
> >> disconnect during SQL execution (database was online when the
> >> connection was verified in the DBCP pool, but went down during
> >> execution). If I were to find a way to separate this into its own
> >> relationship, I do not think most users would appreciate it being a
> >> condition silently not handled by the normal failure path.
> >>>>
> >>>> Thanks,
> >>>>  Peter
> >>>>
> >>>> -----Original Message-----
> >>>> From: Matt Burgess [mailto:[email protected]]
> >>>> Sent: Friday, October 26, 2018 10:18 AM
> >>>> To: [email protected]
> >>>> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> >> caused failure in an attribute
> >>>>
> >>>> NiFi (as of the last couple releases I think) has the ability to
> >>>> set
> >> auto-terminating relationships; this IMO is one of those use cases
> >> (for NiFi 1.x). If new relationships are added, they could default to
> >> auto-terminate; then the existing processors should remain valid.
> >>>> However we might want an "omnibus Jira" to capture those
> >>>> relationships
> >> we'd like to remove the auto-termination from in NiFi 2.0.
> >>>>
> >>>> Regards,
> >>>> Matt
> >>>> On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <
> >> [email protected]> wrote:
> >>>>>
> >>>>> Mark,
> >>>>>
> >>>>> I agree with you that this is the best option in general terms.
> >> After thinking about it some more I think the biggest use case is for
> >> troubleshooting. If a file routes to failure, you need to be watching
> >> the UI to see what the exception was. An admin may have access to the
> >> NiFi log files and could grep the error out, but a normal user who
> >> checks in on the flow and sees a FlowFile in the error queue will not
> >> know what the cause was; this is especially frustrating if retrying
> >> the file works without failure the second time... Capturing the error
> >> message in an attribute makes this easy to find.
> >>>>>
> >>>>> One thing I worry about too is adding new relationships to core
> >> processors. After an upgrade, won't users need to go to each instance
> >> of that processor and handle the new relationship? Right now I'd
> >> wager we have at least five thousand ExecuteSQL processors in our
> >> environment; and while we have strong scripting skills in my NiFi
> >> team, I would not want to encounter this without that.
> >>>>>
> >>>>> Thanks,
> >>>>>  Peter
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Mark Payne [mailto:[email protected]]
> >>>>> Sent: Thursday, October 25, 2018 10:38 PM
> >>>>> To: [email protected]
> >>>>> Subject: [EXT] Re: New Standard Pattern - Put Exception that
> >>>>> caused failure in an attribute
> >>>>>
> >>>>> I agree - the notion of adding a "failure.reason" attribute is, in
> >> my opinion, an anti-pattern that should be avoided. Relationships are
> >> not a workaround but rather the preferred approach in this scenario -
> >> an attribute I would consider a workaround. This is due to the fact
> >> that not only is it brittle and complex to add processors that route
> >> on such things, but there's no reason at all to assume that from
> >> release to release (even bug fix/increment releases) the
> >> Exception type or message will be the same, so the flow could stop working 
> >> at any time after upgrading NiFi.
> >>>>> Relationships offer a well-defined way to explicitly indicate
> >>>>> "these
> >> are the possible outcomes,"
> >>>>> similar IMO to Java Exception classes vs. throwing Strings in C.
> >>>>>
> >>>>>
> >>>>>> On Oct 25, 2018, at 9:47 AM, Bryan Bende <[email protected]> wrote:
> >>>>>>
> >>>>>> I think processors should really have well defined relationships
> >> for
> >>>>>> the error scenarios that need to be handled. Having the exception
> >>>>>> message is ok for a human who wants to see it, but in order to do
> >>>>>> anything with it in the flow you will have to have a bunch of
> >>>>>> parsing/interpreting of the message with a bunch of routing
> >>>>>> processors, which seems more brittle than just having the
> >>>>>> appropriate relationships.
> >>>>>> On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <
> >> [email protected]> wrote:
> >>>>>>>
> >>>>>>> When a FlowFile is routed to failure, frequently there is no
> >> clear reason without looking into the actual error message.
> >>>>>>> Some processors work around this by creating many different
> >> relationships, but even then frequently the generic Failure
> >> relationship also provides little guidance.
> >>>>>>>
> >>>>>>> I've seen a few cases recently where processors are including
> >>>>>>> the
> >> exception message as an attribute on the FlowFile when routing to
> >> failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should this be
> >> a standard pattern so that it's easier for users to route failures?
> >>>>>>>
> >>>>>>> --Peter
> >>>>>
> >>
>
