Thanks for pointing out the NIFI-950 JIRA! I didn't find that one in my
search.

Processor validation compounds when controller services build their list of
referencing components.  The controller service display shows a list of all
processors that reference the service, and includes the status of each of
those components.  So if Processor P1 references Service S1, then
validation of P1 will cause validation of S1 which will cause validation of
P1 again.
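
To make the compounding concrete, here is a toy model in plain Java.  It is
not NiFi code: the Service and Processor classes and the countValidations()
helper are invented for illustration, and it assumes a service re-checks
every referencing processor each time the service itself is validated.

```java
import java.util.ArrayList;
import java.util.List;

public class ValidationFanOut {

    // Hypothetical stand-in for a controller service.
    static class Service {
        final List<Processor> referencing = new ArrayList<>();
        int validations = 0;

        // Validating the service also reports the status of every
        // referencing component, which validates each of them again.
        void validate() {
            validations++;
            for (Processor p : referencing) {
                p.validateShallow();
            }
        }
    }

    // Hypothetical stand-in for a processor that references one service.
    static class Processor {
        final Service service;
        int validations = 0;

        Processor(Service service) {
            this.service = service;
            service.referencing.add(this);
        }

        // Validation triggered by the service's referencing-components list.
        void validateShallow() {
            validations++;
        }

        // Validation triggered by a status refresh of this processor.
        void validate() {
            validations++;
            service.validate();
        }
    }

    // Returns {processor validations, service validations} for one
    // refresh of n processors that all reference the same service.
    static int[] countValidations(int n) {
        Service s1 = new Service();
        List<Processor> processors = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            processors.add(new Processor(s1));
        }
        for (Processor p : processors) {
            p.validate();
        }
        int total = 0;
        for (Processor p : processors) {
            total += p.validations;
        }
        return new int[] { total, s1.validations };
    }

    public static void main(String[] args) {
        for (int n : new int[] { 1, 10, 100 }) {
            int[] c = countValidations(n);
            System.out.println(n + " processors: " + c[0]
                + " processor validations, " + c[1] + " service validations");
        }
    }
}
```

In this model, one refresh of n processors sharing a service costs n + n*n
processor validations and n service validations, so the work grows
quadratically rather than linearly with the number of referencing processors.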

1) Asynchronous validation

It sounds like NIFI-950 is the perfect JIRA for this.  Should we add any
information to it?

2) Validate less

I will go ahead with a PR using NIFI-2996, and we can discuss more.  If
properties become invalid while a processor is running, I hope the
processor will throw exceptions and/or show bulletins.

I will also submit a new JIRA to improve the StandardSSLContextService
customValidate() method.  It cracks open both truststore and keystore
(twice) and creates a sample SSLContext which it just throws away.  Now
imagine doing this hundreds or thousands of times in one validation cycle.
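
One possible direction, sketched only as an illustration and not necessarily
what the new JIRA will propose: remember the result of the expensive check
and redo it only when the file appears to have changed.  The class below is
hypothetical plain Java, not the StandardSSLContextService code, and it
assumes the caller supplies the file's last-modified time (for example from
java.io.File.lastModified()).

```java
import java.util.function.Predicate;

public class CachedFileValidation {
    // The costly check, e.g. opening a keystore. Supplied by the caller.
    private final Predicate<String> expensiveCheck;

    private String lastPath;
    private long lastModifiedMillis = -1;
    private boolean lastResult;
    private int expensiveCalls = 0;

    public CachedFileValidation(Predicate<String> expensiveCheck) {
        this.expensiveCheck = expensiveCheck;
    }

    // Re-runs the expensive check only when the path or its
    // last-modified time differs from the previous call.
    public synchronized boolean validate(String path, long modifiedMillis) {
        boolean unchanged = path.equals(lastPath)
            && modifiedMillis == lastModifiedMillis;
        if (unchanged) {
            return lastResult;
        }
        expensiveCalls++;
        lastResult = expensiveCheck.test(path);
        lastPath = path;
        lastModifiedMillis = modifiedMillis;
        return lastResult;
    }

    // How many times the expensive check actually ran.
    public synchronized int expensiveCalls() {
        return expensiveCalls;
    }
}
```

The trade-off is that a file rewritten without its timestamp changing would
keep the stale result, so this is a sketch of the idea rather than a drop-in
fix.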

-- Mike



On Tue, Nov 8, 2016 at 9:03 AM, Joe Witt <[email protected]> wrote:

> Agree on those points, Mark.  Should be two different JIRAs.
>
> 1) Asynchronous validation
>
>   Yeah, probably not easy, and it requires lots of thought about how to
> tie it into the lifecycle of things.
>
> 2) Validate less
>
>   It is true that things can 'become' invalid while processors are
> running, but this is both unlikely and isn't something the framework
> will do anything about.  We don't stop the processor because it has
> become invalid, because by the same logic that it became invalid it
> could also become valid again.  So, I'd be good with just not doing
> that anymore.
>
> I think if we did the second item it would help a lot with these
> massive flow cases.
>
> Thanks
> Joe
>
> On Tue, Nov 8, 2016 at 8:49 AM, Mark Payne <[email protected]> wrote:
> > All,
> >
> > These are certainly valid concerns. There are a few things to keep in
> > mind, though, that may help to explain the current design decisions.
> >
> > Firstly, if a component is disabled, then we do not perform validation on
> > the component (or, at least, if we do then it's a bug.) When a component
> > is running, we DO still perform validation. This is because a Processor
> > (or any component really) can still become invalid while it is running.
> > An example of this is if a property uses a FileExistsValidator. If the
> > file is removed while the Processor is running, the Processor becomes
> > invalid. The Processor does continue to run (though it may or may not
> > keep failing). However, the UI does change its icon to show that it is
> > invalid.
> >
> > Secondly, I believe we have no choice but to validate Service X twice if
> > two different Processors depend on the service, because its validity
> > certainly may change between the time that it was last validated and now
> > (again, take the File Exists Validator as an example).
> >
> > I believe the best solution is to refactor how validation is performed
> > and to ensure that any action that enters 'user code' (including
> > validation) is performed asynchronously. However, this is a very
> > significant change and if someone decides to take this on, it is not
> > going to be quick or easy.
> >
> > Even if validation is performed asynchronously, though, we still have the
> > case of performing the validation many times. This is why the Developer
> > Guide explicitly spells out the importance of ensuring that Validator
> > logic and customValidate methods are very fast, efficient methods. If the
> > validation is going to take longer than a couple of milliseconds then it
> > doesn't belong in validation and should probably be moved into the
> > @OnEnabled / @OnScheduled lifecycle events.
> >
> > I would like to see this go a step further, though, and support some
> > mechanism for performing more complex validation, so that processors such
> > as PutSFTP can validate username/password combinations, etc. This would
> > be user-driven and performed by clicking some sort of "Test" or "Verify"
> > button in the UI. This would help to separate the notions of "valid"
> > configurations from "correct" configurations.
> >
> > Again, though, none of these are small efforts and are going to take
> > quite a bit of time, and I don't know that anyone has started working on
> > them yet. So we will need someone to volunteer to get the work done
> > first :) But I would love to see some of this stuff get tackled as well!
> >
> >
> > Cheers
> > -Mark
> >
> >
> >
> >> On Nov 8, 2016, at 8:07 AM, Joe Skora <[email protected]> wrote:
> >>
> >> +1 for the validation change.
> >>
> >> +1 for not calling into user code for GUI refresh.
> >>
> >> I understand the logic behind validating whenever we return current
> >> state, but that can put a great deal of load on a system unrelated to
> >> the actual data flow.  For the most part, state changes at discrete
> >> points such as configuration, start, onTrigger, etc.  When loading the
> >> GUI it seems like we should return the last known state, possibly with
> >> a GUI option to re-validate the components, to minimize the impact of
> >> the user interface side of the system on the actual dataflow components.
> >>
> >> Eliminating as much duplicate validation as possible would help as
> >> well.  Currently I believe that if Processors A and B validate Service
> >> X, the Service X validation will occur twice, contributing to the
> >> "exponential" growth Mike mentioned in the ticket.
> >>
> >> On Tue, Nov 8, 2016 at 12:43 PM, Matt Gilman <[email protected]>
> >> wrote:
> >>
> >>> I also agree these changes make sense. In addition, another approach
> >>> we could consider that has been discussed in the past [1] is to perform
> >>> component validation asynchronously. This presents its own challenges
> >>> but would also be helpful. We should try to avoid calling into user
> >>> code in any web thread.
> >>>
> >>> Matt
> >>>
> >>> [1] https://issues.apache.org/jira/browse/NIFI-950
> >>>
> >>> On Mon, Nov 7, 2016 at 6:15 PM, Matt Burgess <[email protected]> wrote:
> >>>
> >>>> Agreed. Also we validate processors on a timer-based strategy in
> >>>> FlowController (looks like for snapshotting) and in the web server
> >>>> (via ControllerFacade), those seem to happen 6-7 times on that
> >>>> interval (which is like 15-20 seconds). Also we validate all
> >>>> processors on any change to the canvas (such as moving a processor).
> >>>> Besides Mike's suggestion, perhaps we should look at a purely
> >>>> event-driven strategy for validating processors if possible?
> >>>>
> >>>> Regards,
> >>>> Matt
> >>>>
> >>>> On Mon, Nov 7, 2016 at 6:06 PM, Joe Witt <[email protected]> wrote:
> >>>>> Makes good sense to me.
> >>>>>
> >>>>> On Nov 7, 2016 5:39 PM, "Michael Moser" <[email protected]> wrote:
> >>>>>
> >>>>>> All,
> >>>>>>
> >>>>>> I would like to propose a fundamental change to processor validation
> >>>>>> based on observations in
> >>>>>> https://issues.apache.org/jira/browse/NIFI-2996. I would like to
> >>>>>> validate processors only when they are in the STOPPED state.
> >>>>>> The properties on a processor in the RUNNING state should always be
> >>>>>> valid, else you should not have been able to start the processor. A
> >>>>>> processor in the DISABLED state doesn't show validation results, so
> >>>>>> it seems a waste to validate its properties.
> >>>>>>
> >>>>>> The reason I'm proposing this change is because the NiFi UI slows
> >>>>>> down as you add more processors and controller services to the
> >>>>>> graph. Beyond common sense expectations that this would be true, it
> >>>>>> appears that processor validation is a significant part of the
> >>>>>> 'cost' on the server when responding to REST API requests.  Some
> >>>>>> details from my testing are in the JIRA ticket.
> >>>>>>
> >>>>>> Thoughts?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -- Mike
> >>>>>>
> >>>>
> >>>
> >
>
