Re: NiFi processor validation

Mark Payne Tue, 08 Nov 2016 05:57:23 -0800

All,

These are certainly valid concerns. There are a few things to keep in mind,
though, that may help to explain the current design decisions.

Firstly, if a component is disabled, then we do not perform validation on the
component (or, at least, if we do, then it's a bug). When a component is
running, we DO still perform validation. This is because a Processor (or any
component, really) can still become invalid while it is running. An example of
this is if a property uses a FileExistsValidator. If the file is removed while
the Processor is running, the Processor becomes invalid. The Processor does
continue to run (though it may or may not keep failing). However, the UI does
change its icon to show that it is invalid.
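
As a rough illustration of that point, here is a minimal sketch in plain Java.
It does not use the NiFi API; FileExistsCheck and its methods are hypothetical
names made up for this example. It only shows that a file-existence check can
give a different answer each time it is evaluated.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a file-exists check; NOT NiFi's validator class.
final class FileExistsCheck {
    private final Path path;

    FileExistsCheck(Path path) {
        this.path = path;
    }

    // Evaluated on every call, so the result tracks the current file system.
    boolean isValid() {
        return Files.exists(path);
    }

    // Demo helper: checks a temp file, deletes it, then checks again.
    // Returns { validBeforeDelete, validAfterDelete }.
    static boolean[] validBeforeAndAfterDelete() {
        try {
            Path file = Files.createTempFile("validation-sketch", ".tmp");
            FileExistsCheck check = new FileExistsCheck(file);
            boolean before = check.isValid();
            Files.delete(file); // the file disappears while the component "runs"
            boolean after = check.isValid();
            return new boolean[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same object reports valid before the delete and invalid after it, which is
why a result computed when the component was started cannot simply be reused.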

Secondly, I believe we have no choice but to validate Service X twice if two
different Processors depend on the service, because its validity certainly may
change between the time that it was last validated and now (again, take the
File Exists Validator as an example).

I believe the best solution is to refactor how validation is performed and to
ensure that any action that enters 'user code' (including validation) is
performed asynchronously. However, this is a very significant change and if
someone decides to take this on, it is not going to be quick or easy.
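
To make the idea of asynchronous validation more concrete, here is a minimal
sketch, assuming a design in which a background executor runs the user's
validation code and request threads only read the last known result. This is
not how NiFi is implemented; AsyncValidationSketch, Status, triggerValidation,
and status are all hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; none of these names come from the NiFi codebase.
final class AsyncValidationSketch {
    enum Status { VALIDATING, VALID, INVALID }

    private final AtomicReference<Status> lastKnown =
            new AtomicReference<>(Status.VALIDATING);
    private final Executor validationExecutor;
    private final BooleanSupplier userValidation; // stands in for 'user code'

    AsyncValidationSketch(Executor validationExecutor, BooleanSupplier userValidation) {
        this.validationExecutor = validationExecutor;
        this.userValidation = userValidation;
    }

    // Called when something changes: user code runs on the validation
    // executor, never on the calling thread.
    CompletableFuture<Void> triggerValidation() {
        lastKnown.set(Status.VALIDATING);
        return CompletableFuture.runAsync(() -> {
            boolean ok = userValidation.getAsBoolean();
            lastKnown.set(ok ? Status.VALID : Status.INVALID);
        }, validationExecutor);
    }

    // Called by request threads: returns the cached result without
    // entering user code.
    Status status() {
        return lastKnown.get();
    }
}
```

A caller would construct this with something like
Executors.newSingleThreadExecutor() and a validation lambda, call
triggerValidation() on a configuration change, and call status() from the web
tier. The sketch ignores ordering between overlapping validations, which a
real design would have to handle.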

Even if validation is performed asynchronously, though, we still have the case
of performing the validation many times. This is why the Developer Guide
explicitly spells out the importance of ensuring that Validator logic and
customValidate methods are very fast, efficient methods. If the validation is
going to take longer than a couple of milliseconds then it doesn't belong in
validation and should probably be moved into the @OnEnabled / @OnScheduled
lifecycle events.
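
The split between cheap validation and slower lifecycle work might look
roughly like the sketch below. It is plain Java and deliberately avoids the
NiFi API: SftpLikeComponent, validate, and onScheduled are hypothetical names
that only mirror the roles of customValidate and @OnScheduled, and the
"connection" is a counter standing in for a slow remote call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical component; mirrors the roles of customValidate and
// @OnScheduled but does not use the NiFi API.
final class SftpLikeComponent {
    private final String hostname;
    private final int port;
    private int connectionAttempts = 0;

    SftpLikeComponent(String hostname, int port) {
        this.hostname = hostname;
        this.port = port;
    }

    // Validation: in-memory checks only, so it is safe to call very often.
    List<String> validate() {
        List<String> problems = new ArrayList<>();
        if (hostname == null || hostname.trim().isEmpty()) {
            problems.add("Hostname is required");
        }
        if (port < 1 || port > 65535) {
            problems.add("Port must be between 1 and 65535");
        }
        return problems;
    }

    // Lifecycle step: runs once per start, so slow work such as opening a
    // connection belongs here rather than in validate().
    void onScheduled() {
        connectionAttempts++; // placeholder for the slow remote call
    }

    int connectionAttempts() {
        return connectionAttempts;
    }
}
```

However many times validate() is called, the slow step is never triggered; it
happens only when onScheduled() runs.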

I would like to see this go a step further, though, and support some mechanism
for performing more complex validation, so that processors such as PutSFTP can
validate username/password combinations, etc. This would be user-driven and
performed by clicking some sort of "Test" or "Verify" button in the UI. This
would help to separate the notions of "valid" configurations from "correct"
configurations.

Again, though, none of these are small efforts; they are going to take quite a
bit of time, and I don't know that anyone has started working on them yet. So
we will need someone to volunteer to get the work done first :) But I would
love to see some of this stuff get tackled as well!

Cheers
-Mark

> On Nov 8, 2016, at 8:07 AM, Joe Skora <[email protected]> wrote:
> 
> +1 for the validation change.
> 
> +1 for not calling into user code for GUI refresh.
> 
> I understand the logic behind validating whenever we return current state,
> but that can put a great deal of load on a system unrelated to the actual
> data flow.  For the most part, state changes at discrete points such as
> configuration, start, onTrigger, etc.  When loading the GUI it seems like
> we should return the last known state, possibly with a GUI option to
> re-validate the components, to minimize the impact of the user interface
> side of the systems on the actual dataflow components.
> 
> As much as duplicate validation can be eliminated that would help as well.
> Currently I believe that if Processors A and B validate Service X, the
> Service X validation will occur twice, contributing to the "exponential"
> growth Mike mentioned in the ticket.
> 
> On Tue, Nov 8, 2016 at 12:43 PM, Matt Gilman <[email protected]>
> wrote:
> 
>> I also agree these changes make sense. In addition, another approach we
>> could consider that has been discussed in the past [1] is to perform
>> component validation asynchronously. This presents its own challenges but
>> would also be helpful. We should try to avoid calling into user code in any
>> web thread.
>> 
>> Matt
>> 
>> [1] https://issues.apache.org/jira/browse/NIFI-950
>> 
>> On Mon, Nov 7, 2016 at 6:15 PM, Matt Burgess <[email protected]> wrote:
>> 
>>> Agreed. Also we validate processors on a timer-based strategy in
>>> FlowController (looks like for snapshotting) and in the web server
>>> (via ControllerFacade), those seem to happen 6-7 times on that
>>> interval (which is like 15-20 seconds). Also we validate all
>>> processors on any change to the canvas (such as moving a processor).
>>> Besides Mike's suggestion, perhaps we should look at a purely
>>> event-driven strategy for validating processors if possible?
>>> 
>>> Regards,
>>> Matt
>>> 
>>> On Mon, Nov 7, 2016 at 6:06 PM, Joe Witt <[email protected]> wrote:
>>>> Makes good sense to me.
>>>> 
>>>> On Nov 7, 2016 5:39 PM, "Michael Moser" <[email protected]> wrote:
>>>> 
>>>>> All,
>>>>> 
>>>>> I would like to propose a fundamental change to processor validation based
>>>>> on observations in https://issues.apache.org/jira/browse/NIFI-2996. I would
>>>>> like to validate processors only when they are in the STOPPED state.
>>>>> 
>>>>> The properties on a processor in the RUNNING state should always be valid,
>>>>> else you should not have been able to start the processor. A processor in
>>>>> the DISABLED state doesn't show validation results, so it seems a waste to
>>>>> validate its properties.
>>>>> 
>>>>> The reason I'm proposing this change is because the NiFi UI slows down as
>>>>> you add more processors and controller services to the graph. Beyond common
>>>>> sense expectations that this would be true, it appears that processor
>>>>> validation is a significant part of the 'cost' on the server when
>>>>> responding to REST API requests.  Some details from my testing are in the
>>>>> JIRA ticket.
>>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> Thanks,
>>>>> -- Mike
>>>>> 
>>> 
>>
