Ok cool. From the 'gathering of the raw materials of knowledge' perspective: if the things you want internal access to are already exposed via the REST API, then we're definitely safe on the threading side, at least with respect to the things I was worried about when considering internal access.
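
As a rough illustration of doing this kind of check on the client side, here is a minimal sketch of a threshold check over status data that a client has already fetched through the REST API. The `QueueSnapshot` type, its fields, and the limits are hypothetical stand-ins, not NiFi classes, and fetching/parsing the REST response is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ThresholdCheck {

    /** Hypothetical client-side view of one queue's status (not a NiFi class). */
    static final class QueueSnapshot {
        final String id;
        final long queuedCount;
        final long queuedBytes;

        QueueSnapshot(String id, long queuedCount, long queuedBytes) {
            this.id = id;
            this.queuedCount = queuedCount;
            this.queuedBytes = queuedBytes;
        }
    }

    /** Returns the ids of snapshots whose queued count or queued bytes exceed the given limits. */
    static List<String> overThreshold(List<QueueSnapshot> snapshots, long maxCount, long maxBytes) {
        List<String> flagged = new ArrayList<>();
        for (QueueSnapshot s : snapshots) {
            if (s.queuedCount > maxCount || s.queuedBytes > maxBytes) {
                flagged.add(s.id);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Made-up sample data; a real client would build this from REST responses.
        List<QueueSnapshot> snapshots = Arrays.asList(
                new QueueSnapshot("conn-a", 10, 2_048),
                new QueueSnapshot("conn-b", 50_000, 1_024),
                new QueueSnapshot("conn-c", 5, 900_000_000L));
        System.out.println(overThreshold(snapshots, 10_000, 500_000_000L));
    }
}
```

What the client does with the flagged ids (render them differently, stop or start something) stays entirely on the client side, so nothing here touches framework internals.
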

If we focus on the examples you mention in items 3 and 4, I believe both would be fulfilled by interacting through the REST API.

For item 3, while there may be some server-side backing configuration values and status data, the client side will be the thing rendering/displaying colors. For item 4, which is about interacting with the application (start/stop/etc.), these are things which would need to occur through the REST API because that is the cluster-wide coordination mechanism/layer.

How it is designed/intended today is that server-side functions (controller services or reporting tasks) could be used to set/establish server-side knowledge of behavior, and the client would get access to this at runtime and behave as it needs given this knowledge. If the client is a visual UI then it could, for instance, render things differently. If the client is something automated invoking the REST API endpoints then it could do things like 'start' or 'stop' or alter the flow.

The ideas you're presenting, which make for a more engaging user experience, definitely make sense to me and we should do more to make these happen. I'm just pointing out that they sound less like Controller Service or Reporting Task type things and more like data we should expose via the REST API. This would allow clients, be they services or browsers, to take whatever action they might want to.

Thanks
Joe

On Fri, Apr 22, 2016 at 10:31 PM, Joe Skora <[email protected]> wrote:
> Joe W,
>
> The use case that originally got me thinking about this was a processor
> highlighter that looks for processors above or below configured thresholds,
> possibly by instance or type.  I think that requires the ability to
>
>    1. enumerate the processor and/or queue collections,
>    2. query each processor or queue for stats like those shown in the UI,
>    3. highlight the processor somehow, by changing color for instance, and
>    4. (possibly) affect the processor by stopping / starting it.
>
> I realize this exposes system internal and threading concerns, but the REST
> API already provides the information externally.  Any controller service
> using this information must be designed to not have a negative impact on
> the system, but that's already true of any custom processor or controller
> service since they can overload or lock up the framework if they behave
> badly.
>
> Overall, I think the visibility into the data flows, hot/cold spots, larger
> than expected ingests, etc. provides value that would far outweigh concerns
> about any new risk this capability would create.
>
> Thanks,
> Joe S
>
> On Fri, Apr 22, 2016 at 2:25 PM, Joe Witt <[email protected]> wrote:
>
>> Yeah understood.  So let's dig into this more.  We need to avoid over
>> exposure of internal state which one might want to crawl through
>> because that introduces some multi-threaded challenges and could limit
>> our ability to evolve internals.  However, if we understand the
>> questions you'd like to be able to ask of certain things better
>> perhaps we can better expose those results.
>>
>> Can you try stating what you're looking for in a bit more specific
>> examples.  For instance you said "want to iterate over the processor
>> collections...to look for performance thresholds" - What sorts of
>> performance threshold questions?
>>
>> On Fri, Apr 22, 2016 at 2:20 PM, Joe Skora <[email protected]> wrote:
>> > Joe Witt - Not really, this kind of went sideways from where I was
>> > originally headed.
>> >
>> > I'm looking for a way for a controller service to iterate over the
>> > processor and queue collections (maybe others as well) to look for
>> > performance thresholds or other issues and then provide feedback somehow
>> > or report externally.
>> >
>> > If it can be done through the REST API, seems like it should be possible
>> > from within the framework as well.
>> >
>> > On Fri, Apr 22, 2016 at 1:32 PM, Joe Witt <[email protected]> wrote:
>> >
>> >> Joe Skora - does Jeremy's JIRA cover your use case needs?
>> >>
>> >> On Fri, Apr 22, 2016 at 12:44 PM, Jeremy Dyer <[email protected]> wrote:
>> >> > Mark,
>> >> >
>> >> > ok that makes sense. I have created a jira for this improvement
>> >> > https://issues.apache.org/jira/browse/NIFI-1805
>> >> >
>> >> > On Fri, Apr 22, 2016 at 12:27 PM, Mark Payne <[email protected]> wrote:
>> >> >
>> >> >> Jeremy,
>> >> >>
>> >> >> It should be relatively easy. In FlowController, we would have to update
>> >> >> getGroupStatus() to set the values on ConnectionStatus
>> >> >> and of course update ConnectionStatus to have getters & setters for the
>> >> >> new values. That should be about it, I think.
>> >> >>
>> >> >> -Mark
>> >> >>
>> >> >>
>> >> >> > On Apr 22, 2016, at 12:17 PM, Jeremy Dyer <[email protected]> wrote:
>> >> >> >
>> >> >> > Mark,
>> >> >> >
>> >> >> > What would the process look like for doing that? Would that be
>> >> >> > something trivial or require some reworking?
>> >> >> >
>> >> >> > On Fri, Apr 22, 2016 at 10:26 AM, Mark Payne <[email protected]> wrote:
>> >> >> >
>> >> >> >> I definitely don't think we should be exposing the FlowController to a
>> >> >> >> Reporting Task.
>> >> >> >> However, I think exposing information about whether or not backpressure
>> >> >> >> is being applied (or even is configured) is a very reasonable idea.
>> >> >> >>
>> >> >> >> -Mark
>> >> >> >>
>> >> >> >>
>> >> >> >>> On Apr 22, 2016, at 10:22 AM, Jeremy Dyer <[email protected]> wrote:
>> >> >> >>>
>> >> >> >>> I could see the argument for not making that available. What about some
>> >> >> >>> sort of reference that would allow the ReportingTask to determine if
>> >> >> >>> backpressure is being applied to a Connection?
>> >> >> >>> It currently seems you can
>> >> >> >>> see the number of bytes and/or object count queued in a connection but
>> >> >> >>> don't have any reference to the values a user has set up for
>> >> >> >>> backpressure in the UI. Is there a way to get those values in the
>> >> >> >>> scope of the ReportingTask?
>> >> >> >>>
>> >> >> >>> On Fri, Apr 22, 2016 at 10:03 AM, Bryan Bende <[email protected]> wrote:
>> >> >> >>>
>> >> >> >>>> I think the only way you could do it directly without the REST API is by
>> >> >> >>>> having access to the FlowController,
>> >> >> >>>> but that is purposely not exposed to extension points... actually
>> >> >> >>>> StandardFlowController is what implements the
>> >> >> >>>> EventAccess interface which ends up providing the pathway to the status
>> >> >> >>>> objects.
>> >> >> >>>>
>> >> >> >>>> I would have to defer to Joe, Mark, and others about whether we would
>> >> >> >>>> want to expose direct access to components
>> >> >> >>>> through controller services, or some other extension point.
>> >> >> >>>>
>> >> >> >>>> On Fri, Apr 22, 2016 at 9:46 AM, Jeremy Dyer <[email protected]> wrote:
>> >> >> >>>>
>> >> >> >>>>> Bryan,
>> >> >> >>>>>
>> >> >> >>>>> The ReportingTask enumeration makes sense and was helpful for something
>> >> >> >>>>> else I am working on as well.
>> >> >> >>>>>
>> >> >> >>>>> Like Joe however I'm looking for a way to not just get the *Status
>> >> >> >>>>> objects but rather start and stop processors. Is there a way to do that
>> >> >> >>>>> from the ReportingContext scope?
>> >> >> >>>>> I imagine you could pull the Processor "Id" from the
>> >> >> >>>>> ProcessorStatus and then use the REST API but was looking for something
>> >> >> >>>>> more direct than having to use the REST API
>> >> >> >>>>>
>> >> >> >>>>>
>> >> >> >>>>> On Fri, Apr 22, 2016 at 9:23 AM, Bryan Bende <[email protected]> wrote:
>> >> >> >>>>>
>> >> >> >>>>>> Hi Joe,
>> >> >> >>>>>>
>> >> >> >>>>>> I'm not sure if a controller service can do this, but a ReportingTask
>> >> >> >>>>>> has access to similar information.
>> >> >> >>>>>>
>> >> >> >>>>>> A ReportingTask gets access to a ReportingContext, which can access
>> >> >> >>>>>> EventAccess which can access ProcessGroupStatus.
>> >> >> >>>>>>
>> >> >> >>>>>> From ProcessGroupStatus you are at the root process group and can
>> >> >> >>>>>> enumerate the flow:
>> >> >> >>>>>>
>> >> >> >>>>>> private Collection<ConnectionStatus> connectionStatus = new ArrayList<>();
>> >> >> >>>>>> private Collection<ProcessorStatus> processorStatus = new ArrayList<>();
>> >> >> >>>>>> private Collection<ProcessGroupStatus> processGroupStatus = new ArrayList<>();
>> >> >> >>>>>> private Collection<RemoteProcessGroupStatus> remoteProcessGroupStatus = new ArrayList<>();
>> >> >> >>>>>> private Collection<PortStatus> inputPortStatus = new ArrayList<>();
>> >> >> >>>>>> private Collection<PortStatus> outputPortStatus = new ArrayList<>();
>> >> >> >>>>>>
>> >> >> >>>>>> Not sure if that is what you were looking for.
>> >> >> >>>>>>
>> >> >> >>>>>> -Bryan
>> >> >> >>>>>>
>> >> >> >>>>>>
>> >> >> >>>>>> On Fri, Apr 22, 2016 at 8:25 AM, Joe Skora <[email protected]> wrote:
>> >> >> >>>>>>
>> >> >> >>>>>>> Is it possible and if so what is the best way for a controller
>> >> >> >>>>>>> service to get the collection of all processors or queues?
>> >> >> >>>>>>>
>> >> >> >>>>>>> The goal being to iterate over the collection of processors or queues
>> >> >> >>>>>>> to gather information or make adjustments to the flow.
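
The enumeration Bryan describes in this thread (start at the root group's status, then visit its connections and child groups) amounts to a recursive walk, and the backpressure question Jeremy raises becomes a simple comparison once the configured threshold travels with the connection status. Below is a minimal sketch of that idea. `GroupStatus`, `ConnStatus`, and their fields are hypothetical stand-ins so the example runs on its own; they are not the NiFi status classes, and treating a threshold of 0 as "not configured" is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class FlowWalk {

    /** Hypothetical stand-in for a connection's status, carrying its configured threshold. */
    static final class ConnStatus {
        final String id;
        final long queuedCount;
        final long objectThreshold; // assumption: 0 means no threshold configured

        ConnStatus(String id, long queuedCount, long objectThreshold) {
            this.id = id;
            this.queuedCount = queuedCount;
            this.objectThreshold = objectThreshold;
        }

        /** True when a threshold is configured and the queue has reached it. */
        boolean backpressureApplied() {
            return objectThreshold > 0 && queuedCount >= objectThreshold;
        }
    }

    /** Hypothetical stand-in for a process group's status: its connections plus child groups. */
    static final class GroupStatus {
        final List<ConnStatus> connections;
        final List<GroupStatus> childGroups;

        GroupStatus(List<ConnStatus> connections, List<GroupStatus> childGroups) {
            this.connections = connections;
            this.childGroups = childGroups;
        }
    }

    /** Depth-first walk from the root group, collecting ids of connections at or over their threshold. */
    static List<String> connectionsUnderBackpressure(GroupStatus root) {
        List<String> ids = new ArrayList<>();
        collect(root, ids);
        return ids;
    }

    private static void collect(GroupStatus group, List<String> ids) {
        for (ConnStatus c : group.connections) {
            if (c.backpressureApplied()) {
                ids.add(c.id);
            }
        }
        for (GroupStatus child : group.childGroups) {
            collect(child, ids);
        }
    }

    public static void main(String[] args) {
        // Made-up flow: a root group with two connections and one child group.
        GroupStatus child = new GroupStatus(
                Arrays.asList(new ConnStatus("child-1", 500, 500)),
                Collections.<GroupStatus>emptyList());
        GroupStatus root = new GroupStatus(
                Arrays.asList(new ConnStatus("root-1", 10, 100), new ConnStatus("root-2", 99, 0)),
                Arrays.asList(child));
        System.out.println(connectionsUnderBackpressure(root));
    }
}
```

The walk only reads status snapshots, which matches the read-only view discussed above; starting or stopping components would still go through the REST API.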
