[
https://issues.apache.org/jira/browse/NIP-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17936829#comment-17936829
]
Mark Payne commented on NIP-6:
------------------------------
[~joewitt] yeah I debated whether or not it made sense to add the
{{PARTIALLY_ENGAGED}} or not. I don't know off the top of my head if we'd ever
really need it, but I felt it was effectively free to determine and it rounds
out our API to provide all of the states that we could be in. The only cost is
in the case of multiple incoming connections, we could potentially {{break}}
out of the loop early if one of the Connections does not have backpressure
applied instead of checking all. But the vast majority of processors,
especially those where this would be valuable, will only ever have one incoming
connection anyway. I could go either way.
> Allow Processors to understand state of incoming connections' backpressure
> --------------------------------------------------------------------------
>
> Key: NIP-6
> URL: https://issues.apache.org/jira/browse/NIP-6
> Project: NiFi Improvement Proposal
> Issue Type: New Feature
> Reporter: Mark Payne
> Priority: Major
>
> *Motivation and description*
> There are cases when we need to know whether or not backpressure is applied
> on incoming connections. For example, in MergeContent, when a Correlation
> Attribute is used, it is very easy to have a situation in which the Min
> Number of Entries or Min Bin Size is not met, but in which backpressure is
> applied upstream. In this case, we have to just wait for the Max Bin Age to
> elapse in order for MergeContent to progress. This introduces significant
> artificial delays into our dataflow. And while Max Bin Age is always
> recommended, it's not required, so the flow could potentially block
> indefinitely it not properly configured.
> Instead, it would make sense for MergeContent to determine "all upstream
> connections have backpressure applied" and as a result immediately merge the
> "most full bin" in order to make progress. This information is not currently
> available, though. We need to surface this information via the
> {{{}ProcessContext{}}}.
>
> *Scope*
> Implementing this will require that the following method be added to
> {{{}ProcessContext{}}}:
> {code:java}
> BackpressureEngagement getBackpressureEngagement(); {code}
> where {{BackpressureEngagement}} is defined as:
> {code:java}
> public enum BackpressureEngagement {
> // No upstream connections have backpressure engaged
> BACKPRESSURE_NOT_ENGAGED,
> // At least one upstream connection has backpressure engaged but not all
> BACKPRESSURE_PARTIALLY_ENGAGED,
> // All upstream connections have backpressure engaged
> BACKPRESSURE_ENGAGED;
> } {code}
> This, naturally, means that the the method must also be implemented in the
> {{StandardProcessContext}} as well as {{ConnectableProcessContext}} and
> {{{}MockProcessContext{}}}.
> In order to allow proper testing of components, we must also update the
> {{TestRunner}} interface and {{StandardProcessorTestRunner}} implementation
> to accommodate this with a setter method:
> {code:java}
> public void setBackpressureEngagement(BackpressureEngagement engagement);
> {code}
> The default value in {{StandardProcessorTestRunner}} should be
> {{{}BackpressureEngagement.BACKPRESSURE_NOT_ENGAGED{}}}.
>
> *Alternatives Considered*
> As outlined in NIFI-14366, in addition to the prescribed values for
> {{BackpressureEngagement}} the following alternatives were considered:
> {code:java}
> - SINGLE
> - IN_A_RELATIONSHIP
> - ITS_COMPLICATED
> - ENGAGED {code}
> Ultimately, I think we should avoid this nomenclature for the following
> reasons:
> * May lead to confusion. I fear we would also need to introduce a value of
> "GHOSTED" which would be confusing because it means something very different
> from when a Processor or Controller Service is Ghosted.
> * Backpressure is tied to a Connection, and Connections are already often
> confused with Processor Relationships. This may well further blur that line
> and lead to more confusion.
> * I'm concerned about the complexity of implementing `ITS_COMPLICATED`. Once
> we start factoring in the emotional state of each connection, well... it gets
> complicated.
> * I'd be concerned about "API sprawl" - for instance, will someone then want
> to introduce a `breakup()` method to set the status to SINGLE? The error
> handling there could be a nightmare.
> It was also suggested that we name the method {{amITheProblem}} with
> potential return values of {{YES_YOU_ARE_SLOW}} or {{{}NO_ITS_ME{}}}.
> My main concern here is that the value can potentially change rather quickly.
> Imagine the logs, indicating why backpressure is engaged:
> {code:java}
> Backpressure engaged? YES_YOU_ARE_SLOW
> Backpressure engaged? NO_ITS_ME
> Backpressure engaged? YES_YOU_ARE_SLOW
> Backpressure engaged? NO_ITS_ME
> Backpressure engaged? YES_YOU_ARE_SLOW
> Backpressure engaged? NO_ITS_ME
> Backpressure engaged? YES_YOU_ARE_SLOW
> Backpressure engaged? NO_ITS_ME{code}
> We don't need people thinking of our api is trapped in a loop of self-doubt
> and finger-pointing.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)