[ 
https://issues.apache.org/jira/browse/NIFI-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242033#comment-14242033
 ] 

Joseph Witt commented on NIFI-90:
---------------------------------

Penalization of a FlowFile was meant to support the scenario where 'at this 
time there is something about that object in the environment that is 
problematic, and it is expected that in time it will be ok'.  A good example 
here is something trivial like sending that file to a remote system which 
already has a file with the same filename.  It is assumed the remote file 
system will move the conflicting file soon enough and we'll be fine.  So we 
penalize this flow file and let other flow files work through in the meantime.

Penalization/Delay of a processor, though, was about 'at this time there is 
something in the environment that is problematic and will lead to undesirable 
results if we perform processing'.  A good example here is a process which 
relies on some context/state/enrichment data when that information isn't yet 
available.  In that case the processor would not be scheduled to run for a 
while, to allow that condition to sort itself out.  Another good example 
here is something which connects to an external system when that system isn't 
available at the time.  No sense constantly pounding on the door...wait a 
while...try again in a bit.
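
To make the difference between the two concrete, here is a minimal sketch in 
plain Java.  It is not the framework's code: PenaltySketch, Item, and all of 
the method names are hypothetical, and time is passed in explicitly so the 
behavior is easy to follow.  A penalized item is skipped while others proceed; 
a yielded processor gets nothing at all until the delay expires.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these names come from the NiFi codebase.
class PenaltySketch {
    static class Item {
        final String name;
        long penalizedUntil = 0L; // this one item is held back until this time
        Item(String name) { this.name = name; }
    }

    private final List<Item> queue = new ArrayList<>();
    private long yieldedUntil = 0L; // the whole processor is held back until this time

    void enqueue(Item item) { queue.add(item); }

    // FlowFile-level penalty: something is wrong with this particular object.
    void penalize(Item item, long now, long period) {
        item.penalizedUntil = now + period;
    }

    // Processor-level delay: something is wrong with the environment.
    void yieldProcessor(long now, long period) {
        yieldedUntil = now + period;
    }

    // Returns the first item that may be processed at 'now', or null if none.
    Item poll(long now) {
        if (now < yieldedUntil) {
            return null; // processor is not scheduled; nothing is handed out
        }
        for (int i = 0; i < queue.size(); i++) {
            if (queue.get(i).penalizedUntil <= now) {
                return queue.remove(i); // skip penalized items, let others through
            }
        }
        return null;
    }
}
```

With items a and b queued and only a penalized, poll() hands out b right away 
and a only once its penalty has expired.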

So in this ticket we're talking about the former scenario - penalization of a 
flow file.  I don't believe penalization on a given connection gives the level 
of granularity desired for the intent of flow file penalization.  However, I do 
think there is a case to be made for a 'delay period' on a connection which 
could be used as a sort of blunt penalization or just as a simple mechanism to 
intentionally slow the processing of data (not particularly sophisticated...but 
still).  But we do need to protect the framework from abusive cases - runaway 
processing of a flowfile.  In that sense I think allowing for both explicit 
and implicit penalization is the way to go.  If we're seeing the same flow 
file over and over and over in a tight time window, that is odd and should 
likely be protected against.
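
As a rough illustration of the 'same flow file over and over in a tight time 
window' check, here is a small sliding-window sketch in plain Java.  The class 
name RunawayDetector, its parameters, and the idea of keeping one detector per 
flow file are all assumptions made for the example, not anything that exists 
in the framework.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch only; assumes one detector instance per flow file.
class RunawayDetector {
    private final int maxSightings;  // sightings tolerated inside the window
    private final long windowMillis; // how far back we look
    private final Deque<Long> sightings = new ArrayDeque<>();

    RunawayDetector(int maxSightings, long windowMillis) {
        this.maxSightings = maxSightings;
        this.windowMillis = windowMillis;
    }

    // Records a sighting at 'nowMillis' and reports whether the flow file has
    // now been seen more than maxSightings times within the window.
    boolean seen(long nowMillis) {
        sightings.addLast(nowMillis);
        // Drop sightings that have fallen out of the window.
        while (!sightings.isEmpty()
                && nowMillis - sightings.peekFirst() > windowMillis) {
            sightings.removeFirst();
        }
        return sightings.size() > maxSightings;
    }
}
```

A caller could penalize the flow file implicitly whenever seen() returns true, 
while still honoring any explicit penalty a processor asks for.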

> Replace explicit penalization with automatic penalization/back-off
> ------------------------------------------------------------------
>
>                 Key: NIFI-90
>                 URL: https://issues.apache.org/jira/browse/NIFI-90
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Joseph Witt
>            Priority: Minor
>
> Rather than having users configure explicit penalization periods and 
> requiring developers to implement it in their processors we can automate 
> this.  Perhaps keep a LinkedHashMap<Connection ID, Counter> of size 5 or so 
> in the FlowFileRecord construct.  When a FlowFile is routed to a Connection, 
> the counter is incremented.  If the counter exceeds 3 visits to the same 
> connection, the FlowFile will be automatically penalized.  This protects us 
> "5 hops out" so that if we have something like DistributeLoad -> PostHTTP 
> with failure looping back to DistributeLoad, we will still penalize when 
> appropriate.
> In addition, we will remove the configuration option from the UI, setting the 
> penalization period to some default such as 5 seconds.
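
The counter described in the ticket could be sketched roughly as follows.  
This is plain Java and not the actual FlowFileRecord: the class name 
VisitTracker, the method recordVisit, and the use of a String as the 
connection ID are assumptions for illustration.  The limits of 5 tracked 
connections and 3 visits are the numbers from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only; not the real FlowFileRecord.
class VisitTracker {
    static final int MAX_TRACKED_CONNECTIONS = 5; // "size 5 or so"
    static final int MAX_VISITS = 3;              // penalize once this is exceeded

    // Insertion-ordered map that drops its oldest entry once it holds more
    // than MAX_TRACKED_CONNECTIONS connections.
    private final Map<String, Integer> visits =
        new LinkedHashMap<String, Integer>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > MAX_TRACKED_CONNECTIONS;
            }
        };

    // Called when the flow file is routed to a connection.  Returns true if
    // the flow file should now be penalized automatically.
    boolean recordVisit(String connectionId) {
        int count = visits.getOrDefault(connectionId, 0) + 1;
        visits.put(connectionId, count);
        return count > MAX_VISITS;
    }
}
```

Because the map is bounded, a connection that has not been visited recently 
eventually falls out and its count starts over, which is what limits the 
protection to roughly "5 hops out".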



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
