Team,

I'd like to propose we remove the FlowFilePrioritizer [1] from the set
of first-class extension points we support.

The background:

FlowFilePrioritizer implementations are used to compare flow files as
they are enqueued on a given connection in the flow.  This in turn
means that when flow files are pulled from the queue, the most
important data is operated on first.
This is a valuable feature and is heavily utilized.  Out of the box
NiFi provides several obvious prioritizer implementations such as
first in and out based on age of the flow file, first in based on
entry order, and honoring a numeric representation of priority set as
a specific attribute [2].  They are rarely changed, have so far not
grown in number, and there have been no discussions of adding more.
Thinking back over their usage in the past decade, I believe only a
few have ever been written.
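
For readers less familiar with the current model, here is a minimal
sketch of comparator-driven queue ordering.  It is not the framework's
code: QueuedItem and ComparatorQueueDemo are hypothetical stand-ins,
and I am assuming that a lower priority number means more important.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical stand-in for a flow file; the real FlowFile is much richer.
final class QueuedItem {
    final String name;
    final long entryMillis; // when the item entered the queue
    final int priority;     // stand-in for a numeric priority attribute

    QueuedItem(String name, long entryMillis, int priority) {
        this.name = name;
        this.entryMillis = entryMillis;
        this.priority = priority;
    }
}

final class ComparatorQueueDemo {
    // Assumption: lower priority number first, then oldest entry first.
    static final Comparator<QueuedItem> BY_PRIORITY_THEN_AGE =
            Comparator.comparingInt((QueuedItem q) -> q.priority)
                      .thenComparingLong(q -> q.entryMillis);

    // The queue only ever learns how two items compare to each other;
    // it never sees an absolute priority value.
    static PriorityQueue<QueuedItem> newQueue() {
        return new PriorityQueue<>(BY_PRIORITY_THEN_AGE);
    }

    public static void main(String[] args) {
        PriorityQueue<QueuedItem> queue = newQueue();
        queue.add(new QueuedItem("routine", 1000L, 5));
        queue.add(new QueuedItem("urgent-late", 3000L, 1));
        queue.add(new QueuedItem("urgent-early", 2000L, 1));
        while (!queue.isEmpty()) {
            System.out.println(queue.poll().name);
        }
    }
}
```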

The concept and ability to sort queues is important and powerful and
needs to be kept.  But I am now questioning the value of making
prioritizers a first-class extension point.  The reason is that, as
defined, the interface is intuitive for the developer but much harder
for the framework side.  That, combined with how rarely they have been
extended, opens the debate.

When the prioritizers were first envisioned we didn't support the
concept of swapping out flowfiles to disk when the queues were huge.
We now do, but we cannot (at this time) sort the swapped-out items.
By getting rid of this extension point as it is now, we can instead
support these types of prioritizers in a different and more optimized
manner, albeit in a less extension-friendly way (more coupled to the
framework).  Rather than simply using comparators, we can do absolute
priority assignment, and when swapping out flow files we can track the
largest/smallest priority and thus enable prioritized swap-in.  This
would also be helpful for doing things like auto-cluster load
balancing or cluster-wide prioritized site-to-site.
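
To make the swap idea concrete, here is a rough sketch of tracking the
priority range per swapped-out batch.  Everything in it is hypothetical
(SwapBatchSummary, SwapIndex, the location string), and it assumes that
a lower value means more important.  It is only meant to show that the
framework could choose what to swap in without reading the swapped
data back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory summary of one swapped-out batch of flow files.
final class SwapBatchSummary {
    final String location;  // hypothetical identifier of the swap file
    final long minPriority; // smallest absolute priority in the batch
    final long maxPriority; // largest absolute priority in the batch

    SwapBatchSummary(String location, long[] priorities) {
        if (priorities.length == 0) {
            throw new IllegalArgumentException("a swap batch cannot be empty");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long p : priorities) {
            min = Math.min(min, p);
            max = Math.max(max, p);
        }
        this.location = location;
        this.minPriority = min;
        this.maxPriority = max;
    }
}

// Hypothetical index the framework could keep for the swapped-out batches.
final class SwapIndex {
    private final List<SwapBatchSummary> batches = new ArrayList<>();

    void recordSwapOut(SwapBatchSummary summary) {
        batches.add(summary);
    }

    // Assumption: lower value means more important.  Returns and removes
    // the batch holding the most important flow file, or null if none.
    SwapBatchSummary nextToSwapIn() {
        SwapBatchSummary best = null;
        for (SwapBatchSummary candidate : batches) {
            if (best == null || candidate.minPriority < best.minPriority) {
                best = candidate;
            }
        }
        if (best != null) {
            batches.remove(best);
        }
        return best;
    }
}
```

The same min/max summary is the kind of thing that could be exchanged
between nodes, since it is just two numbers per batch.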

So, in short, the interface would go from being a comparator to
instead providing a method which returns an absolute priority.  For
example, it would have a method called 'getPriority' which takes in a
flow file and returns a long.

This approach would also still allow chaining prioritizers as we do today.
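
As a strawman, the new shape and one possible way to chain could look
like the sketch below.  Only the method name 'getPriority' and the long
return type come from the proposal above.  SketchFlowFile,
AbsolutePrioritizer, PrioritizerChain, and the lower-value-wins
convention are my assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a flow file; only two fields are modeled.
final class SketchFlowFile {
    final long entryMillis;       // when the flow file entered the flow
    final long priorityAttribute; // as if parsed from a priority attribute

    SketchFlowFile(long entryMillis, long priorityAttribute) {
        this.entryMillis = entryMillis;
        this.priorityAttribute = priorityAttribute;
    }
}

// Hypothetical interface name: returns an absolute priority instead of
// comparing two flow files.
interface AbsolutePrioritizer {
    long getPriority(SketchFlowFile flowFile);
}

// One possible chaining scheme: each prioritizer contributes one value
// and the resulting keys are compared position by position.
final class PrioritizerChain {
    private final List<AbsolutePrioritizer> chain;

    PrioritizerChain(AbsolutePrioritizer... prioritizers) {
        this.chain = Arrays.asList(prioritizers);
    }

    // Builds the key for a flow file, one priority per chained prioritizer.
    long[] keyFor(SketchFlowFile flowFile) {
        long[] key = new long[chain.size()];
        for (int i = 0; i < key.length; i++) {
            key[i] = chain.get(i).getPriority(flowFile);
        }
        return key;
    }

    // Assumption: lower value means more important.  Later prioritizers
    // only break ties left by earlier ones.
    static int compareKeys(long[] a, long[] b) {
        int shared = Math.min(a.length, b.length);
        for (int i = 0; i < shared; i++) {
            int c = Long.compare(a[i], b[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

A chain equivalent to "priority attribute, then oldest first" would be
new PrioritizerChain(f -> f.priorityAttribute, f -> f.entryMillis).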

We can still support this as something that can be extended by those
who wish to do so, just in a less friendly and more framework-coupled
manner.  Basically, this would be more like how we support content
repository or provenance repository extensions, where the developer
needs to understand not only the implementation they want but also the
mechanics of getting that into the build and the deeper implications.

Would like to hear if others are supportive of this or if they see any
major problems posed by this.  Given we're working towards the 1.x
release this is a good time to pull this cord.  If we do this we can
document the steps and thinking needed to build/contribute new
prioritizer schemes.

Thanks
Joe

[1] https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFilePrioritizer.java;h=684f454f57094a0e1f669333d63be06cd5a8a043;hb=refs/heads/0.x
[2] https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=tree;f=nifi-nar-bundles/nifi-standard-bundle/nifi-standard-prioritizers/src/main/java/org/apache/nifi/prioritizer;h=6d5db994f9fd9624bf7f548ebd69548b6917ccd1;hb=refs/heads/0.x
