Team,

I'd like to propose that we remove FlowFilePrioritizer [1] from the set of first-class extension points we support.

The background: FlowFilePrioritizer implementations are used to compare flow files as they are enqueued on a given connection in the flow. This means that when flow files are pulled from the queue, the most important data is operated on first. This is a valuable feature and is heavily utilized. Out of the box NiFi provides several obvious prioritizer implementations, such as ordering based on the age of the flow file, first in first out based on entry order, and honoring a numeric representation of priority set as a specific attribute [2]. They are rarely changed, have not grown in number, and there have been no discussions of adding more. Thinking back over their usage during the past decade, I believe only a few have ever been written.

The concept and ability to sort queues is important and powerful and needs to be kept. What I am now questioning is the value of making prioritizers a first-class extension point. The reason is that, as defined, the interface is intuitive for the developer but much harder on the framework side. That, combined with how rarely they have been extended, opens the debate.

When the prioritizers were first envisioned we didn't support swapping flow files out to disk when the queues were huge. We now do, but at this time we cannot sort the swapped-out items. By getting rid of this extension point as it is now, we can instead support these types of prioritizers in a different and more optimized manner, albeit a less extension-friendly one (more coupled to the framework). Rather than simply using comparators we can do absolute priority assignment, and when swapping out flow files we can track the largest/smallest priority and thus enable prioritized swap-in. This would also be helpful for things like automatic cluster load balancing or cluster-wide prioritized site-to-site.
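
To make the absolute-priority idea concrete, here is a rough sketch only, under stated assumptions. None of these types exist in NiFi: SimpleFlowFile, AbsolutePrioritizer, SwapBatchSummary and the helper methods are hypothetical names, and the convention that a smaller value means higher priority is an assumption made for illustration. It shows how recording the smallest and largest priority of each swapped-out batch would let the framework choose which batch to swap in first without reading the batches back from disk.

```java
import java.util.List;

// Hypothetical sketch: these types are illustrative and are not NiFi API.
final class PrioritySwapSketch {

    /** Stand-in for a flow file, holding only what the sketch needs. */
    static final class SimpleFlowFile {
        final long entryOrder;
        final long priorityAttribute;

        SimpleFlowFile(long entryOrder, long priorityAttribute) {
            this.entryOrder = entryOrder;
            this.priorityAttribute = priorityAttribute;
        }
    }

    /** Assumed shape: map a flow file to an absolute priority (smaller = more urgent). */
    interface AbsolutePrioritizer {
        long getPriority(SimpleFlowFile flowFile);
    }

    /** In-memory summary kept for one batch of flow files swapped out to disk. */
    static final class SwapBatchSummary {
        final String location;
        final long minPriority;
        final long maxPriority;

        SwapBatchSummary(String location, long minPriority, long maxPriority) {
            this.location = location;
            this.minPriority = minPriority;
            this.maxPriority = maxPriority;
        }
    }

    /** Computes the smallest and largest priority in a batch as it is swapped out. */
    static SwapBatchSummary summarize(String location,
                                      List<SimpleFlowFile> batch,
                                      AbsolutePrioritizer prioritizer) {
        if (batch.isEmpty()) {
            throw new IllegalArgumentException("batch must not be empty");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (SimpleFlowFile flowFile : batch) {
            long priority = prioritizer.getPriority(flowFile);
            min = Math.min(min, priority);
            max = Math.max(max, priority);
        }
        return new SwapBatchSummary(location, min, max);
    }

    /** Picks the batch holding the most urgent flow file; returns null if there are none. */
    static SwapBatchSummary nextToSwapIn(List<SwapBatchSummary> summaries) {
        SwapBatchSummary best = null;
        for (SwapBatchSummary summary : summaries) {
            if (best == null || summary.minPriority < best.minPriority) {
                best = summary;
            }
        }
        return best;
    }
}
```

A comparator can only order two flow files that are both in memory, so it cannot say anything about a batch sitting on disk. An absolute value can be summarized per batch at swap-out time, which is the property this sketch relies on.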

So, in short, the interface would go from being a comparator to providing a method that returns an absolute priority. For example, it would have a method called 'getPriority' which takes in a flow file and returns a long. This approach would still allow chaining prioritizers as we do today.

We can still support this as something that can be extended by those who wish to do so, just in a less friendly and more framework-coupled manner. Basically, it would be more like how we support content repository or provenance repository extensions, where the developer needs to understand both the implementation they want and the mechanics of getting it into the build, along with the deeper implications.

I would like to hear whether others are supportive of this or see any major problems with it. Given that we're working towards the 1.x release, this is a good time to pull this cord. If we do this, we can document the steps and thinking needed to build/contribute new prioritizer schemes.

Thanks
Joe

[1] https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFilePrioritizer.java;h=684f454f57094a0e1f669333d63be06cd5a8a043;hb=refs/heads/0.x
[2] https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=tree;f=nifi-nar-bundles/nifi-standard-bundle/nifi-standard-prioritizers/src/main/java/org/apache/nifi/prioritizer;h=6d5db994f9fd9624bf7f548ebd69548b6917ccd1;hb=refs/heads/0.x
