I'm definitely a +1. In my experience, most people prioritize data in one of two ways: they either assign an absolute priority to a FlowFile and use the PriorityAttributePrioritizer, or they use the FirstInFirstOutPrioritizer. Any number of processors can be used to extract the 'priority' attribute and prioritize the data that way. I think this makes the extensibility less valuable, since the flow itself can be used to determine a 'priority' attribute based on FlowFile content, attributes, etc.

> On May 6, 2016, at 11:16 AM, Joe Witt <[email protected]> wrote:
>
> Team,
>
> I'd like to propose we remove the FlowFilePrioritizer [1] from the set
> of first class extension points we support.
>
> The background:
>
> FlowFilePrioritizer implementations are used to compare flow files as
> they are enqueued on a given connection in the flow. This in turn
> means when flow files are pulled from the queue they are pulled in a
> manner that allows the most important data first to be operated on.
> This is a valuable feature and is heavily utilized. Out of the box
> NiFi provides several obvious prioritizer implementations such as
> first in and out based on age of the flow file, first in based on
> entry order, and honoring a numeric representation of priority set as
> a specific attribute [2]. They are rarely changed and have so far not
> grown in numbers nor have there been any discussions of doing so. If
> I think back to their usage over the past decade I actually think
> there have been only a few ever made.
>
> The concept and ability to sort queues is important and powerful and
> needs to be kept. But making them a first-class extension point I am
> now questioning the value of. The reason being is that as defined the
> interface is intuitive for the developer but much harder for the
> framework side. That combined with their lack of ever being extended
> opens the debate.
>
> When the prioritizers were first envisioned we didn't support the
> concept of swapping out flowfiles to disk when the queues were huge.
> We now do. But we cannot sort (at this time) the swapped out items.
> By getting rid of this extension point as it is now we can instead
> support these types of prioritizers in a different and more optimized
> manner albeit in a less extension friendly way (more coupled to the
> framework). Rather than simply using comparators we can do absolute
> priority assignment and when swapping out flow files we can track the
> largest/smallest priority and thus enable prioritized swap-in. This
> would also be helpful for doing things like auto-cluster load
> balancing or cluster-wide prioritized site-to-site.
>
> So, in short, the interface would go from being a comparator to
> instead providing a method which returns an absolute priority. For
> example, it would have a method called 'getPriority' which takes in a
> flow file and returns a long.
>
> This approach would also still allow chaining prioritizers as we do today.
>
> We still can support this as something which can be extended for those
> who wish to do so just in a less friendly and more framework coupled
> manner. Basically, this would just be more like we support content
> repository or provenance repository extension where the developer
> needs to both understand the implementation they want but also the
> mechanics of getting that into the build and the deeper implications.
>
> Would like to hear if others are supportive of this or if they see any
> major problems posed by this. Given we're working towards the 1.x
> release this is a good time to pull this cord. If we do this we can
> document the steps and thinking needed to build/contribute new
> prioritizer schemes.
>
> Thanks
> Joe
>
> [1]
> https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFilePrioritizer.java;h=684f454f57094a0e1f669333d63be06cd5a8a043;hb=refs/heads/0.x
> [2]
> https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=tree;f=nifi-nar-bundles/nifi-standard-bundle/nifi-standard-prioritizers/src/main/java/org/apache/nifi/prioritizer;h=6d5db994f9fd9624bf7f548ebd69548b6917ccd1;hb=refs/heads/0.x
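
For anyone trying to picture the proposed change, here is a rough sketch of what an absolute-priority interface might look like, and how it could support chaining and prioritized swap-in. None of this is NiFi code. SimpleFlowFile, AbsolutePrioritizer, AttributePrioritizer, SwapSummary, and the helper methods are all hypothetical names made up for illustration; only the 'getPriority' method name and the 'priority' attribute come from the thread. The sketch also assumes that a lower number means "pull first" and that a missing or non-numeric attribute sorts last.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class PrioritySketch {

    /** Hypothetical stand-in for a flow file: an id plus its attributes. */
    static final class SimpleFlowFile {
        final long id;
        final Map<String, String> attributes;

        SimpleFlowFile(long id, Map<String, String> attributes) {
            this.id = id;
            this.attributes = attributes;
        }
    }

    /** Assumed shape of the proposed interface: an absolute priority, not a comparison. */
    interface AbsolutePrioritizer {
        long getPriority(SimpleFlowFile flowFile);
    }

    /** Reads the 'priority' attribute. Assumption: missing or non-numeric values sort last. */
    static final class AttributePrioritizer implements AbsolutePrioritizer {
        @Override
        public long getPriority(SimpleFlowFile flowFile) {
            String raw = flowFile.attributes.get("priority");
            if (raw == null) {
                return Long.MAX_VALUE;
            }
            try {
                return Long.parseLong(raw.trim());
            } catch (NumberFormatException e) {
                return Long.MAX_VALUE;
            }
        }
    }

    /** Chaining: each later prioritizer only breaks ties left by the earlier ones. */
    static Comparator<SimpleFlowFile> chain(List<AbsolutePrioritizer> prioritizers) {
        Comparator<SimpleFlowFile> result = (a, b) -> 0;
        for (AbsolutePrioritizer p : prioritizers) {
            result = result.thenComparingLong(p::getPriority);
        }
        return result;
    }

    /** Hypothetical record of a swapped-out batch: where it lives and its priority range. */
    static final class SwapSummary {
        final String location;
        final long minPriority;
        final long maxPriority;

        SwapSummary(String location, long minPriority, long maxPriority) {
            this.location = location;
            this.minPriority = minPriority;
            this.maxPriority = maxPriority;
        }
    }

    /** Tracks the smallest and largest priority of a batch as it is swapped out. */
    static SwapSummary summarize(String location, List<SimpleFlowFile> batch, AbsolutePrioritizer p) {
        long min = Long.MAX_VALUE; // an empty batch keeps these sentinel values
        long max = Long.MIN_VALUE;
        for (SimpleFlowFile ff : batch) {
            long priority = p.getPriority(ff);
            min = Math.min(min, priority);
            max = Math.max(max, priority);
        }
        return new SwapSummary(location, min, max);
    }

    /** Prioritized swap-in: pick the batch holding the most important item, or null if none. */
    static SwapSummary nextToSwapIn(List<SwapSummary> swapped) {
        return swapped.stream()
                .min(Comparator.comparingLong((SwapSummary s) -> s.minPriority))
                .orElse(null);
    }
}
```

The point of the sketch is that the framework never has to read a swapped-out batch to decide which one to bring back first; the min/max pair recorded at swap-out time is enough. With a pure comparator there is no such summary value to record, which is the limitation Joe describes.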
