+1. I think the benefits of this move far outweigh the potential but unrealized value of extensible prioritizers.
Andy LoPresto
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On May 6, 2016, at 9:49 AM, Brandon DeVries wrote:
>
> +1. This seems like something we should provide options for (as we do),
> but doesn't really need to be made / kept accessible for extension.
>
> Brandon
>
> On Fri, May 6, 2016 at 11:45 AM Mark Payne wrote:
>
>> I'm definitely a +1. In my experience, the way that most people think
>> about prioritizing data is to either assign an absolute priority to a
>> FlowFile and use the PriorityAttributePrioritizer or to use the
>> FirstInFirstOutPrioritizer. Any number of processors can be used to
>> extract the 'priority' attribute and prioritize the data that way. I
>> think this makes the extensibility less valuable, since the flow itself
>> can be used to determine a 'priority' attribute based on FlowFile
>> content, attributes, etc.
>>
>>> On May 6, 2016, at 11:16 AM, Joe Witt wrote:
>>>
>>> Team,
>>>
>>> I'd like to propose we remove the FlowFilePrioritizer [1] from the set
>>> of first-class extension points we support.
>>>
>>> The background:
>>>
>>> FlowFilePrioritizer implementations are used to compare flow files as
>>> they are enqueued on a given connection in the flow. This in turn
>>> means that when flow files are pulled from the queue, they are pulled
>>> in an order that lets the most important data be operated on first.
>>> This is a valuable feature and is heavily utilized. Out of the box,
>>> NiFi provides several obvious prioritizer implementations, such as
>>> first in and out based on the age of the flow file, first in based on
>>> entry order, and honoring a numeric representation of priority set as
>>> a specific attribute [2]. They are rarely changed, they have not grown
>>> in number, and there have been no discussions of doing so. Thinking
>>> back over their usage in the past decade, I believe only a few have
>>> ever been made.
>>>
>>> The concept and ability to sort queues is important and powerful and
>>> needs to be kept. But I am now questioning the value of making them a
>>> first-class extension point. The reason is that, as defined, the
>>> interface is intuitive for the developer but much harder to support on
>>> the framework side. That, combined with their never having been
>>> extended, opens the debate.
>>>
>>> When the prioritizers were first envisioned, we didn't support
>>> swapping flow files out to disk when the queues were huge. We now do,
>>> but we cannot (at this time) sort the swapped-out items. By getting
>>> rid of this extension point as it is now, we can instead support these
>>> types of prioritizers in a different and more optimized manner, albeit
>>> in a less extension-friendly way (more coupled to the framework).
>>> Rather than simply using comparators, we can do absolute priority
>>> assignment, and when swapping out flow files we can track the
>>> largest/smallest priority and thus enable prioritized swap-in. This
>>> would also be helpful for things like auto-cluster load balancing or
>>> cluster-wide prioritized site-to-site.
>>>
>>> So, in short, the interface would go from being a comparator to
>>> instead providing a method that returns an absolute priority. For
>>> example, it would have a method called 'getPriority' which takes in a
>>> flow file and returns a long.
>>>
>>> This approach would also still allow chaining prioritizers as we do
>>> today.
>>>
>>> We can still support this as something that can be extended by those
>>> who wish to do so, just in a less friendly and more framework-coupled
>>> manner. Basically, this would be more like how we support content
>>> repository or provenance repository extensions, where the developer
>>> needs to understand both the implementation they want and the
>>> mechanics of getting it into the build, along with the deeper
>>> implications.
>>>
>>> I would like to hear whether others are supportive of this or see any
>>> major problems it poses. Given we're working towards the 1.x release,
>>> this is a good time to pull this cord. If we do this, we can document
>>> the steps and thinking needed to build/contribute new prioritizer
>>> schemes.
>>>
>>> Thanks
>>> Joe
>>>
>>> [1] https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFilePrioritizer.java;h=684f454f57094a0e1f669333d63be06cd5a8a043;hb=refs/heads/0.x
>>> [2] https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=tree;f=nifi-nar-bundles/nifi-standard-bundle/nifi-standard-prioritizers/src/main/java/org/apache/nifi/prioritizer;h=6d5db994f9fd9624bf7f548ebd69548b6917ccd1;hb=refs/heads/0.x
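Joe's proposal (an absolute 'getPriority' per flow file, chaining preserved, and tracking the largest/smallest priority of swapped-out flow files to enable prioritized swap-in) can be illustrated with a minimal, self-contained Java sketch. It is not the NiFi API and not the eventual implementation: SimpleFlowFile, AbsolutePrioritizer, SwapSummary, chain, summarize and nextToSwapIn are hypothetical names, lower values are assumed to mean "pull sooner", and a swapped-out batch is assumed to be represented in memory only by its location and priority bounds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class AbsolutePrioritySketch {
    // Hypothetical minimal stand-in for a FlowFile: entry order plus attributes.
    static final class SimpleFlowFile {
        final long entryOrder;
        final Map<String, String> attributes;
        SimpleFlowFile(long entryOrder, Map<String, String> attributes) {
            this.entryOrder = entryOrder;
            this.attributes = attributes;
        }
    }

    // Hypothetical interface: scores ONE flow file (lower = pulled sooner here).
    interface AbsolutePrioritizer {
        long getPriority(SimpleFlowFile flowFile);
    }

    // Reads a numeric 'priority' attribute (loosely modeled on the attribute-based prioritizer).
    static final AbsolutePrioritizer PRIORITY_ATTRIBUTE = flowFile -> {
        String value = flowFile.attributes.getOrDefault("priority", "");
        try {
            return Long.parseLong(value.trim());
        } catch (NumberFormatException e) {
            return Long.MAX_VALUE; // missing or non-numeric: back of the queue
        }
    };

    // First in, first out by entry order, expressed as an absolute priority.
    static final AbsolutePrioritizer ENTRY_ORDER = flowFile -> flowFile.entryOrder;

    // Chaining: compare each prioritizer's value in order; later ones only break ties.
    static Comparator<SimpleFlowFile> chain(List<AbsolutePrioritizer> prioritizers) {
        return (a, b) -> {
            for (AbsolutePrioritizer p : prioritizers) {
                int result = Long.compare(p.getPriority(a), p.getPriority(b));
                if (result != 0) return result;
            }
            return 0;
        };
    }

    // What stays in memory for a swapped-out batch: location and priority bounds.
    static final class SwapSummary {
        final String location;
        final long minPriority;
        final long maxPriority;
        SwapSummary(String location, long minPriority, long maxPriority) {
            this.location = location;
            this.minPriority = minPriority;
            this.maxPriority = maxPriority;
        }
    }

    // Records the priority bounds while a non-empty batch is swapped out.
    static SwapSummary summarize(String location, List<SimpleFlowFile> batch,
                                 AbsolutePrioritizer primary) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (SimpleFlowFile flowFile : batch) {
            long priority = primary.getPriority(flowFile);
            min = Math.min(min, priority);
            max = Math.max(max, priority);
        }
        return new SwapSummary(location, min, max);
    }

    // Prioritized swap-in: choose the batch holding the most urgent flow file.
    static SwapSummary nextToSwapIn(List<SwapSummary> swapped) {
        SwapSummary best = null;
        for (SwapSummary s : swapped) {
            if (best == null || s.minPriority < best.minPriority) best = s;
        }
        return best;
    }

    public static void main(String[] args) {
        List<SimpleFlowFile> queue = new ArrayList<>(List.of(
                new SimpleFlowFile(0, Map.of("priority", "5")),
                new SimpleFlowFile(1, Map.of()),
                new SimpleFlowFile(2, Map.of("priority", "1")),
                new SimpleFlowFile(3, Map.of("priority", "5"))));
        queue.sort(chain(List.of(PRIORITY_ATTRIBUTE, ENTRY_ORDER)));
        StringBuilder order = new StringBuilder();
        for (SimpleFlowFile f : queue) order.append(f.entryOrder).append(' ');
        System.out.println(order.toString().trim()); // prints "2 0 3 1"

        List<SwapSummary> swapped = List.of(
                summarize("swap-1", List.of(new SimpleFlowFile(10, Map.of("priority", "7")),
                        new SimpleFlowFile(11, Map.of("priority", "9"))), PRIORITY_ATTRIBUTE),
                summarize("swap-2", List.of(new SimpleFlowFile(12, Map.of("priority", "3")),
                        new SimpleFlowFile(13, Map.of("priority", "8"))), PRIORITY_ATTRIBUTE));
        System.out.println(nextToSwapIn(swapped).location); // prints "swap-2"
    }
}
```

Because each swap summary carries only two numbers, the queue can decide which batch to swap back in without reading any flow files from disk, something a purely pairwise comparator does not directly provide. In this sketch the bounds come from the primary prioritizer alone, so batches that tie on it would still need a tie-break rule.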
