+1. I think the benefits of this move far outweigh the potential but unrealized 
value of extensible prioritizers.

Andy LoPresto
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On May 6, 2016, at 9:49 AM, Brandon DeVries <[email protected]> wrote:
> 
> +1.  This seems like something we should provide options for (as we do),
> but doesn't really need to be made / kept accessible for extension.
> 
> Brandon
> 
> On Fri, May 6, 2016 at 11:45 AM Mark Payne <[email protected]> wrote:
> 
>> I'm definitely a +1. In my experience, the way that most people think
>> about prioritizing data is
>> to either assign an absolute priority to a FlowFile and use the
>> PriorityAttributePrioritizer or to
>> use the FirstInFirstOut Prioritizer. Any number of processors can be used
>> to extract the
>> 'priority' attribute and prioritize the data that way. I think this makes
>> the extensibility less valuable,
>> since the flow itself can be used to determine a 'priority' attribute
>> based on FlowFile content, attributes,
>> etc.
>> 
>>> On May 6, 2016, at 11:16 AM, Joe Witt <[email protected]> wrote:
>>> 
>>> Team,
>>> 
>>> I'd like to propose we remove the FlowFilePrioritizer [1] from the set
>>> of first class extension points we support.
>>> 
>>> The background:
>>> 
>>> FlowFilePrioritizer implementations are used to compare flow files as
>>> they are enqueued on a given connection in the flow.  This in turn
>>> means when flow files are pulled from the queue they are pulled in a
>>> manner that allows the most important data to be operated on first.
>>> This is a valuable feature and is heavily utilized.  Out of the box
>>> NiFi provides several obvious prioritizer implementations such as
>>> first in and out based on age of the flow file, first in based on
>>> entry order, and honoring a numeric representation of priority set as
>>> a specific attribute [2].  They are rarely changed and have so far not
>>> grown in numbers nor have there been any discussions of doing so.  If
>>> I think back to their usage over the past decade I actually think
>>> there have been only a few ever made.
>>> 
>>> The concept and ability to sort queues is important and powerful and
>>> needs to be kept.  But I am now questioning the value of making
>>> prioritizers a first-class extension point.  The reason is that, as
>>> defined, the interface is intuitive for the developer but much harder
>>> to support on the framework side.  That, combined with how rarely
>>> they have been extended, opens the debate.
>>> 
>>> When the prioritizers were first envisioned we didn't support the
>>> concept of swapping out flowfiles to disk when the queues were huge.
>>> We now do.  But we cannot sort (at this time) the swapped out items.
>>> By getting rid of this extension point as it is now we can instead
>>> support these types of prioritizers in a different and more optimized
>>> manner albeit in a less extension friendly way (more coupled to the
>>> framework).  Rather than simply using comparators we can do absolute
>>> priority assignment and when swapping out flow files we can track the
>>> largest/smallest priority and thus enable prioritized swap-in.  This
>>> would also be helpful for doing things like auto-cluster load
>>> balancing or cluster-wide prioritized site-to-site.
>>> 
>>> So, in short, the interface would go from being a comparator to
>>> instead providing a method which returns an absolute priority.  For
>>> example, it would have a method called 'getPriority' which takes in a
>>> flow file and returns a long.
>>> 
>>> This approach would also still allow chaining prioritizers as we do
>>> today.
>>> 
>>> We can still support this as something which can be extended by those
>>> who wish to do so, just in a less friendly and more framework-coupled
>>> manner.  Basically, this would be more like how we support content
>>> repository or provenance repository extension, where the developer
>>> needs to understand both the implementation they want and the
>>> mechanics of getting that into the build and the deeper implications.
>>> 
>>> Would like to hear if others are supportive of this or if they see any
>>> major problems posed by this.  Given we're working towards the 1.x
>>> release this is a good time to pull this cord.  If we do this we can
>>> document the steps and thinking needed to build/contribute new
>>> prioritizer schemes.
>>> 
>>> Thanks
>>> Joe
>>> 
>>> [1] https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFilePrioritizer.java;h=684f454f57094a0e1f669333d63be06cd5a8a043;hb=refs/heads/0.x
>>> [2] https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=tree;f=nifi-nar-bundles/nifi-standard-bundle/nifi-standard-prioritizers/src/main/java/org/apache/nifi/prioritizer;h=6d5db994f9fd9624bf7f548ebd69548b6917ccd1;hb=refs/heads/0.x
>> 
>> 
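For readers of the archive, the change proposed in the quoted message (an interface whose 'getPriority' method takes a flow file and returns a long, plus tracking the smallest/largest priority of swapped-out flow files) might look roughly like the sketch below. This is only an illustration under stated assumptions, not NiFi code: PrioritySketch, Item, AbsolutePrioritizer, AttributePrioritizer, SwapSummary and nextBatchToSwapIn are all hypothetical names, Item stands in for a FlowFile, and the rule that a lower number is served first is assumed rather than taken from the thread.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are NiFi's actual API.
final class PrioritySketch {

    /** Stand-in for a FlowFile: only the attributes matter here. */
    static final class Item {
        final Map<String, String> attributes;
        Item(Map<String, String> attributes) { this.attributes = attributes; }
    }

    /** Proposed shape: return an absolute priority instead of comparing two items. */
    interface AbsolutePrioritizer {
        long getPriority(Item item);

        /** In-memory queues can still sort by deriving a comparator. */
        default Comparator<Item> asComparator() {
            return Comparator.comparingLong(this::getPriority);
        }
    }

    /** Reads a numeric 'priority' attribute. Assumption: lower value is served first. */
    static final class AttributePrioritizer implements AbsolutePrioritizer {
        @Override
        public long getPriority(Item item) {
            String raw = item.attributes.get("priority");
            if (raw == null) {
                return Long.MAX_VALUE; // no attribute: sort last
            }
            try {
                return Long.parseLong(raw.trim());
            } catch (NumberFormatException e) {
                return Long.MAX_VALUE; // unparseable: sort last
            }
        }
    }

    /** Smallest and largest priority seen in one swapped-out batch. */
    static final class SwapSummary {
        final long min;
        final long max;
        SwapSummary(long min, long max) { this.min = min; this.max = max; }

        static SwapSummary of(List<Item> batch, AbsolutePrioritizer prioritizer) {
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (Item item : batch) {
                long p = prioritizer.getPriority(item);
                min = Math.min(min, p);
                max = Math.max(max, p);
            }
            return new SwapSummary(min, max);
        }
    }

    /** Index of the batch to swap in first (smallest min), or -1 if there are none. */
    static int nextBatchToSwapIn(List<SwapSummary> summaries) {
        int best = -1;
        for (int i = 0; i < summaries.size(); i++) {
            if (best < 0 || summaries.get(i).min < summaries.get(best).min) {
                best = i;
            }
        }
        return best;
    }
}
```

The point of the sketch is that a per-batch summary of absolute priorities can be compared without reading the swapped-out items back in, which a pairwise comparator alone would not allow.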
