+1
> On Oct 9, 2015, at 7:16 AM, Matt Gilman <[email protected]> wrote: > > Joe, > > We've been receiving a lot of feedback lately regarding the Purge/Clear > Queue capability. Because of this we'd like to introduce that feature into > the 0.4.0 release with the Viewing and Removing of individual FlowFiles > into a subsequent release. The Purge/Clear Queue capability is a small > portion of the full Queue Management feature that we can introduce > independently and quickly. > > I looked through your NiFi Fork specifically at the Purge/Clear > functionality. There are some additional considerations that I mentioned > before specifically around FlowFile swapping and submitting the Purge/Clear > request asynchronously that we need to account for. We have created a > branch (NIFI-730) for doing this work. We'd love for you to work with us on > this aspect if you are interested. > > Thanks! > > Matt > >> On Fri, Oct 2, 2015 at 3:16 PM, Matt Gilman <[email protected]> wrote: >> >> Joe, >> >> Yes, as Mark mentioned it is definitely awesome that your interested in >> digging in here. Most of the discussions regarding this feature have been >> really high level at this point. So we're happy to work through some of the >> details as Mark has begun. A couple points that come to mind right now. >> >> - I don't think we want to support manually prioritization. The >> connections can be configured with prioritizers and we'd like to use those >> to manage the ordering of the enqueued FlowFiles. However, the listing of >> FlowFiles will be rendered by their priority by default though will likely >> support sorting by any of the fields. >> >> - The current thought process is that we'll want to require source and >> destination components to be stopped. This is inline with the existing >> functionality throughout the application. >> >> - The number of enqueued FlowFiles is technically unbounded. Because of >> this the endpoint may be need to support some sort of pagination since we'd >> may not want/be able to return the entire queue in a single response. There >> is some concern about Java heap since many of the flowfiles may be swapped >> out to disk. Additionally there are some concerns about HTTP response size >> and the amount of data we store client side. >> >> - What we do with the FlowFiles that are swapped out to disk is still >> undecided. Not sure whether we want to load them from disk in order to >> include them in the response or if we just show that X number of FlowFiles >> are currently swapped out. >> >> Some of these items will need to be hashed out but we're happy to work >> through them with you. We should keep the Feature Proposal up to date as >> well [1]. >> >> Thanks! >> >> Matt >> >> [1] >> https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management >> >>> On Thu, Oct 1, 2015 at 12:34 PM, Mark Payne <[email protected]> wrote: >>> >>> Joe, >>> >>> First of all, it is awesome that you're interested in jumping on this! >>> And I think you're off to a great >>> start and have a really good understanding of exactly where we all want >>> to go with this. >>> >>> I'm sure there will be a lot of questions that will come up in working >>> through a lot of the >>> stuff here. Just from reading through the email here i have a couple of >>> comments/thoughts that >>> may help to shape the way forward. This is a bit of a stream of >>> consciousness, so I hope all >>> makes sense :) >>> >>> The connectionQueueItem model that you lay out here, I think is really >>> just a FlowFile. >>> I think it will make sense to just use the name flowFile. >>> >>> When you bring up the contents of a queue in the UI, I would imagine that >>> it would be shown >>> as something similar to the Data Provenance table. From there I'd want to >>> click on the FlowFile >>> in the table to see more details. So I'm envisioning two separate data >>> models really. The first >>> would be maybe a FlowFileSummary. It would look very similar to what >>> you've laid out below, >>> but perhaps contain information about how long the FlowFile has been >>> queued up, perhaps >>> how many times it has been re-queued on this particular queue (for >>> example, if a FlowFile keeps >>> failing to process, we could use this information to remove that >>> particular FlowFile from the >>> queue, etc.) >>> >>> When we get more info for the FlowFile, I would expect it to contain all >>> FlowFile Attributes. This >>> I think is a different data model because if we pull back all attributes >>> for every FlowFile when >>> we render the table, the amount of data brought back could be huge. >>> >>> Another consideration here, is that when a connection has a lot of >>> FlowFiles on it, the framework >>> may swap those FlowFiles out to disk in order to remove them from the >>> Java heap. We will >>> want to ensure that we include info about how much is in the queue (# of >>> FlowFiles and size of those >>> FlowFiles), how much is swapped out (# of FlowFiles + size), and how much >>> is currently being >>> processed by Processors (in the FlowFileQueue this is referenced as >>> Unacknowledged FlowFiles). >>> >>> In the UI table, we should also make sure that by default we are showing >>> the FlowFiles in the order >>> in which they exist in the queue right now. >>> >>> From a RESTful perspective, we may want to also consider that in order to >>> purge a queue, we are not >>> really deleting the queue itself, but rather its contents. So perhaps we >>> should use a URI like >>> >>> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/contents >>> < >>> http://your-host/nifi-api/controller/process-groups/%7Bprocess-group-id%7D/connections/%7Bconnection-id%7D/queue/contents >>> but I'll be the first to admit that REST is not really my forte. So if >>> that doesn't make sense then ignore that. >>> >>> Very excited to see you jumping in here! >>> >>> Thanks >>> -Mark >>> >>> >>> >>>>> On Oct 1, 2015, at 12:13 PM, József Mészáros < >>>> [email protected]> wrote: >>>> >>>> Hey NiFi experts :-) >>>> >>>> I have started to work on the backend part of interactive queue >>> management, >>>> which has several related issues: NIFI-99 (Review in flight flow file >>>> details) <https://issues.apache.org/jira/browse/NIFI-99>,NIFI-108 >>>> <https://issues.apache.org/jira/browse/NIFI-108>,NIFI-730 (Purge queue >>> from >>>> UI) <https://issues.apache.org/jira/browse/NIFI-730>,NIFI-139 >>> (Distribution >>>> of FlowFiles on a connection) >>>> <https://issues.apache.org/jira/browse/NIFI-139>. There is a feature >>>> proposal description [1], which helps you to get a quick overview. >>>> >>>> I hope I made the first step to move forward with this topic in a >>>> "standard" and "good" direction. I tried to think generally, and making >>> the >>>> changes considering all the requested improvements from backend >>>> perspective. The basic idea was to extend the web-api with a new >>> endpoint >>>> for managing a connection queue (used e.g. by the UI). Based on the >>>> mentioned issues, I created the following new "methods": >>>> >>>> - Get the content of the connection queue >>>> >>>> *GET* >>> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue >>>> >>>> Response: List of connection queue items >>>> >>>> - Clear (purge) the connection queue >>>> >>>> *DELETE* >>> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue >>>> >>>> - Remove a single item from the connection queue >>>> >>>> *DELETE* >>> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/{flow-file-uuid} >>>> >>>> - Get a single item from the connection queue >>>> >>>> *GET* >>> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/{flow-file-uuid} >>>> >>>> Response: Single connection queue item >>>> >>>> The connection queue item looks like this (JSON): >>>> >>>> "connectionQueueItem": { >>>> "flowFileId": 16, >>>> "flowFileUuid": "92c74b41-005e-444d-8f9e-f9cbc01af5f2", >>>> "fileSize": "42 bytes", >>>> "fileSizeBytes": 42, >>>> "fileName": "filename.tsv", >>>> "entryDate": 1443546148763, >>>> "lineageStartDate": 1443546148763, >>>> "contentClaimSection": "1", >>>> "contentClaimContainer": "default", >>>> "contentClaimIdentifier": "1443546148763-1", >>>> "contentClaimOffset": 0 >>>> } >>>> >>>> If the flow file has a priority attribute, it is also included as a >>> numeric >>>> value. >>>> >>>> It contains enough information for "View content" and "Download content" >>>> panels and from the frontend perspective, it could be implemented in a >>>> similar way. If you would like to purge the connection queue, or a >>> single >>>> item, you just have to make an HTTP DELETE request. And off course your >>> are >>>> able to make statistics and review the content of the queue with the >>> first >>>> method. Updating an item (e.g. reorder the queue) can result a new >>> method, >>>> or covered by a delete + put for a single item with a new priority >>> value. >>>> >>>> You can find my commits in the following repo : >>>> https://github.com/ImpressTv/nifi in branch NIFI-108 >>>> <https://github.com/ImpressTV/nifi/tree/NIFI-108>. I do not want to >>> make a >>>> pull request until the backend is not in a mergeable state. >>>> >>>> It is important to mention, that the backend code is not complete, and >>> at >>>> some points it maybe requires some reshaping, but you can see the basic >>>> concept, and the direction. Before making any new commits, I wanted to >>>> share the current state with you, and start/initiate a conversation >>> about >>>> the topic, and to get feedback, whether is it a good contribution, or >>> not. >>>> So guys, the questions are open :-) >>>> >>>> Regards, >>>> Joe >>>> >>>> [1] >>> https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management >>
