Re: Interactive Queue management

Rick Braddy Fri, 09 Oct 2015 09:16:59 -0700

+1


> On Oct 9, 2015, at 7:16 AM, Matt Gilman <[email protected]> wrote:
> 
> Joe,
> 
> We've been receiving a lot of feedback lately regarding the Purge/Clear
> Queue capability. Because of this, we'd like to introduce that feature in
> the 0.4.0 release and defer the Viewing and Removing of individual FlowFiles
> to a subsequent release. The Purge/Clear Queue capability is a small
> portion of the full Queue Management feature that we can introduce
> independently and quickly.
> 
> I looked through your NiFi fork, specifically at the Purge/Clear
> functionality. There are some additional considerations that I mentioned
> before, specifically around FlowFile swapping and submitting the Purge/Clear
> request asynchronously, that we need to account for. We have created a
> branch (NIFI-730) for doing this work. We'd love for you to work with us on
> this aspect if you are interested.
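To make the swapping and asynchronous-submission considerations concrete, here is a minimal Python sketch. It assumes a toy queue whose FlowFiles live partly in memory and partly in swap files; `ToyQueue`, `DropStatus`, and `submit_drop_request` are hypothetical names, not NiFi classes. The caller gets a handle back immediately and waits on (or polls) it, and the drop covers swapped-out FlowFiles as well as in-memory ones.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyQueue:
    """Hypothetical queue: FlowFile ids in memory plus swapped-out batches."""
    active: List[str] = field(default_factory=list)
    swap_files: List[List[str]] = field(default_factory=list)


@dataclass
class DropStatus:
    dropped_active: int
    dropped_swapped: int

    @property
    def total(self) -> int:
        return self.dropped_active + self.dropped_swapped


def _drop_all(queue: ToyQueue) -> DropStatus:
    # Drop the FlowFiles currently held in memory.
    dropped_active = len(queue.active)
    queue.active.clear()
    # Swapped-out FlowFiles must be dropped too; otherwise they would
    # reappear once they are swapped back in.
    dropped_swapped = sum(len(batch) for batch in queue.swap_files)
    queue.swap_files.clear()
    return DropStatus(dropped_active, dropped_swapped)


def submit_drop_request(executor: ThreadPoolExecutor, queue: ToyQueue) -> Future:
    """Return immediately with a handle the caller can wait on or poll."""
    return executor.submit(_drop_all, queue)


if __name__ == "__main__":
    q = ToyQueue(active=["a", "b"], swap_files=[["c", "d", "e"]])
    with ThreadPoolExecutor(max_workers=1) as pool:
        request = submit_drop_request(pool, q)
        status = request.result(timeout=5)
    print(status.total)  # 5
```

In a REST setting the handle would presumably map to a request resource that the UI polls for progress; that mapping is an assumption, not something settled in this thread.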
> 
> Thanks!
> 
> Matt
> 
>> On Fri, Oct 2, 2015 at 3:16 PM, Matt Gilman <[email protected]> wrote:
>> 
>> Joe,
>> 
>> Yes, as Mark mentioned, it is definitely awesome that you're interested in
>> digging in here. Most of the discussions regarding this feature have been
>> really high level at this point, so we're happy to work through some of the
>> details as Mark has begun to. A couple of points come to mind right now.
>> 
>> - I don't think we want to support manual prioritization. The
>> connections can be configured with prioritizers, and we'd like to use those
>> to manage the ordering of the enqueued FlowFiles. The listing of FlowFiles
>> will be rendered in priority order by default, though we will likely
>> support sorting by any of the fields.
>> 
>> - The current thought process is that we'll want to require the source and
>> destination components to be stopped. This is in line with the existing
>> functionality throughout the application.
>> 
>> - The number of enqueued FlowFiles is technically unbounded. Because of
>> this, the endpoint may need to support some sort of pagination, since we
>> may not want (or be able) to return the entire queue in a single response.
>> There is some concern about the Java heap, since many of the FlowFiles may
>> be swapped out to disk. Additionally, there are some concerns about HTTP
>> response size and the amount of data we store client side.
>> 
>> - What we do with the FlowFiles that are swapped out to disk is still
>> undecided. We're not sure whether we want to load them from disk in order
>> to include them in the response, or just show that X FlowFiles are
>> currently swapped out.
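A rough Python sketch of the listing behaviour described in these points, assuming a single numeric `priority` field stands in for the connection's configured prioritizers and that swapped-out FlowFiles are only counted, not loaded (one of the two options above). `QueuedFlowFile`, `ListingPage`, and `list_queue` are hypothetical names, and the offset/limit pagination scheme is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class QueuedFlowFile:
    uuid: str
    filename: str
    size_bytes: int
    priority: int  # hypothetical: lower value is listed first


@dataclass
class ListingPage:
    items: List[QueuedFlowFile]
    total_in_memory: int
    swapped_out_count: int  # reported only; swapped FlowFiles are not loaded


def list_queue(
    in_memory: List[QueuedFlowFile],
    swapped_out_count: int,
    offset: int = 0,
    limit: int = 100,
    sort_key: Optional[Callable[[QueuedFlowFile], Any]] = None,
) -> ListingPage:
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    # Default ordering stands in for the connection's prioritizers; the caller
    # may instead ask for sorting by another field.
    key = sort_key if sort_key is not None else (lambda ff: ff.priority)
    ordered = sorted(in_memory, key=key)
    return ListingPage(
        items=ordered[offset:offset + limit],
        total_in_memory=len(in_memory),
        swapped_out_count=swapped_out_count,
    )
```

Sorting the whole in-memory list on each page request is the simplest choice, not an efficient one; with a large queue, the heap concern above would likely argue for something smarter.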
>> 
>> Some of these items will need to be hashed out but we're happy to work
>> through them with you. We should keep the Feature Proposal up to date as
>> well [1].
>> 
>> Thanks!
>> 
>> Matt
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management
>> 
>>> On Thu, Oct 1, 2015 at 12:34 PM, Mark Payne <[email protected]> wrote:
>>> 
>>> Joe,
>>> 
>>> First of all, it is awesome that you're interested in jumping on this!
>>> And I think you're off to a great start and have a really good
>>> understanding of exactly where we all want to go with this.
>>> 
>>> I'm sure a lot of questions will come up in working through this. Just
>>> from reading through the email here, I have a couple of comments/thoughts
>>> that may help to shape the way forward. This is a bit of a stream of
>>> consciousness, so I hope it all makes sense :)
>>> 
>>> I think the connectionQueueItem model that you lay out here is really
>>> just a FlowFile, so it will make sense to just use the name flowFile.
>>> 
>>> When you bring up the contents of a queue in the UI, I would imagine that
>>> it would be shown as something similar to the Data Provenance table. From
>>> there I'd want to click on the FlowFile in the table to see more details.
>>> So I'm really envisioning two separate data models. The first would be
>>> maybe a FlowFileSummary. It would look very similar to what you've laid
>>> out below, but perhaps contain information about how long the FlowFile has
>>> been queued up, and perhaps how many times it has been re-queued on this
>>> particular queue (for example, if a FlowFile keeps failing to process, we
>>> could use this information to remove that particular FlowFile from the
>>> queue).
>>> 
>>> When we get more info for the FlowFile, I would expect it to contain all
>>> FlowFile Attributes. I think this is a different data model, because if we
>>> pull back all attributes for every FlowFile when we render the table, the
>>> amount of data brought back could be huge.
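A minimal sketch of the two data models, assuming Python dataclasses and hypothetical names (`FlowFileSummary`, `FlowFileDetails`, `to_summary`, `repeatedly_failing`). The summary row carries queue time and a re-queue count but no attributes; the full attribute map lives only in the details model, fetched for one FlowFile at a time.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlowFileSummary:
    """One table row: enough to render the listing, no attributes."""
    uuid: str
    filename: str
    size_bytes: int
    queued_duration_ms: int
    requeue_count: int  # times re-queued on this particular connection


@dataclass
class FlowFileDetails:
    """Fetched on demand for a single FlowFile when its row is clicked."""
    summary: FlowFileSummary
    attributes: Dict[str, str] = field(default_factory=dict)


def to_summary(uuid: str, attributes: Dict[str, str], size_bytes: int,
               enqueued_at_ms: int, now_ms: int, requeue_count: int) -> FlowFileSummary:
    # Only the "filename" attribute is copied; the rest stay out of the listing.
    return FlowFileSummary(
        uuid=uuid,
        filename=attributes.get("filename", ""),
        size_bytes=size_bytes,
        queued_duration_ms=now_ms - enqueued_at_ms,
        requeue_count=requeue_count,
    )


def repeatedly_failing(summaries: List[FlowFileSummary], threshold: int) -> List[str]:
    """UUIDs re-queued at least `threshold` times: candidates for removal."""
    return [s.uuid for s in summaries if s.requeue_count >= threshold]
```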
>>> 
>>> Another consideration here is that when a connection has a lot of
>>> FlowFiles on it, the framework may swap those FlowFiles out to disk in
>>> order to remove them from the Java heap. We will want to ensure that we
>>> include info about how much is in the queue (# of FlowFiles and size of
>>> those FlowFiles), how much is swapped out (# of FlowFiles + size), and how
>>> much is currently being processed by Processors (in the FlowFileQueue this
>>> is referenced as Unacknowledged FlowFiles).
>>> 
>>> In the UI table, we should also make sure that by default we are showing
>>> the FlowFiles in the order in which they exist in the queue right now.
>>> 
>>> From a RESTful perspective, we may also want to consider that in order to
>>> purge a queue, we are not really deleting the queue itself, but rather its
>>> contents. So perhaps we should use a URI like
>>> 
>>> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/contents
>>> but I'll be the first to admit that REST is not really my forte. So if
>>> that doesn't make sense, then ignore that.
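A small Python sketch of building such a purge request with the standard library, assuming the URI layout suggested above and placeholder ids; `queue_contents_url` and `purge_request` are hypothetical helpers, and the request is only constructed, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://your-host/nifi-api/controller"


def queue_contents_url(process_group_id: str, connection_id: str) -> str:
    # Percent-encode each id so it is safe as a single path segment.
    pg = quote(process_group_id, safe="")
    conn = quote(connection_id, safe="")
    return f"{BASE_URL}/process-groups/{pg}/connections/{conn}/queue/contents"


def purge_request(process_group_id: str, connection_id: str) -> Request:
    # DELETE targets the queue's contents; the queue resource itself remains.
    return Request(queue_contents_url(process_group_id, connection_id), method="DELETE")
```

Passing the result to `urllib.request.urlopen` would send it; against a real instance, the final path and any authentication would still need to be confirmed.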
>>> 
>>> Very excited to see you jumping in here!
>>> 
>>> Thanks
>>> -Mark
>>> 
>>> 
>>> 
>>>> On Oct 1, 2015, at 12:13 PM, József Mészáros <[email protected]> wrote:
>>>> 
>>>> Hey NiFi experts :-)
>>>> 
>>>> I have started to work on the backend part of interactive queue
>>>> management, which has several related issues: NIFI-99 (Review in flight
>>>> flow file details) <https://issues.apache.org/jira/browse/NIFI-99>,
>>>> NIFI-108 <https://issues.apache.org/jira/browse/NIFI-108>, NIFI-730
>>>> (Purge queue from UI) <https://issues.apache.org/jira/browse/NIFI-730>,
>>>> and NIFI-139 (Distribution of FlowFiles on a connection)
>>>> <https://issues.apache.org/jira/browse/NIFI-139>. There is a feature
>>>> proposal description [1], which gives a quick overview.
>>>> 
>>>> I hope I have taken the first step toward moving this topic forward in a
>>>> "standard" and "good" direction. I tried to think generally and to make
>>>> the changes with all the requested improvements in mind from a backend
>>>> perspective. The basic idea was to extend the web-api with a new endpoint
>>>> for managing a connection queue (used e.g. by the UI). Based on the
>>>> mentioned issues, I created the following new "methods":
>>>> 
>>>>  - Get the content of the connection queue
>>>>
>>>> *GET* http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue
>>>>
>>>> Response: List of connection queue items
>>>>
>>>>  - Clear (purge) the connection queue
>>>>
>>>> *DELETE* http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue
>>>>
>>>>  - Remove a single item from the connection queue
>>>>
>>>> *DELETE* http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/{flow-file-uuid}
>>>>
>>>>  - Get a single item from the connection queue
>>>>
>>>> *GET* http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/{flow-file-uuid}
>>>>
>>>> Response: Single connection queue item
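The four proposed methods share one resource path and differ only by HTTP verb and an optional FlowFile UUID segment. A minimal Python sketch of how they could be dispatched, assuming regex path matching and hypothetical action names (`list_queue`, `purge_queue`, `get_item`, `remove_item`) rather than the actual NiFi web-api framework:

```python
import re
from typing import Dict, Optional, Tuple

QUEUE = (r"^/nifi-api/controller/process-groups/(?P<pg>[^/]+)"
         r"/connections/(?P<conn>[^/]+)/queue")

# (HTTP method, compiled path pattern, hypothetical action name)
ROUTES = [
    ("GET", re.compile(QUEUE + r"$"), "list_queue"),
    ("DELETE", re.compile(QUEUE + r"$"), "purge_queue"),
    ("GET", re.compile(QUEUE + r"/(?P<uuid>[^/]+)$"), "get_item"),
    ("DELETE", re.compile(QUEUE + r"/(?P<uuid>[^/]+)$"), "remove_item"),
]


def resolve(method: str, path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Map an HTTP method and path to an action name plus its path parameters."""
    for route_method, pattern, action in ROUTES:
        if route_method != method:
            continue
        match = pattern.match(path)
        if match:
            return action, match.groupdict()
    return None
```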
>>>> 
>>>> The connection queue item looks like this (JSON):
>>>> 
>>>> {
>>>>   "connectionQueueItem": {
>>>>     "flowFileId": 16,
>>>>     "flowFileUuid": "92c74b41-005e-444d-8f9e-f9cbc01af5f2",
>>>>     "fileSize": "42 bytes",
>>>>     "fileSizeBytes": 42,
>>>>     "fileName": "filename.tsv",
>>>>     "entryDate": 1443546148763,
>>>>     "lineageStartDate": 1443546148763,
>>>>     "contentClaimSection": "1",
>>>>     "contentClaimContainer": "default",
>>>>     "contentClaimIdentifier": "1443546148763-1",
>>>>     "contentClaimOffset": 0
>>>>   }
>>>> }
>>>> 
>>>> If the flow file has a priority attribute, it is also included as a
>>>> numeric value.
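A minimal Python sketch of consuming this item on the client side, assuming it arrives wrapped in an outer object (`{"connectionQueueItem": {...}}`) and that the optional priority is carried in a key named `priority`. That key name and the `ConnectionQueueItem`/`parse_item` names are assumptions; the date fields appear to be epoch milliseconds.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ConnectionQueueItem:
    flow_file_id: int
    flow_file_uuid: str
    file_name: str
    file_size_bytes: int
    entry_date_ms: int
    lineage_start_date_ms: int
    content_claim_container: str
    content_claim_section: str
    content_claim_identifier: str
    content_claim_offset: int
    priority: Optional[int] = None  # only present if the FlowFile has a priority attribute


def parse_item(payload: str) -> ConnectionQueueItem:
    raw: Dict[str, Any] = json.loads(payload)["connectionQueueItem"]
    return ConnectionQueueItem(
        flow_file_id=raw["flowFileId"],
        flow_file_uuid=raw["flowFileUuid"],
        file_name=raw["fileName"],
        # Use the numeric size; "fileSize" is a human-readable display string.
        file_size_bytes=raw["fileSizeBytes"],
        entry_date_ms=raw["entryDate"],
        lineage_start_date_ms=raw["lineageStartDate"],
        content_claim_container=raw["contentClaimContainer"],
        content_claim_section=raw["contentClaimSection"],
        content_claim_identifier=raw["contentClaimIdentifier"],
        content_claim_offset=raw["contentClaimOffset"],
        priority=raw.get("priority"),  # assumed key name
    )
```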
>>>> 
>>>> It contains enough information for the "View content" and "Download
>>>> content" panels, so from the frontend perspective it could be implemented
>>>> in a similar way. If you would like to purge the connection queue, or a
>>>> single item, you just have to make an HTTP DELETE request. And of course
>>>> you are able to compute statistics and review the content of the queue
>>>> with the first method. Updating an item (e.g. reordering the queue) could
>>>> result in a new method, or be covered by a DELETE + PUT for a single item
>>>> with a new priority value.
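The DELETE + PUT alternative amounts to removing an item and enqueuing it again with a new priority value. A toy Python sketch of that semantics, assuming a lower value sorts first and ties keep insertion order; `ToyPriorityQueue` and its methods are hypothetical and say nothing about how NiFi's prioritizers actually work.

```python
import itertools
from typing import Dict, List, Tuple


class ToyPriorityQueue:
    """Hypothetical stand-in for a connection queue ordered by a priority value."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[int, int]] = {}
        self._seq = itertools.count()  # FIFO tie-break for equal priorities

    def put(self, uuid: str, priority: int) -> None:
        self._entries[uuid] = (priority, next(self._seq))

    def delete(self, uuid: str) -> int:
        priority, _ = self._entries.pop(uuid)
        return priority

    def reprioritize(self, uuid: str, new_priority: int) -> None:
        # "DELETE + PUT": remove the item, then enqueue it again with a new value.
        self.delete(uuid)
        self.put(uuid, new_priority)

    def ordered(self) -> List[str]:
        # Sort by (priority, insertion sequence).
        return sorted(self._entries, key=self._entries.__getitem__)
```

Note that re-enqueuing gives the item a fresh sequence number, so among equal priorities it moves to the back.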
>>>> 
>>>> You can find my commits in the following repo:
>>>> https://github.com/ImpressTv/nifi in branch NIFI-108
>>>> <https://github.com/ImpressTV/nifi/tree/NIFI-108>. I do not want to make
>>>> a pull request until the backend is in a mergeable state.
>>>> 
>>>> It is important to mention that the backend code is not complete, and in
>>>> some places it may require some reshaping, but you can see the basic
>>>> concept and the direction. Before making any new commits, I wanted to
>>>> share the current state with you, start a conversation about the topic,
>>>> and get feedback on whether it is a good contribution or not.
>>>> So guys, the questions are open :-)
>>>> 
>>>> Regards,
>>>> Joe
>>>> 
>>>> [1]
>>>> https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management
>>
