[ https://issues.apache.org/jira/browse/NIFI-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449222#comment-15449222 ]

Bryan Bende commented on NIFI-2615:
-----------------------------------

This sounds like a useful processor, thanks for writing this up!

Not sure if this is helpful, but you may be able to reuse some of the work that 
was done on the network listening processors. I know this one needs to connect 
first, but I think the pattern could potentially be the same: the processor 
starts a background thread that does the network communication and reads data 
into a queue shared with the processor; when the processor executes, it polls 
the queue and writes the data into flow files.
Part of the design is to allow batching multiple messages into a single flow 
file to get much higher throughput.

There are some abstract classes that help with all the plumbing; they are used 
by ListenTCP, ListenSyslog, etc.:
https://github.com/apache/nifi/tree/master/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/listen
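To make the pattern concrete, here is a minimal, self-contained sketch of the background-reader/shared-queue idea described above. The class name and methods are hypothetical illustrations, not part of the NiFi abstract classes linked here: a network-reading thread would call offer() per message, and the processor's execution would call pollBatch() to batch messages into one flow file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch (not NiFi code): a bounded queue shared between a
// background network-reading thread and the processor's execution.
public class QueuePatternSketch {
    // Bounded so a slow processor applies back-pressure to the reader thread.
    private final BlockingQueue<String> messages = new LinkedBlockingQueue<>(10_000);

    // Called by the background reader thread for each message received
    // from the socket; returns false if the queue is full.
    public boolean offer(String message) {
        return messages.offer(message);
    }

    // Called on each processor execution: drain up to maxBatchSize queued
    // messages so many small messages can become a single flow file.
    public List<String> pollBatch(int maxBatchSize) {
        List<String> batch = new ArrayList<>(maxBatchSize);
        messages.drainTo(batch, maxBatchSize);
        return batch;
    }

    public static void main(String[] args) {
        QueuePatternSketch sketch = new QueuePatternSketch();
        for (int i = 0; i < 5; i++) {
            sketch.offer("msg-" + i);
        }
        System.out.println(sketch.pollBatch(3).size()); // 3
        System.out.println(sketch.pollBatch(3).size()); // 2
    }
}
```

An empty batch from pollBatch() would be the processor's cue to yield rather than create an empty flow file.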

I totally realize this processor may be a bit different and may need to be its 
own thing, but just wanted to throw this info out there just in case.

> Add support for GetTCP processor
> --------------------------------
>
>                 Key: NIFI-2615
>                 URL: https://issues.apache.org/jira/browse/NIFI-2615
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Core Framework
>    Affects Versions: 1.0.0, 0.7.0, 0.6.1
>            Reporter: Andrew Psaltis
>            Assignee: Andrew Psaltis
>
> This processor will allow NiFi to connect to a host via TCP, acting as the 
> client and consuming data. It should provide the following properties:
> * Endpoint - this should accept a list of addresses in the format 
> <Address>:<Port>. If a user wants to track the ingestion rate per address, 
> this list should contain a single address. However, there are times when 
> multiple endpoints represent one logical entity and the aggregate ingestion 
> rate is representative of it. 
> * Failover Endpoint - an endpoint to fail over to if the list of Endpoints is 
> exhausted and a connection cannot be made to any of them, or if the 
> connection is broken and cannot be re-established.
> * Receive Buffer Size - the size of the TCP receive buffer to use. This does 
> not relate to the size of content in the resulting flow file.
> * Keep Alive - this enables TCP keep-alive
> * Connection Timeout - how long to wait when trying to establish a connection
> This processor should also support the following:
> 1. If a connection to an endpoint is broken, the break should be logged and 
> reconnection attempts should be made, potentially using an exponential 
> backoff strategy. If more than one strategy is supported, the strategy in use 
> should be documented and potentially exposed as an attribute.
> 2. When there are multiple instances of this processor in a flow and NiFi is 
> set up in a cluster, this processor needs to ensure that received messages 
> are not processed twice. For example, if this processor is configured to 
> point to the endpoint (172.31.32.212:10000) and the data flow is running on 
> more than one node, then only one node should be processing data. In essence 
> the nodes should form a group with semantics similar to a Kafka consumer 
> group.
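The exponential backoff mentioned in point 1 of the quoted description could be computed as in the following sketch. The helper name, base delay, and cap are hypothetical choices for illustration, not anything specified in the ticket:

```java
// Hypothetical sketch: reconnect delay doubles per consecutive failure,
// starting from a base delay and capped at a configurable maximum.
public class BackoffSketch {
    static long backoffMillis(int consecutiveFailures, long baseMillis, long maxMillis) {
        if (consecutiveFailures <= 0) {
            return 0;
        }
        // Clamp the shift so the left-shift cannot overflow a long.
        long delay = baseMillis << Math.min(consecutiveFailures - 1, 30);
        return Math.min(delay, maxMillis);
    }

    public static void main(String[] args) {
        // 1s base, capped at 30s: prints 1000, 2000, 4000, 8000, 16000, 30000, 30000
        for (int failures = 1; failures <= 7; failures++) {
            System.out.println(backoffMillis(failures, 1000, 30_000));
        }
    }
}
```

Exposing the delay sequence (or the strategy name) as a flow file attribute, as the ticket suggests, would make the reconnect behavior observable downstream.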



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
