Bryan Bende created NIFI-4391: --------------------------------- Summary: PutTCP not properly closing connections Key: NIFI-4391 URL: https://issues.apache.org/jira/browse/NIFI-4391 Project: Apache NiFi Issue Type: Bug Affects Versions: 1.3.0, 1.2.0, 1.1.0, 1.0.0 Reporter: Bryan Bende Assignee: Bryan Bende
Thread from the mailing list... We are using NiFi PutTCP processors to send messages to a number of Moxa onCell ip gateway devices. These Moxa devices are running on a cellular network with not always the most ideal connection. The Moxa only allows for a maximum of 2 simultaneous client connections. What we notice is that although we specify connection / read timeouts on both PutTCP and the Moxa, that sometimes a connection get "stuck". (In the moxa network monitoring we see 2 client sockets coming from PutTCP in the ESTABLISHED state that never go away). This doesn't always happen, but often enough for it to be considered a problem, as it requires a restart of the moxa ports to clear the connections (manual step). It typically happens when PutTCP experiences a Timeout. On the PutTCP processors we have the following settings : - Idle Connection Expiration : 30 seconds (we've set this higher due to bad gprs connection) - Timeout : 10 seconds (this is only used as a timeout for establishing the connection) On the Moxas we have - TCP alive check time : 2min (this should force the Moxa to close the socket) Yet for some reason the connection remains established. Here's what I found out : On the moxa I noticed a connection (with client port 48440) that is in ESTABLISHED mode for 4+ hours. (blocking other connections). On the Moxa I can see when the connection was established : 2017/09/17 14:20:29 [OpMode] Port01 Connect 10.192.2.90:48440 I can track that down in Nifi via the logs (unfortunately PutTCP doesn't log client ports, but from the timestamp I'm sure it's this connection : {code} 2017-09-17 14:20:10,837 DEBUG [Timer-Driven Process Thread-10] o.apache.nifi.processors.standard.PutTCP PutTCP[id=80231a39-1008-1159-a6fa-1f9e3751d608] No available connections, creating a new one... 2017-09-17 14:20:20,860 ERROR [Timer-Driven Process Thread-10] o.apache.nifi.processors.standard.PutTCP PutTCP[id=80231a39-1008-1159-a6fa-1f9e3751d608] No available connections, and unable to create a new one, transferring StandardFlowFileRecord[uuid=79f2a166-5211-4d2d-9275-03f0ce4d5b29,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1505641210025-1, container=default, section=1], offset=84519, length=9],offset=0,name=23934743676390659,size=9] to failure: java.net.SocketTimeoutException: Timed out connecting to 10.32.133.40:4001 2017-09-17 14:20:20,860 ERROR [Timer-Driven Process Thread-10] o.apache.nifi.processors.standard.PutTCP java.net.SocketTimeoutException: Timed out connecting to 10.32.133.40:4001 at org.apache.nifi.processor.util.put.sender.SocketChannelSender.open(SocketChannelSender.java:66) ~[nifi-processor-utils-1.1.0.jar:1.1.0] at org.apache.nifi.processor.util.put.AbstractPutEventProcessor.createSender(AbstractPutEventProcessor.java:312) ~[nifi-processor-utils-1.1.0.jar:1.1.0] at org.apache.nifi.processors.standard.PutTCP.createSender(PutTCP.java:121) [nifi-standard-processors-1.1.0.jar:1.1.0] at org.apache.nifi.processor.util.put.AbstractPutEventProcessor.acquireSender(AbstractPutEventProcessor.java:334) ~[nifi-processor-utils-1.1.0.jar:1.1.0] at org.apache.nifi.processors.standard.PutTCP.onTrigger(PutTCP.java:176) [nifi-standard-processors-1.1.0.jar:1.1.0] at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099) [nifi-framework-core-1.1.0.jar:1.1.0] at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-1.1.0.jar:1.1.0] at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-1.1.0.jar:1.1.0] at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132) [nifi-framework-core-1.1.0.jar:1.1.0] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_111] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_111] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_111] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111] {code} On an OS level I indeed see the TCP connection originated from Nifi : netstat -tn | grep 48440 tcp 711 0 10.192.2.90:48440 10.32.133.40:4001 ESTABLISHED lsof -i TCP:48440 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME java 3424 root 1864u IPv4 404675057 0t0 TCP NifiServer:48440->10.32.133.40:newoak (ESTABLISHED) ps -ef | grep 3424 root 3424 3390 8 11:32 ? 00:44:33 java -classpath /opt/nifi-1.1.0/./conf..... Any ideas on how to debug this further ? I looked at the code in SocketChannelSender.java:66 and was wondering it no additional cleanup is necessary when throwing the SocketTimeoutException. If the connection ends up getting established after the timeout, doesn't PutTCP need to clean that up ? -- This message was sent by Atlassian JIRA (v6.4.14#64029)