[ 
https://issues.apache.org/jira/browse/NIFI-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290529#comment-16290529
 ] 

Koji Kawamura commented on NIFI-3377:
-------------------------------------

I was able to debug the root cause of this issue.

RemoteGroupPort (the S2S client) persists remote NiFi node endpoints (hostname, 
port, isSecure) to a local file (conf/state/port-id.peers by default). This 
file was designed for the old cluster management model used before NiFi 1.0.0, 
which relied on an NCM (NiFi Cluster Manager) node: even if the NCM node went 
down, S2S client NiFi instances could be restarted and restore the remote node 
endpoints from the persisted file.

Currently, the peers file is read when a RemoteGroupPort is started. Because 
the file is named after the RemoteGroupPort GUID only, it does not take the S2S 
transport protocol into account. This causes the reported issue: if a 
RemoteGroupPort is configured to use RAW, it persists RAW endpoints (e.g. 
remote1:8081, remote2:8081). If its transmission is then stopped and it is 
reconfigured to use HTTP, it restores the RAW endpoints when it is restarted. 
As a result it sends HTTP requests to the RAW port, and vice versa. That is why 
we see strange network-layer errors.

Once this happens, the RemoteGroupPort does not refresh its remote endpoints 
until it has either consumed all calculated request endpoints or 60 seconds 
have passed. That is why it does not recover from the situation immediately.
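
The refresh condition described above can be sketched roughly as follows. This 
is only an illustration of the rule as stated in this comment, not the actual 
NiFi code: the class name PeerRefreshPolicy, the method shouldRefresh and its 
parameters are all hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the refresh rule; not an actual NiFi class. */
final class PeerRefreshPolicy {

    // The 60-second interval is taken from the description above.
    static final long REFRESH_INTERVAL_MILLIS = TimeUnit.SECONDS.toMillis(60);

    /**
     * Returns true when the remote endpoint list should be fetched again.
     *
     * @param remainingEndpoints calculated request endpoints not yet consumed
     * @param lastRefreshMillis  time at which the endpoints were last refreshed
     * @param nowMillis          current time
     */
    static boolean shouldRefresh(int remainingEndpoints, long lastRefreshMillis, long nowMillis) {
        boolean allConsumed = remainingEndpoints <= 0;
        boolean expired = nowMillis - lastRefreshMillis >= REFRESH_INTERVAL_MILLIS;
        return allConsumed || expired;
    }
}
```

Under this rule, stale endpoints restored from the peers file stay in use until 
one of the two conditions becomes true, which would match the delayed recovery 
seen in the report.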

I was able to fix the issue by adding the transport protocol to the persistence 
file name. I will submit a PR shortly.
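
A minimal sketch of the idea, assuming a file name of the form 
<port-id>_<protocol>.peers. The exact naming used in the PR may differ, and the 
class PeersFileNaming and its methods are hypothetical names used only for 
illustration.

```java
import java.io.File;

/** Hypothetical sketch of protocol-aware peers file naming; not the actual NiFi code. */
final class PeersFileNaming {

    enum TransportProtocol { RAW, HTTP }

    // Scheme as described above: the file name depends only on the port id,
    // so RAW and HTTP endpoints end up in the same file.
    static File legacyPeersFile(File stateDir, String portId) {
        return new File(stateDir, portId + ".peers");
    }

    // Sketch of the fix: the protocol becomes part of the file name, so each
    // protocol restores only its own endpoints. The "_" separator is an assumption.
    static File peersFile(File stateDir, String portId, TransportProtocol protocol) {
        return new File(stateDir, portId + "_" + protocol.name() + ".peers");
    }
}
```

With such a scheme, a port switched from RAW to HTTP would look for a different 
file on restart and would no longer pick up the persisted RAW endpoints.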

> NiFi RPG errors when switching between site-to-site transport protocols
> -----------------------------------------------------------------------
>
>                 Key: NIFI-3377
>                 URL: https://issues.apache.org/jira/browse/NIFI-3377
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.1.0
>            Reporter: Matthew Clarke
>            Assignee: Koji Kawamura
>            Priority: Minor
>
> If I have an RPG configured to use the RAW transport protocol and then switch 
> it to use the HTTP transport protocol, it will throw the following error 
> message twice before finally correcting itself:
> 2017-01-19 22:10:32,363 ERROR [I/O dispatcher 841] 
> o.a.n.r.util.SiteToSiteRestApiClient Failed to create transaction for 
> http://<hostname>.openstacklocal:8055/nifi-api/data-transfer/input-ports/b76c293d-0159-1000-0000-00003f85f297/transactions
> org.apache.http.ConnectionClosedException: Connection closed unexpectedly
>       at 
> org.apache.http.nio.protocol.HttpAsyncRequestExecutor.closed(HttpAsyncRequestExecutor.java:140)
>  [httpcore-nio-4.4.5.jar:4.4.5]
>       at 
> org.apache.http.impl.nio.client.InternalIODispatch.onClosed(InternalIODispatch.java:71)
>  [httpasyncclient-4.1.2.jar:4.1.2]
>       at 
> org.apache.http.impl.nio.client.InternalIODispatch.onClosed(InternalIODispatch.java:39)
>  [httpasyncclient-4.1.2.jar:4.1.2]
>       at 
> org.apache.http.impl.nio.reactor.AbstractIODispatch.disconnected(AbstractIODispatch.java:100)
>  [httpcore-nio-4.4.5.jar:4.4.5]
>       at 
> org.apache.http.impl.nio.reactor.BaseIOReactor.sessionClosed(BaseIOReactor.java:279)
>  [httpcore-nio-4.4.5.jar:4.4.5]
>       at 
> org.apache.http.impl.nio.reactor.AbstractIOReactor.processClosedSessions(AbstractIOReactor.java:440)
>  [httpcore-nio-4.4.5.jar:4.4.5]
>       at 
> org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:283)
>  [httpcore-nio-4.4.5.jar:4.4.5]
>       at 
> org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
>  [httpcore-nio-4.4.5.jar:4.4.5]
>       at 
> org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
>  [httpcore-nio-4.4.5.jar:4.4.5]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> Similarly the following ERROR message will be thrown many times when 
> switching from HTTP to the RAW transport protocol:
> 2017-01-19 22:13:15,916 ERROR [Timer-Driven Process Thread-10] 
> o.a.nifi.remote.StandardRemoteGroupPort 
> RemoteGroupPort[name=outport2,targets=http://<hostname>:9090/nifi/] failed to 
> communicate with http://<hostname>:9090/nifi/ due to 
> org.apache.nifi.remote.exception.HandshakeException: 
> org.apache.nifi.remote.exception.ProtocolException: Expected to receive 
> ResponseCode, but the stream did not have a ResponseCode
> In both of these scenarios, the RPG will eventually correct itself and start 
> working again. Users may be hesitant to wait once they start seeing these 
> ERRORs and may instead stop the RPG, since the self-correction does not occur 
> rapidly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
