[ 
https://issues.apache.org/jira/browse/NIFI-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Storck updated NIFI-2699:
------------------------------
    Description: 
When running as a cluster, if a node is unable to respond within the socket 
timeout (e.g., because it hit a breakpoint while debugging), an 
IllegalClusterStateException is thrown that causes the UI to show the 
"check config and fix errors" page.  Once the node is communicating with the 
cluster again (i.e., the breakpoint has been passed), the UI can be reloaded 
and the cluster recovers from the timeout without any user intervention at the 
service level. However, the user experience could be improved: if a user 
initiates a replicated request to a node that is unable to respond within the 
socket timeout, the user might think NiFi has crashed when in fact it has not.

Here is the stack trace that was encountered during testing:
{code}
2016-08-29 11:36:59,041 DEBUG [NiFi Web Server-22] o.a.n.w.a.c.IllegalClusterStateExceptionMapper
org.apache.nifi.cluster.manager.exception.IllegalClusterStateException: Node localhost:8443 is unable to fulfill this request due to: Unexpected Response Code 500
        at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$2.onCompletion(ThreadPoolRequestReplicator.java:471) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
        at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:729) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_92]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_92]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_92]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_92]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155) ~[jersey-client-1.19.jar:1.19]
        at com.sun.jersey.api.client.Client.handle(Client.java:652) ~[jersey-client-1.19.jar:1.19]
        at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682) ~[jersey-client-1.19.jar:1.19]
        at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) ~[jersey-client-1.19.jar:1.19]
        at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:560) ~[jersey-client-1.19.jar:1.19]
        at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:537) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
        at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:720) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
        ... 5 common frames omitted
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.8.0_92]
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[na:1.8.0_92]
        at java.net.SocketInputStream.read(SocketInputStream.java:170) ~[na:1.8.0_92]
        at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[na:1.8.0_92]
        at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) ~[na:1.8.0_92]
        at sun.security.ssl.InputRecord.read(InputRecord.java:503) ~[na:1.8.0_92]
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) ~[na:1.8.0_92]
        at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930) ~[na:1.8.0_92]
        at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) ~[na:1.8.0_92]
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[na:1.8.0_92]
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[na:1.8.0_92]
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_92]
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) ~[na:1.8.0_92]
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) ~[na:1.8.0_92]
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536) ~[na:1.8.0_92]
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) ~[na:1.8.0_92]
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) ~[na:1.8.0_92]
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338) ~[na:1.8.0_92]
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253) ~[jersey-client-1.19.jar:1.19]
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153) ~[jersey-client-1.19.jar:1.19]
        ... 11 common frames omitted
{code}
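
The trace shows the read timeout two levels down the cause chain, while the 
top-level message only reports "Unexpected Response Code 500". One possible way 
to tell a timed-out node apart from a genuinely failed one is to walk the cause 
chain looking for a SocketTimeoutException. The following is only a minimal 
sketch under that assumption: the class TimeoutCauseCheck, its methods, and the 
message wording are hypothetical and are not taken from the NiFi code base.

```java
import java.net.SocketTimeoutException;

// Hypothetical helper for illustration only; not part of NiFi.
class TimeoutCauseCheck {

    // Returns true if the throwable or any of its causes is a SocketTimeoutException.
    static boolean causedByReadTimeout(Throwable t) {
        // The depth limit guards against a cyclic cause chain.
        for (int depth = 0; t != null && depth < 32; depth++) {
            if (t instanceof SocketTimeoutException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    // Builds a message that separates "node was slow" from other failures.
    static String describe(String nodeId, Throwable failure) {
        if (causedByReadTimeout(failure)) {
            return "Node " + nodeId + " did not respond within the socket timeout";
        }
        return "Node " + nodeId + " is unable to fulfill this request due to: "
                + failure.getMessage();
    }

    public static void main(String[] args) {
        // Mimics the nesting seen in the trace: a wrapper around the timeout.
        Throwable wrapped = new RuntimeException(
                new SocketTimeoutException("Read timed out"));
        System.out.println(describe("localhost:8443", wrapped));
    }
}
```

A check like this could let the UI show a "node is slow to respond" notice 
instead of the generic error page, though where such a check would belong in 
NiFi is left open here.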

  was:
When running as a cluster, if a node is unable to respond within the socket 
timeout (eg, hitting a breakpoint while debugging), an 
IllegalClusterStateException will be thrown that causes the UI to show the 
"check config and fix errors" page.  Once the node is communicating witht he 
cluster again (i.e., breakpoint in the code is passed), the UI can be reloaded 
and the cluster recovers from the timeout without any user intervention at the 
service level. However, user experience could be improved.  If a user initiates 
a replicated request to a node that is unable to respond within the socket 
timeout duration, the user might think NiFi crashed, when it in fact didn't.

Here is the stack trace that was encountered during testing:
{code}
2016-08-29 11:36:59,041 DEBUG [NiFi Web Server-22] o.a.n.w.a.c.IllegalClusterStateExceptionMapper
org.apache.nifi.cluster.manager.exception.IllegalClusterStateException: Node localhost:8443 is unable to fulfill this request due to: Unexpected Response Code 500
        at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$2.onCompletion(ThreadPoolRequestReplicator.java:471) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
        at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:729) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_92]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_92]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_92]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_92]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155) ~[jersey-client-1.19.jar:1.19]
        at com.sun.jersey.api.client.Client.handle(Client.java:652) ~[jersey-client-1.19.jar:1.19]
        at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682) ~[jersey-client-1.19.jar:1.19]
        at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) ~[jersey-client-1.19.jar:1.19]
        at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:560) ~[jersey-client-1.19.jar:1.19]
        at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:537) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
        at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:720) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
        ... 5 common frames omitted
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.8.0_92]
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[na:1.8.0_92]
        at java.net.SocketInputStream.read(SocketInputStream.java:170) ~[na:1.8.0_92]
        at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[na:1.8.0_92]
        at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) ~[na:1.8.0_92]
        at sun.security.ssl.InputRecord.read(InputRecord.java:503) ~[na:1.8.0_92]
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) ~[na:1.8.0_92]
        at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930) ~[na:1.8.0_92]
        at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) ~[na:1.8.0_92]
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[na:1.8.0_92]
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[na:1.8.0_92]
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_92]
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) ~[na:1.8.0_92]
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) ~[na:1.8.0_92]
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536) ~[na:1.8.0_92]
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) ~[na:1.8.0_92]
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) ~[na:1.8.0_92]
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338) ~[na:1.8.0_92]
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253) ~[jersey-client-1.19.jar:1.19]
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153) ~[jersey-client-1.19.jar:1.19]
        ... 11 common frames omitted
{code}


> Improve handling of response timeouts in cluster
> ------------------------------------------------
>
>                 Key: NIFI-2699
>                 URL: https://issues.apache.org/jira/browse/NIFI-2699
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework, Core UI
>            Reporter: Jeff Storck
>            Priority: Minor
>             Fix For: 1.1.0
>
>
> When running as a cluster, if a node is unable to respond within the socket 
> timeout (e.g., because it hit a breakpoint while debugging), an 
> IllegalClusterStateException is thrown that causes the UI to show the 
> "check config and fix errors" page.  Once the node is communicating with the 
> cluster again (i.e., the breakpoint has been passed), the UI can be 
> reloaded and the cluster recovers from the timeout without any user 
> intervention at the service level. However, the user experience could be 
> improved: if a user initiates a replicated request to a node that is unable 
> to respond within the socket timeout, the user might think NiFi has crashed 
> when in fact it has not.
> Here is the stack trace that was encountered during testing:
> {code}
> 2016-08-29 11:36:59,041 DEBUG [NiFi Web Server-22] o.a.n.w.a.c.IllegalClusterStateExceptionMapper
> org.apache.nifi.cluster.manager.exception.IllegalClusterStateException: Node localhost:8443 is unable to fulfill this request due to: Unexpected Response Code 500
>         at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$2.onCompletion(ThreadPoolRequestReplicator.java:471) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
>         at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:729) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_92]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_92]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_92]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_92]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
> Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
>         at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155) ~[jersey-client-1.19.jar:1.19]
>         at com.sun.jersey.api.client.Client.handle(Client.java:652) ~[jersey-client-1.19.jar:1.19]
>         at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682) ~[jersey-client-1.19.jar:1.19]
>         at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) ~[jersey-client-1.19.jar:1.19]
>         at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:560) ~[jersey-client-1.19.jar:1.19]
>         at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:537) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
>         at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:720) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
>         ... 5 common frames omitted
> Caused by: java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.8.0_92]
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[na:1.8.0_92]
>         at java.net.SocketInputStream.read(SocketInputStream.java:170) ~[na:1.8.0_92]
>         at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[na:1.8.0_92]
>         at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) ~[na:1.8.0_92]
>         at sun.security.ssl.InputRecord.read(InputRecord.java:503) ~[na:1.8.0_92]
>         at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) ~[na:1.8.0_92]
>         at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930) ~[na:1.8.0_92]
>         at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) ~[na:1.8.0_92]
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[na:1.8.0_92]
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[na:1.8.0_92]
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_92]
>         at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) ~[na:1.8.0_92]
>         at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) ~[na:1.8.0_92]
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536) ~[na:1.8.0_92]
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) ~[na:1.8.0_92]
>         at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) ~[na:1.8.0_92]
>         at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338) ~[na:1.8.0_92]
>         at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253) ~[jersey-client-1.19.jar:1.19]
>         at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153) ~[jersey-client-1.19.jar:1.19]
>         ... 11 common frames omitted
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
