[ https://issues.apache.org/jira/browse/NIFI-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610740#comment-17610740 ]
Joe Witt commented on NIFI-9878:
--------------------------------

remove fix version based on latest review feedback

> DistributedCacheMap Handshake failure, processor hang indefinitely.
> -------------------------------------------------------------------
>
>                 Key: NIFI-9878
>                 URL: https://issues.apache.org/jira/browse/NIFI-9878
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.15.3, 1.17.0, 1.16.3
>            Reporter: Aaron Rich
>            Assignee: Jon Shoemaker
>            Priority: Major
>              Labels: Handshake, distributed_cache
>         Attachments:
> 0001-NIFI-9878-fix-for-hanging-client-thread-with-handsha.patch,
> image-2022-04-05-21-54-31-002.png, image-2022-04-05-21-55-16-221.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When a DistributedMapCacheClientService attempts to connect to a
> DistributedMapCacheServer but the handshake response is never received by
> the client, the PutDistributedMapCache processor will hang indefinitely. The
> handshake never times out.
> This can happen if a proxy allows the TCP connection between client and
> server to be established but fails to deliver handshake data to or from the
> DistributedMapCacheServer (for example, an unstable Istio service mesh
> between the two). It could also happen if a client is accidentally
> misconfigured to point to the wrong TCP endpoint (one that isn't hosting a
> DistributedMapCacheServer).
> Steps to recreate:
> 1) Set up a PutDistributedMapCache processor with a
>    DistributedMapCacheClientService.
> 2) Configure the DistributedMapCacheClientService to point to a TCP server
>    that is not a DistributedMapCacheServer (nc -lk 127.0.0.1 4457). This
>    simulates a situation where the socket connection can be made but the
>    server never sends a handshake response (for example, the server is in a
>    bad state and unable to respond, or a proxy is misbehaving).
> 3) Use GenerateFlowFile to trigger the PutDistributedMapCache processor.
> 4) The processor will hang with no failure or success.
>    The processor will have to be force-terminated.
> [Screenshots: image-2022-04-05-21-54-31-002.png, image-2022-04-05-21-55-16-221.png]
> Hang occurs at:
> CacheClientRequestHandler.java:92: handshakeHandler.waitHandshakeComplete();
>
> Currently, the "connection timeout" parameter only bounds the establishment
> of the TCP socket connection, not the full application-layer connection
> (TCP connect plus handshake).
> Suggestion:
> The handshake should have a timeout too, so that the client can handle a
> network outage in which the TCP connection can be created but the handshake
> data can't be exchanged. While the processor hangs, there is no way to
> handle this error in a dataflow.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
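The gap between a connect timeout and a handshake timeout can be reproduced with plain java.net sockets. This is a minimal sketch, not NiFi's Netty-based client. The class and method names, the "HELLO" greeting and the one-byte reply are hypothetical. The silent ServerSocket stands in for `nc -lk 127.0.0.1 4457`: the OS completes the TCP connection, but no handshake reply is ever sent.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class HandshakeTimeoutDemo {

    /**
     * Connects and performs a hypothetical handshake (send a greeting, read one reply byte).
     * connectTimeoutMillis bounds only TCP connection establishment;
     * handshakeTimeoutMillis bounds the blocking read that waits for the server's reply.
     */
    static int handshake(InetSocketAddress address, int connectTimeoutMillis,
                         int handshakeTimeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            // Bounds only the TCP connect; it does not help once the connection exists.
            socket.connect(address, connectTimeoutMillis);
            // Without this, read() below blocks forever against a peer that never answers.
            socket.setSoTimeout(handshakeTimeoutMillis);

            OutputStream out = socket.getOutputStream();
            out.write("HELLO\n".getBytes(StandardCharsets.US_ASCII)); // hypothetical greeting
            out.flush();

            InputStream in = socket.getInputStream();
            int reply = in.read(); // throws SocketTimeoutException once the timeout elapses
            if (reply == -1) {
                throw new IOException("server closed the connection during the handshake");
            }
            return reply;
        }
    }

    /** Returns true if the handshake against a silent server times out instead of hanging. */
    static boolean timesOutAgainstSilentServer(int handshakeTimeoutMillis) throws IOException {
        // Accepts TCP connections (via the listen backlog) but never writes anything back.
        try (ServerSocket silent = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            InetSocketAddress address =
                new InetSocketAddress(InetAddress.getLoopbackAddress(), silent.getLocalPort());
            try {
                handshake(address, 1_000, handshakeTimeoutMillis);
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        boolean timedOut = timesOutAgainstSilentServer(500);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("timed out: " + timedOut + " after ~" + elapsedMillis + " ms");
    }
}
```

Here connect() succeeds immediately, so the connect timeout never fires. Only the read timeout turns the silent peer into an error the caller can handle.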
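The suggested fix at the hang site, a handshake wait that gives up after a deadline, can be sketched with a latch. This is an assumption-based illustration. HandshakeTracker and its methods are hypothetical stand-ins, not NiFi's CacheClientRequestHandler or its handshake handler, and the patch attached to the issue may take a different approach.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedHandshakeWait {

    /** Hypothetical handshake tracker; not NiFi's actual handler class. */
    static final class HandshakeTracker {
        private final CountDownLatch complete = new CountDownLatch(1);
        private volatile boolean accepted;

        /** Called by the I/O thread once the server's handshake reply has been processed. */
        void onHandshakeReply(boolean accepted) {
            this.accepted = accepted;
            complete.countDown();
        }

        /**
         * Waits for the handshake to finish, but throws after the timeout
         * instead of blocking the calling thread indefinitely.
         */
        void waitHandshakeComplete(long timeout, TimeUnit unit)
                throws InterruptedException, TimeoutException {
            if (!complete.await(timeout, unit)) {
                throw new TimeoutException("handshake not completed within " + timeout + " " + unit);
            }
            if (!accepted) {
                throw new IllegalStateException("handshake rejected by server");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Case 1: no reply ever arrives (silent server or misbehaving proxy).
        HandshakeTracker silent = new HandshakeTracker();
        try {
            silent.waitHandshakeComplete(200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }

        // Case 2: the reply arrives on another thread before the deadline.
        HandshakeTracker responsive = new HandshakeTracker();
        new Thread(() -> responsive.onHandshakeReply(true)).start();
        responsive.waitHandshakeComplete(2, TimeUnit.SECONDS);
        System.out.println("handshake completed");
    }
}
```

Surfacing a TimeoutException instead of waiting forever lets the caller report a failure. The issue asks for exactly this, so the error can be handled in the dataflow rather than requiring a force-terminate.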