[
https://issues.apache.org/jira/browse/NIFI-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Handermann updated NIFI-9878:
-----------------------------------
Fix Version/s: 1.19.0
Resolution: Fixed
Status: Resolved (was: Patch Available)
> DistributedCacheMap Handshake failure, processor hang indefinitely.
> -------------------------------------------------------------------
>
> Key: NIFI-9878
> URL: https://issues.apache.org/jira/browse/NIFI-9878
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.15.3, 1.17.0, 1.16.3
> Reporter: Aaron Rich
> Assignee: Jon Shoemaker
> Priority: Major
> Labels: Handshake, distributed_cache
> Fix For: 1.19.0
>
> Attachments:
> 0001-NIFI-9878-fix-for-hanging-client-thread-with-handsha.patch,
> image-2022-04-05-21-54-31-002.png, image-2022-04-05-21-55-16-221.png
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> When a DistributedCacheMapClient attempts to connect to a
> DistributedCacheMapServer, but the handshake response is never received by
> the client, the PutDistributedCacheMap processor will hang indefinitely. The
> handshake never times out.
> A situation like this can arise if a proxy allows the TCP connection to be
> established between client and server but fails to deliver handshake data
> to/from the DistributedCacheMapServer (for example, an unstable Istio service
> mesh between the two). It could also happen if a client were accidentally
> misconfigured to point to the wrong TCP endpoint (one that isn't hosting a
> DistributedCacheMapServer).
> Steps to recreate:
> 1) Set up a PutDistributedCacheMap processor with a
> DistributedMapCacheClientService.
> 2) Configure the DistributedMapCacheClientService to point to a TCP server
> that is not a DistributedCacheMapServer (nc -lk 127.0.0.1 4457). This
> simulates a situation where the socket connection can be made but there is no
> handshake response from the server (for example, the server is in a bad state
> and unable to respond, a proxy is misbehaving, etc.).
> 3) Use GenerateFlowFile to trigger the PutDistributedCacheMap processor.
> 4) The processor hangs with neither failure nor success and has to be
> force terminated.
> !image-2022-04-05-21-54-31-002.png!
> !image-2022-04-05-21-55-16-221.png!
> The hang occurs at:
> CacheClientRequestHandler.java:92: handshakeHandler.waitHandshakeComplete();
>
> Currently, the "connection timeout" parameter is only used to time out the
> establishment of the TCP socket connection, not the full application-layer
> connection.
> Suggestion:
> The handshake should also have a timeout, so that the client is robust to a
> network outage in which the TCP connection can be created but the handshake
> data can't be exchanged. A hanging processor leaves no way to handle this
> error in a dataflow.
>
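
The suggestion above can be illustrated with a minimal Java sketch. This is
not the attached patch and not NiFi's actual implementation; the class
HandshakeTimeoutSketch, the helper HandshakeWaiter, its markComplete() method
and the timed waitHandshakeComplete(long) variant are hypothetical names made
up for illustration. A ServerSocket that never accepts or replies stands in
for the "nc -lk" listener from the steps to recreate: the TCP connect
succeeds, no handshake byte ever arrives, and the bounded wait fails with a
TimeoutException instead of blocking forever.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandshakeTimeoutSketch {

    /** Hypothetical stand-in for a handshake handler; not NiFi code. */
    static final class HandshakeWaiter {
        private final CountDownLatch done = new CountDownLatch(1);

        /** Called by whatever reads the handshake response. */
        void markComplete() {
            done.countDown();
        }

        /** Waits for the handshake, but only up to timeoutMillis. */
        void waitHandshakeComplete(long timeoutMillis)
                throws InterruptedException, TimeoutException {
            if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new TimeoutException(
                        "Handshake not completed within " + timeoutMillis + " ms");
            }
        }
    }

    /**
     * Connects to a local server that never answers and reports whether the
     * handshake wait timed out. The connect timeout only bounds the TCP
     * connection setup; the handshake timeout bounds the application-layer wait.
     */
    static boolean handshakeTimesOut(int connectTimeoutMillis, long handshakeTimeoutMillis)
            throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // Succeeds: the OS completes the TCP handshake for a listening socket.
            client.connect(
                    new InetSocketAddress(loopback, silentServer.getLocalPort()),
                    connectTimeoutMillis);

            HandshakeWaiter waiter = new HandshakeWaiter();
            Thread reader = new Thread(() -> {
                try {
                    // Any byte from the server counts as the handshake response here.
                    if (client.getInputStream().read() >= 0) {
                        waiter.markComplete();
                    }
                } catch (IOException ignored) {
                    // Socket was closed before any handshake byte arrived.
                }
            });
            reader.setDaemon(true);
            reader.start();

            try {
                waiter.waitHandshakeComplete(handshakeTimeoutMillis);
                return false;
            } catch (TimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out: " + handshakeTimesOut(1000, 200));
    }
}
```

With a bounded wait like this, the caller gets an exception it can turn into a
failure outcome for the dataflow, rather than a thread that has to be force
terminated.
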