[ 
https://issues.apache.org/jira/browse/IMPALA-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884252#comment-17884252
 ] 

Wenzhe Zhou commented on IMPALA-13399:
--------------------------------------

After increasing value of flag 
variable"statestore_heartbeat_tcp_timeout_seconds" from default value 3 to 6 
(or 10), the issue still happens. But the log messages were changed as 
{code:java}
I0924 03:03:44.437633 19167 statestore.cc:984] Registering: 
impa...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:27002
I0924 03:03:44.437691 19167 statestore.cc:568] Subscriber 
'impa...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:27002' 
with type COORDINATOR AND EXECUTOR registered (registration id: 
a74408ad8ded5679:a3ed4a7589fba582)
I0924 03:03:44.459619 19321 statestore.cc:320] Subscriber protocol version: V2
I0924 03:03:44.459733 19321 statestore.cc:984] Registering: 
impa...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:27000
I0924 03:03:44.459785 19321 statestore.cc:568] Subscriber 
'impa...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:27000' 
with type COORDINATOR AND EXECUTOR registered (registration id: 
13439739dab30251:79f422c4f2a9bfbf)
I0924 03:03:44.606819 19492 statestore.cc:320] Subscriber protocol version: V2
I0924 03:03:44.606947 19492 statestore.cc:984] Registering: 
impa...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:27001
I0924 03:03:44.606992 19492 statestore.cc:568] Subscriber 
'impa...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:27001' 
with type COORDINATOR AND EXECUTOR registered (registration id: 
044309a5f1ada273:2f9f17ae652e1094)
I0924 03:03:49.261722 19132 statestore.cc:320] Subscriber protocol version: V2
I0924 03:03:49.261860 19132 statestore.cc:984] Registering: 
catalog-ser...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:26000
I0924 03:03:49.261905 19132 statestore.cc:1007] Active catalogd role is 
designated to 
catalog-ser...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:26000
I0924 03:03:49.261917 19132 statestore.cc:568] Subscriber 
'catalog-ser...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:26000'
 with type CATALOGD registered (registration id: 
ac4d6b270d142d5e:a3e50ee32808d087)
I0924 03:03:49.381103 18204 statestore.cc:1479] 0/4 subscribers successfully 
heartbeat in the last 6000ms.
I0924 03:03:49.438024 19167 statestore.cc:320] Subscriber protocol version: V2
I0924 03:03:49.438146 19167 statestore.cc:984] Registering: 
impa...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:27002
I0924 03:03:49.438190 19167 statestore.cc:568] Subscriber 
'impa...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:27002' 
with type COORDINATOR AND EXECUTOR registered (registration id: 
d84c49f5d50e48cc:ab412dda373b158d)
I0924 03:03:49.449852 18200 client-cache.h:372] RPC recv timed out: dest 
address: impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:23002, 
rpc: N6impala18THeartbeatResponseE
I0924 03:03:49.449868 18200 client-cache.cc:174] Broken Connection, destroy 
client for impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:23002
I0924 03:03:49.460111 19321 statestore.cc:320] Subscriber protocol version: V2
I0924 03:03:49.460225 19321 statestore.cc:984] Registering: 
impa...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:27000
I0924 03:03:49.460265 19321 statestore.cc:568] Subscriber 
'impa...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:27000' 
with type COORDINATOR AND EXECUTOR registered (registration id: 
82467d0c3b01c930:190fac7145880ebf)
I0924 03:03:49.463809 18201 client-cache.h:372] RPC recv timed out: dest 
address: impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:23000, 
rpc: N6impala18THeartbeatResponseE
I0924 03:03:49.463821 18201 client-cache.cc:174] Broken Connection, destroy 
client for impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:23000
I0924 03:03:49.607353 19492 statestore.cc:320] Subscriber protocol version: V2
I0924 03:03:49.607489 19492 statestore.cc:984] Registering: 
impa...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:27001
I0924 03:03:49.607539 19492 statestore.cc:568] Subscriber 
'impa...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:27001' 
with type COORDINATOR AND EXECUTOR registered (registration id: 
0849d9441460588a:fcee006f7ac187af)
I0924 03:03:49.649829 18193 client-cache.h:372] RPC recv timed out: dest 
address: impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:23001, 
rpc: N6impala18THeartbeatResponseE
I0924 03:03:49.649849 18193 client-cache.cc:174] Broken Connection, destroy 
client for impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:23001
I0924 03:03:54.262281 19132 statestore.cc:320] Subscriber protocol version: V2
I0924 03:03:54.262408 19132 statestore.cc:984] Registering: 
catalog-ser...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:26000
I0924 03:03:54.262451 19132 statestore.cc:1007] Active catalogd role is 
designated to 
catalog-ser...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:26000
I0924 03:03:54.262463 19132 statestore.cc:568] Subscriber 
'catalog-ser...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:26000'
 with type CATALOGD registered (registration id: 
3c4fd8055a40fb80:0ad1656a6d11f4b9)
I0924 03:03:54.304822 18194 client-cache.h:372] RPC recv timed out: dest 
address: impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:23020, 
rpc: N6impala18THeartbeatResponseE
I0924 03:03:54.304840 18194 client-cache.cc:174] Broken Connection, destroy 
client for impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:23020
I0924 03:03:54.437829 18199 client-cache.h:372] RPC recv timed out: dest 
address: impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:23002, 
rpc: N6impala18THeartbeatResponseE
I0924 03:03:54.437848 18199 client-cache.cc:174] Broken Connection, destroy 
client for impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:23002
I0924 03:03:54.438506 19167 statestore.cc:320] Subscriber protocol version: V2
I0924 03:03:54.438620 19167 statestore.cc:984] Registering: 
impa...@impala-ec2-centos79-m6i-4xlarge-xldisk-1c7c.vpc.cloudera.com:27002
{code}
assert_log_contains() in the patch of IMPALA-13388 fails

> Thrift RPCs for Statestore heartbeat timed out randomly in UBSAN build 
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-13399
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13399
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Wenzhe Zhou
>            Priority: Major
>              Labels: flaky-test
>
> This issue was saw in IMPALA-13388.
> In UBSAN build, Thrift RPCs for Statestore heartbeat were timed out in 
> Statestored HA unit-tests randomly, and heartbeats were not sent to 
> subscribers. See log messages in statestored log file:
> {code:java}
> I0917 00:06:54.902865 3728987 statestore.cc:1414] Unable to send heartbeat 
> message to subscriber 
> catalog-ser...@impala-ec2-rhel88-m7g-4xlarge-ondemand-1aed.vpc.cloudera.com:26000,
>  received error: RPC recv timed out: dest address: 
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1aed.vpc.cloudera.com:23020, rpc: 
> N6impala18THeartbeatResponseE
> Subscriber 
> catalog-ser...@impala-ec2-rhel88-m7g-4xlarge-ondemand-1aed.vpc.cloudera.com:26000
>  timed-out during heartbeat RPC. Timeout is 3s.
> I0917 00:06:54.902873 3728987 failure-detector.cc:91] 4 consecutive 
> heartbeats failed for 
> 'catalog-ser...@impala-ec2-rhel88-m7g-4xlarge-ondemand-1aed.vpc.cloudera.com:26000'.
>  State is OK
> I0917 00:06:55.382777 3728993 client-cache.h:372] RPC recv timed out: dest 
> address: impala-ec2-rhel88-m7g-4xlarge-ondemand-1aed.vpc.cloudera.com:23001, 
> rpc: N6impala18THeartbeatResponseE
> I0917 00:06:55.382797 3728993 client-cache.cc:174] Broken Connection, destroy 
> client for impala-ec2-rhel88-m7g-4xlarge-ondemand-1aed.vpc.cloudera.com:23001
> I0917 00:06:55.382866 3728993 statestore.cc:1414] Unable to send heartbeat 
> message to subscriber 
> impa...@impala-ec2-rhel88-m7g-4xlarge-ondemand-1aed.vpc.cloudera.com:27001, 
> received error: RPC recv timed out: dest address: 
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1aed.vpc.cloudera.com:23001, rpc: 
> N6impala18THeartbeatResponseE
> Subscriber 
> impa...@impala-ec2-rhel88-m7g-4xlarge-ondemand-1aed.vpc.cloudera.com:27001 
> timed-out during heartbeat RPC. Timeout is 3s.
> I0917 00:06:55.382876 3728993 failure-detector.cc:91] 4 consecutive 
> heartbeats failed for 
> 'impa...@impala-ec2-rhel88-m7g-4xlarge-ondemand-1aed.vpc.cloudera.com:27001'. 
> State is OK
> I0917 00:06:56.032806 3728989 client-cache.h:372] RPC recv timed out: dest 
> address: impala-ec2-rhel88-m7g-4xlarge-ondemand-1aed.vpc.cloudera.com:23000, 
> rpc: N6impala18THeartbeatResponseE
> I0917 00:06:56.032806 3728990 client-cache.h:372] RPC recv timed out: dest 
> address: impala-ec2-rhel88-m7g-4xlarge-ondemand-1aed.vpc.cloudera.com:23002, 
> rpc: N6impala18THeartbeatResponseE
> I0917 00:06:56.032831 3728989 client-cache.cc:174] Broken Connection, destroy 
> client for impala-ec2-rhel88-m7g-4xlarge-ondemand-1aed.vpc.cloudera.com:23000
> {code}
> Subscriber (impalad or catalogd) log files showed that subscribers did not 
> receive heartbeats  after successful registrations so that subscribers tried 
> to re-register in the statestored. It's one directional Thrift RPC issue from 
> statestored to subscribers. It seems something wrong in Thrift RPC layer. 
> Tried to increase value of flag variable 
> "statestore_heartbeat_tcp_timeout_seconds" from 3 seconds to 6 seconds, the 
> issue still happened.
> This issue did not happen in other type of build, like regular build and ASAN 
> build.  Statestored HA unit-tests are not ran for TSAN build.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to