[ 
https://issues.apache.org/jira/browse/CURATOR-723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amr Elazhary updated CURATOR-723:
---------------------------------
    Description: 
We have a jar application.

We are using curator-framework-4.3.0 and zookeeper-3.5.7 on the client 
servers for this application. Two known issues appear relevant:
CURATOR-525 There is a race condition in Curator which might lead to fake 
SUSPENDED event and ruin CuratorFrameworkImpl inner state - ASF JIRA (state bug)
CURATOR-526 Error logged for valid config - "Invalid config event received: 
\{properties}" - ASF JIRA (logging bug)
Both bugs are fixed in 5.0.0/5.2.0.

Our application produces the logs below (probably a combination of these two 
bugs). It ends up consuming a lot of CPU and then fails (due to connections 
being repeatedly opened and closed).
{noformat}
2024-10-18T06:35:27.855+11:00 - o.a.c.f.state.ConnectionStateManager - State change: SUSPENDED
2024-10-18T06:35:27.855+11:00 - au.com.unico.herd.impl.HerdImpl - New connection state : SUSPENDED
2024-10-18T06:35:27.855+11:00 - au.com.unico.herd.impl.HerdImpl - Connection to server has been suspended..
2024-10-18T06:35:28.638+11:00 - org.apache.curator.ConnectionState - Negotiated session timeout: 10000
2024-10-18T06:35:28.639+11:00 - o.a.c.f.state.ConnectionStateManager - State change: RECONNECTED
2024-10-18T06:35:28.639+11:00 - o.a.c.f.imps.CuratorFrameworkImpl - Clearing sleep for 0 operations
2024-10-18T06:35:28.639+11:00 - au.com.unico.herd.impl.HerdImpl - New connection state : RECONNECTED
2024-10-18T06:35:28.639+11:00 - au.com.unico.herd.impl.HerdImpl - Connection to server has been unsuspended...
2024-10-18T06:35:28.640+11:00 - o.a.c.framework.imps.EnsembleTracker - New config event received: {server.2=xhhuj10717:2888:3888:participant, server.1=xhhuj10716:2888:3888:participant, server.5=xhwuj10517:2888:3888:participant, server.4=xhwuj10516:2888:3888:participant, server.3=xhhuj10721:2888:3888:participant, version=0}
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Reconnect worker starting
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Attempting to reconnect to the Herd
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Waiting for connection to server
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Connected to server
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Reconnect worker completed
2024-10-18T06:35:28.647+11:00 - o.a.c.framework.imps.EnsembleTracker - Invalid config event received: {server.2=xhhuj10717:2888:3888:participant, server.1=xhhuj10716:2888:3888:participant, server.5=xhwuj10517:2888:3888:participant, server.4=xhwuj10516:2888:3888:participant, server.3=xhhuj10721:2888:3888:participant, version=0}
{noformat}
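
To put a number on how often the connection flaps, the ConnectionStateManager lines in the log above can be tallied. The following is only a minimal sketch and not part of Curator: the class name StateFlapCounter and the method suspendedGaps are hypothetical, and it assumes the raw log file has one entry per line in the format shown above. It measures the time between each SUSPENDED and the next RECONNECTED. It does not by itself prove or rule out CURATOR-525.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Curator class): measures SUSPENDED -> RECONNECTED gaps.
final class StateFlapCounter {
    // Assumed line format, taken from the log excerpt:
    // <ISO timestamp> - o.a.c.f.state.ConnectionStateManager - State change: <STATE>
    private static final Pattern STATE_CHANGE = Pattern.compile(
        "^(\\S+) - o\\.a\\.c\\.f\\.state\\.ConnectionStateManager - State change: (\\w+)");

    /** Returns one Duration per SUSPENDED entry that is followed by a RECONNECTED entry. */
    static List<Duration> suspendedGaps(List<String> lines) {
        List<Duration> gaps = new ArrayList<>();
        OffsetDateTime suspendedAt = null;
        for (String line : lines) {
            Matcher m = STATE_CHANGE.matcher(line);
            if (!m.find()) {
                continue; // not a ConnectionStateManager state-change line
            }
            OffsetDateTime ts = OffsetDateTime.parse(m.group(1));
            String state = m.group(2);
            if (state.equals("SUSPENDED")) {
                suspendedAt = ts;
            } else if (state.equals("RECONNECTED") && suspendedAt != null) {
                gaps.add(Duration.between(suspendedAt, ts));
                suspendedAt = null;
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "2024-10-18T06:35:27.855+11:00 - o.a.c.f.state.ConnectionStateManager - State change: SUSPENDED",
            "2024-10-18T06:35:28.638+11:00 - org.apache.curator.ConnectionState - Negotiated session timeout: 10000",
            "2024-10-18T06:35:28.639+11:00 - o.a.c.f.state.ConnectionStateManager - State change: RECONNECTED");
        for (Duration gap : suspendedGaps(sample)) {
            System.out.println(gap.toMillis() + " ms"); // prints "784 ms"
        }
    }
}
```

Running this over a full day of logs would give the number of suspend/reconnect cycles and their durations, which could then be compared against the other applications that use the same Curator/ZooKeeper versions.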
 

+*Our Questions*+

1. For the logging bug "Invalid config event received", please confirm that we 
need to upgrade Curator as per https://issues.apache.org/jira/browse/CURATOR-526

2. For the reconnections, we suspect they are related to 
https://issues.apache.org/jira/browse/CURATOR-525, but how can we confirm 
this? Also, if our assumption is correct, what triggers this bug? We have other 
jar applications using the same Curator/ZooKeeper versions and they are not 
facing it. Please advise on the trigger and why it happens in some applications 
and not in others.

3. Can we upgrade Curator to version 5.2.0 without upgrading ZooKeeper, given 
that we use zookeeper-3.5.7?

  was:
We have a jar application.

We are using curator-framework-4.3.0 and zookeeper-3.5.7 on the client 
servers for this application. Two known issues appear relevant:
[CURATOR-525] There is a race condition in Curator which might lead to fake 
SUSPENDED event and ruin CuratorFrameworkImpl inner state - ASF JIRA (state bug)
[CURATOR-526] Error logged for valid config - "Invalid config event received: 
\{properties}" - ASF JIRA (logging bug)
Both bugs are fixed in 5.0.0/5.2.0.

Our application produces the logs below (probably a combination of these two bugs)
{noformat}
2024-10-18T06:35:27.855+11:00 - o.a.c.f.state.ConnectionStateManager - State change: SUSPENDED
2024-10-18T06:35:27.855+11:00 - au.com.unico.herd.impl.HerdImpl - New connection state : SUSPENDED
2024-10-18T06:35:27.855+11:00 - au.com.unico.herd.impl.HerdImpl - Connection to server has been suspended..
2024-10-18T06:35:28.638+11:00 - org.apache.curator.ConnectionState - Negotiated session timeout: 10000
2024-10-18T06:35:28.639+11:00 - o.a.c.f.state.ConnectionStateManager - State change: RECONNECTED
2024-10-18T06:35:28.639+11:00 - o.a.c.f.imps.CuratorFrameworkImpl - Clearing sleep for 0 operations
2024-10-18T06:35:28.639+11:00 - au.com.unico.herd.impl.HerdImpl - New connection state : RECONNECTED
2024-10-18T06:35:28.639+11:00 - au.com.unico.herd.impl.HerdImpl - Connection to server has been unsuspended...
2024-10-18T06:35:28.640+11:00 - o.a.c.framework.imps.EnsembleTracker - New config event received: {server.2=xhhuj10717:2888:3888:participant, server.1=xhhuj10716:2888:3888:participant, server.5=xhwuj10517:2888:3888:participant, server.4=xhwuj10516:2888:3888:participant, server.3=xhhuj10721:2888:3888:participant, version=0}
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Reconnect worker starting
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Attempting to reconnect to the Herd
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Waiting for connection to server
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Connected to server
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Reconnect worker completed
2024-10-18T06:35:28.647+11:00 - o.a.c.framework.imps.EnsembleTracker - Invalid config event received: {server.2=xhhuj10717:2888:3888:participant, server.1=xhhuj10716:2888:3888:participant, server.5=xhwuj10517:2888:3888:participant, server.4=xhwuj10516:2888:3888:participant, server.3=xhhuj10721:2888:3888:participant, version=0}
{noformat}
 

Our Questions:

1. For the logging bug "Invalid config event received", please confirm that we 
need to upgrade Curator as per https://issues.apache.org/jira/browse/CURATOR-526

2. For the reconnections, we suspect they are related to 
https://issues.apache.org/jira/browse/CURATOR-525, but how can we confirm 
this? Also, if our assumption is correct, what triggers this bug? We have other 
jar applications using the same Curator/ZooKeeper versions and they are not 
facing it. Please advise on the trigger and why it happens in some applications 
and not in others.

3. Can we upgrade Curator to version 5.2.0 without upgrading ZooKeeper, given 
that we use zookeeper-3.5.7?


> [
> -
>
>                 Key: CURATOR-723
>                 URL: https://issues.apache.org/jira/browse/CURATOR-723
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 4.3.0
>            Reporter: Amr Elazhary
>            Priority: Major
>         Attachments: Status Change Logs.csv
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
