[ https://issues.apache.org/jira/browse/CURATOR-723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amr Elazhary updated CURATOR-723:
---------------------------------
Description:
We have a JAR application that uses curator-framework-4.3.0 and zookeeper-3.5.7 on its client servers.

CURATOR-525: There is a race condition in Curator which might lead to fake SUSPENDED event and ruin CuratorFrameworkImpl inner state (state bug)
CURATOR-526: Error logged for valid config - "Invalid config event received: \{properties}" (logging bug)

Both bugs are fixed in 5.0.0/5.2.0.

We see the logs below in our application (probably a combination of these two bugs). This behavior consumes a lot of CPU and is eventually followed by a failure, caused by connections being repeatedly opened and closed.
{noformat}
2024-10-18T06:35:27.855+11:00 - o.a.c.f.state.ConnectionStateManager - State change: SUSPENDED
2024-10-18T06:35:27.855+11:00 - - New connection state : SUSPENDED
2024-10-18T06:35:27.855+11:00 - - Connection to server has been suspended..
2024-10-18T06:35:28.638+11:00 - org.apache.curator.ConnectionState - Negotiated session timeout: 10000
2024-10-18T06:35:28.639+11:00 - o.a.c.f.state.ConnectionStateManager - State change: RECONNECTED
2024-10-18T06:35:28.639+11:00 - o.a.c.f.imps.CuratorFrameworkImpl - Clearing sleep for 0 operations
2024-10-18T06:35:28.639+11:00 - - New connection state : RECONNECTED
2024-10-18T06:35:28.639+11:00 - - Connection to server has been unsuspended...
2024-10-18T06:35:28.640+11:00 - o.a.c.framework.imps.EnsembleTracker - New config event received: {}
2024-10-18T06:35:28.640+11:00 - - Reconnect worker starting
2024-10-18T06:35:28.640+11:00 - - Attempting to reconnect to the Herd
2024-10-18T06:35:28.640+11:00 - - Waiting for connection to server
2024-10-18T06:35:28.640+11:00 - - Connected to server
2024-10-18T06:35:28.640+11:00 - - Reconnect worker completed
2024-10-18T06:35:28.647+11:00 - o.a.c.framework.imps.EnsembleTracker - Invalid config event received: {}
{noformat}

+*Our Questions*+
1. For the logging bug "Invalid config event received", we understand that we need to upgrade Curator as per https://issues.apache.org/jira/browse/CURATOR-526. Please confirm.
2. We suspect the reconnections are related to https://issues.apache.org/jira/browse/CURATOR-525, but we need to confirm this. How can we do so? If our assumption is correct, what triggers this bug? Our other JAR applications use the same Curator/ZooKeeper versions and do not encounter it. Please advise on the trigger and explain why it happens in some applications and not in others. Also, if this is the root cause, is there any solution other than upgrading Curator?
3. Can we upgrade Curator to 5.2.0 without upgrading ZooKeeper, given that we use zookeeper-3.5.7?
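Question 2 asks how to confirm the reconnect behavior. One low-effort starting point is to measure how often the connection cycles through SUSPENDED and RECONNECTED, and how long each suspension lasts. The Java sketch below is a hypothetical helper and is not part of Curator. The class `SuspendFlapScanner` and its `scan` method are illustrative names. The sketch assumes one log entry per line in the timestamp/logger/message layout shown above, with Curator's abbreviated logger name `o.a.c.f.state.ConnectionStateManager`.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper, NOT part of Curator. Assumes one log entry per
// line in the "timestamp - logger - message" layout used in the excerpt above.
class SuspendFlapScanner {

    // Matches Curator's state-change entries, e.g.
    // "2024-10-18T06:35:27.855+11:00 - o.a.c.f.state.ConnectionStateManager - State change: SUSPENDED"
    private static final Pattern STATE_CHANGE = Pattern.compile(
            "^(\\S+) - o\\.a\\.c\\.f\\.state\\.ConnectionStateManager - State change: (\\w+)");

    // One SUSPENDED -> RECONNECTED cycle and how long the connection stayed suspended.
    record Flap(OffsetDateTime suspendedAt, Duration suspendedFor) {}

    static List<Flap> scan(List<String> lines) {
        List<Flap> flaps = new ArrayList<>();
        OffsetDateTime suspendedAt = null; // start of the currently open suspension, if any
        for (String line : lines) {
            Matcher m = STATE_CHANGE.matcher(line.trim());
            if (!m.find()) {
                continue; // not a ConnectionStateManager state change
            }
            OffsetDateTime ts = OffsetDateTime.parse(m.group(1));
            String state = m.group(2);
            if (state.equals("SUSPENDED")) {
                // keep the earliest SUSPENDED if several arrive before a RECONNECTED
                if (suspendedAt == null) {
                    suspendedAt = ts;
                }
            } else if (state.equals("RECONNECTED") && suspendedAt != null) {
                flaps.add(new Flap(suspendedAt, Duration.between(suspendedAt, ts)));
                suspendedAt = null;
            }
        }
        return flaps;
    }

    public static void main(String[] args) {
        List<String> excerpt = List.of(
                "2024-10-18T06:35:27.855+11:00 - o.a.c.f.state.ConnectionStateManager - State change: SUSPENDED",
                "2024-10-18T06:35:28.638+11:00 - org.apache.curator.ConnectionState - Negotiated session timeout: 10000",
                "2024-10-18T06:35:28.639+11:00 - o.a.c.f.state.ConnectionStateManager - State change: RECONNECTED");
        List<Flap> flaps = scan(excerpt);
        System.out.println("cycles: " + flaps.size());
        for (Flap f : flaps) {
            System.out.println(f.suspendedAt() + " suspended for " + f.suspendedFor().toMillis() + " ms");
        }
    }
}
```

On the excerpt above, the scanner reports a single cycle lasting 784 ms. A high count of short cycles across a longer log would quantify the flapping described in this issue. On its own, such a count does not prove that CURATOR-525 is the cause.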
was:
We have a JAR application that uses curator-framework-4.3.0 and zookeeper-3.5.7 on its client servers.

CURATOR-525: There is a race condition in Curator which might lead to fake SUSPENDED event and ruin CuratorFrameworkImpl inner state (state bug)
CURATOR-526: Error logged for valid config - "Invalid config event received: \{properties}" (logging bug)

Both bugs are fixed in 5.0.0/5.2.0.

We see the logs below in our application (probably a combination of these two bugs). This behavior consumes a lot of CPU and is eventually followed by a failure, caused by connections being repeatedly opened and closed.
{noformat}
2024-10-18T06:35:27.855+11:00 - o.a.c.f.state.ConnectionStateManager - State change: SUSPENDED
2024-10-18T06:35:27.855+11:00 - au.com.unico.herd.impl.HerdImpl - New connection state : SUSPENDED
2024-10-18T06:35:27.855+11:00 - au.com.unico.herd.impl.HerdImpl - Connection to server has been suspended..
2024-10-18T06:35:28.638+11:00 - org.apache.curator.ConnectionState - Negotiated session timeout: 10000
2024-10-18T06:35:28.639+11:00 - o.a.c.f.state.ConnectionStateManager - State change: RECONNECTED
2024-10-18T06:35:28.639+11:00 - o.a.c.f.imps.CuratorFrameworkImpl - Clearing sleep for 0 operations
2024-10-18T06:35:28.639+11:00 - au.com.unico.herd.impl.HerdImpl - New connection state : RECONNECTED
2024-10-18T06:35:28.639+11:00 - au.com.unico.herd.impl.HerdImpl - Connection to server has been unsuspended...
2024-10-18T06:35:28.640+11:00 - o.a.c.framework.imps.EnsembleTracker - New config event received: {server.2=xhhuj10717:2888:3888:participant, server.1=xhhuj10716:2888:3888:participant, server.5=xhwuj10517:2888:3888:participant, server.4=xhwuj10516:2888:3888:participant, server.3=xhhuj10721:2888:3888:participant, version=0}
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Reconnect worker starting
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Attempting to reconnect to the Herd
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Waiting for connection to server
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Connected to server
2024-10-18T06:35:28.640+11:00 - au.com.unico.herd.impl.HerdImpl - Reconnect worker completed
2024-10-18T06:35:28.647+11:00 - o.a.c.framework.imps.EnsembleTracker - Invalid config event received: {server.2=xhhuj10717:2888:3888:participant, server.1=xhhuj10716:2888:3888:participant, server.5=xhwuj10517:2888:3888:participant, server.4=xhwuj10516:2888:3888:participant, server.3=xhhuj10721:2888:3888:participant, version=0}
{noformat}

+*Our Questions*+
1. For the logging bug "Invalid config event received", we understand that we need to upgrade Curator as per https://issues.apache.org/jira/browse/CURATOR-526. Please confirm.
2. We suspect the reconnections are related to https://issues.apache.org/jira/browse/CURATOR-525, but we need to confirm this. How can we do so? If our assumption is correct, what triggers this bug? Our other JAR applications use the same Curator/ZooKeeper versions and do not encounter it. Please advise on the trigger and explain why it happens in some applications and not in others. Also, if this is the root cause, is there any solution other than upgrading Curator?
3. Can we upgrade Curator to 5.2.0 without upgrading ZooKeeper, given that we use zookeeper-3.5.7?

> Key: CURATOR-723
> URL: https://issues.apache.org/jira/browse/CURATOR-723
> Project: Apache Curator
> Issue Type: Bug
> Components: Framework
> Affects Versions: 4.3.0
> Reporter: Amr Elazhary
> Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)