[ https://issues.apache.org/jira/browse/ZOOKEEPER-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881320#comment-17881320 ]

Margaret Figura commented on ZOOKEEPER-4444:
--------------------------------------------

Hi,

We seem to have run into the same issue as well. We are on ZooKeeper 3.8.1, running on 
Kubernetes via the Bitnami chart as a 3-node ensemble. The problem persisted until we 
deleted the data (the PVC) of the affected node.

One thing that is odd: unfortunately I'm no longer able to access the system to check 
its current status, but the timestamps on the logs we collected are far from the 
current time (about 8 months old). I noticed [~fshlin]'s logs also have timestamps 
that don't match the time he posted his comment (off by about 6 months), whereas 
[~dbros]'s logs are at least close to the time he posted.

Thank you!

Our logs:
{code:java}
zookeeper 02:45:38.53 
zookeeper 02:45:38.53 Welcome to the Bitnami zookeeper container
zookeeper 02:45:38.53 Subscribe to project updates by watching https://github.com/bitnami/containers
zookeeper 02:45:38.54 Submit issues and feature requests at https://github.com/bitnami/containers/issues
zookeeper 02:45:38.54 
zookeeper 02:45:38.54 INFO  ==> ** Starting ZooKeeper setup **
zookeeper 02:45:38.59 WARN  ==> You have set the environment variable ALLOW_ANONYMOUS_LOGIN=yes. For safety reasons, do not use this flag in a production environment.
zookeeper 02:45:38.61 INFO  ==> Initializing ZooKeeper...
zookeeper 02:45:38.62 INFO  ==> No injected configuration file found, creating default config files...
zookeeper 02:45:38.68 INFO  ==> Adding server: kafka-zookeeper-0.kafka-zookeeper-headless.empirix-cloud.svc.cluster.local:2888:3888 with id: 1
zookeeper 02:45:38.69 INFO  ==> Adding server: kafka-zookeeper-1.kafka-zookeeper-headless.empirix-cloud.svc.cluster.local:2888:3888 with id: 2
zookeeper 02:45:38.69 INFO  ==> Adding server: kafka-zookeeper-2.kafka-zookeeper-headless.empirix-cloud.svc.cluster.local:2888:3888 with id: 3
zookeeper 02:45:38.72 INFO  ==> Enabling Prometheus metrics...
zookeeper 02:45:38.75 INFO  ==> Deploying ZooKeeper with persisted data...
zookeeper 02:45:38.75 INFO  ==> ** ZooKeeper setup finished! **
zookeeper 02:45:38.77 INFO  ==> ** Starting ZooKeeper **
/opt/bitnami/java/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
2024-01-11 02:45:39,746 [myid:1] - ERROR [main:o.a.z.s.DataTree@1807] - First digest mismatch on txn: 74196379424522240,3141,64424510482,1701895146384,14
, 
v{s{13,#000112f636f6e74726f6c6c65725f65706f636800022},s{5,#0003d2f62726f6b6572732f746f706963732f4f64735f4564725f736d70705f736d735f7864725f696e6465782f706172746974696f6e732f312f73746174650004b7b22636f6e74726f6c6c65725f65706f6368223a33332c226c6561646572223a312c2276657273696f6e223a312c226c65616465725f65706f6368223a3235312c22697372223a5b315d7d00179}}
, expected digest is 2,1621536443130
, actual digest is 1632889486483, 
2024-01-11 02:45:39,755 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510482(next log) for type 14
2024-01-11 02:45:39,755 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510483(next log) for type 14
2024-01-11 02:45:39,756 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510484(next log) for type 14
2024-01-11 02:45:39,756 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510485(next log) for type 14
2024-01-11 02:45:39,756 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510486(next log) for type 14
2024-01-11 02:45:39,756 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510487(next log) for type 14
2024-01-11 02:45:39,757 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510488(next log) for type 14
2024-01-11 02:45:39,757 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510489(next log) for type 14
2024-01-11 02:45:39,757 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510490(next log) for type 14
2024-01-11 02:45:39,758 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510491(next log) for type 14
2024-01-11 02:45:39,758 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510492(next log) for type 14
2024-01-11 02:45:39,758 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510493(next log) for type 14
2024-01-11 02:45:39,758 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510494(next log) for type 14
2024-01-11 02:45:39,758 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510495(next log) for type 14
2024-01-11 02:45:39,759 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510496(next log) for type 14
2024-01-11 02:45:39,759 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510497(next log) for type 14
2024-01-11 02:45:39,759 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510498(next log) for type 14
2024-01-11 02:45:39,759 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510499(next log) for type 14
2024-01-11 02:45:39,759 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510500(next log) for type 14
2024-01-11 02:45:39,759 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510501(next log) for type 14
2024-01-11 02:45:39,760 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510502(next log) for type 14
2024-01-11 02:45:39,760 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510503(next log) for type 14
2024-01-11 02:45:39,760 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510504(next log) for type 14
2024-01-11 02:45:39,760 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510505(next log) for type 14
2024-01-11 02:45:39,760 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510506(next log) for type 14
2024-01-11 02:45:39,760 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510507(next log) for type 14
2024-01-11 02:45:39,760 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510508(next log) for type 14
2024-01-11 02:45:39,761 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510509(next log) for type 14
2024-01-11 02:45:39,761 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510510(next log) for type 14
2024-01-11 02:45:39,761 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510511(next log) for type 14
2024-01-11 02:45:39,761 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510512(next log) for type 14
2024-01-11 02:45:39,761 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510513(next log) for type 14
2024-01-11 02:45:39,762 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510514(next log) for type 14
2024-01-11 02:45:39,762 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510515(next log) for type 14
2024-01-11 02:45:39,762 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510516(next log) for type 14
2024-01-11 02:45:39,762 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510517(next log) for type 14
2024-01-11 02:45:39,762 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510518(next log) for type 14
2024-01-11 02:45:39,762 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510519(next log) for type 14
2024-01-11 02:45:39,762 [myid:1] - ERROR [main:o.a.z.s.p.FileTxnSnapLog@345] - 64424510521(highestZxid) > 64424510520(next log) for type 14
 {code}
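
For what it's worth, here is how I'm reading the numbers in those errors. This is a 
small stand-alone sketch (the class name DecodeZxid is just mine), assuming the usual 
ZooKeeper zxid layout (epoch in the high 32 bits, counter in the low 32 bits) and 
assuming the fourth field of the txn header above is the txn time in epoch 
milliseconds; I haven't verified either against this system.

{code:java}
import java.time.Instant;

// Sketch only: decode the values that appear in the DataTree / FileTxnSnapLog
// errors above, under the assumptions stated in the comment text.
public class DecodeZxid {
    public static void main(String[] args) {
        long highestZxid  = 64424510521L;   // "highestZxid" reported by FileTxnSnapLog
        long firstTxnZxid = 64424510482L;   // zxid of the first mismatching txn
        long txnTimeMs    = 1701895146384L; // 4th field of the txn header (assumed epoch millis)

        // Assumed layout: zxid = (epoch << 32) | counter
        System.out.printf("highestZxid: epoch=%d counter=%d%n",
                highestZxid >>> 32, highestZxid & 0xffffffffL);   // epoch=15 counter=1081
        System.out.printf("first txn:   epoch=%d counter=%d%n",
                firstTxnZxid >>> 32, firstTxnZxid & 0xffffffffL); // epoch=15 counter=1042
        System.out.println("txn time:    " + Instant.ofEpochMilli(txnTimeMs)); // 2023-12-06
    }
}
{code}

If those assumptions hold, the replayed log entries cover counters 1042 through 1080 
in the same epoch while the snapshot already claims counter 1081, and the mismatching 
transaction itself dates to 2023-12-06, well before the 2024-01-11 restart shown above.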

> Follower doesn't get synchronized after process restart
> -------------------------------------------------------
>
>                 Key: ZOOKEEPER-4444
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4444
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.6.3
>            Reporter: Andreas Weber
>            Priority: Major
>
> Hi folks, I've got an issue with 3.6.3.
> I can't provide a simple test, because it seems to depend on timing in a 
> cluster environment, but I tried to reduce the scenario as far as possible:
>  * zookeeper cluster with 5 nodes, all of them Followers (no Observers)
>  * start some parallel threads which do some writes in the cluster (e.g. 
> create/delete znodes)
>  * kill one of the zookeeper processes, let's say on node X (where node X is 
> not the Leader)
>  * restart zookeeper process on node X
>  * wait a few seconds
>  * kill zookeeper process on node X again
>  * restart zookeeper process on node X again
> In some cases (every 3-4 runs) I see the following in the log of node X:
> After first restart of node X:
> {noformat}
>  WARN  persistence.FileTxnLog           - Current zxid 4294968525 is <= 
> 4294969524 for 15
>  WARN  persistence.FileTxnLog           - Current zxid 4294968526 is <= 
> 4294969524 for 15
>  WARN  persistence.FileTxnLog           - Current zxid 4294968527 is <= 
> 4294969524 for 15
>  ... (this kind of WARN is repeated some hundred times)
>  WARN  quorum.SendAckRequestProcessor   - Closing connection to leader, 
> exception during packet send java.net.SocketException: Socket closed ...
>  ... (this kind of WARN is repeated some hundred times)
> {noformat}
> After second restart of node X:
> {noformat}
>  ERROR persistence.FileTxnSnapLog       - 4294970146(highestZxid) > 
> 4294969147(next log) for type 2
>  WARN  server.DataTree                  - Message:Digests are not matching. 
> Value is Zxid. Value:4294969147
>  ERROR server.DataTree                  - First digest mismatch on txn: 
> 360466402305310720,3870,4294969147,1639258399998,2
> , ...
> , expected digest is 2,1365261838770
> , actual digest is 1098406565142, 
>  ERROR persistence.FileTxnSnapLog       - 4294970146(highestZxid) > 
> 4294969148(next log) for type 2
>  ERROR persistence.FileTxnSnapLog       - 4294970146(highestZxid) > 
> 4294969149(next log) for type 5
>  ERROR persistence.FileTxnSnapLog       - 4294970146(highestZxid) > 
> 4294969150(next log) for type 2
>  ... (this kind of ERROR is repeated some hundred times)
> {noformat}
> And afterwards (in the actual application), ZooKeeper on node X seems to have 
> a different view of the cluster state and doesn't get synchronized, at least 
> for a few hours.
> This e.g. leads to phantom reads of znodes that were deleted a long time ago.
> (The resulting behaviour looks a bit similar to what is described in 
> ZOOKEEPER-3911.)
> This does not happen with ZooKeeper 3.6.2!
> (At least I can't reproduce it with that version.)



