catsun26 opened a new issue, #17814: URL: https://github.com/apache/pulsar/issues/17814
### Search before asking

- [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.

### Version

centos 7.9 apache-pulsar-2.8.0 one pulsar cluster apache-pulsar-2.8.0 + broker-2.8.1 and two pulsar cluster

### Minimal reproduce step

1. Change the settings in broker.conf from 3/2/2 to 2/2/2, restart the broker, and stop one bookie server.

Before:

managedLedgerDefaultEnsembleSize=3
managedLedgerDefaultWriteQuorum=2
managedLedgerDefaultAckQuorum=2

After:

managedLedgerDefaultEnsembleSize=2
managedLedgerDefaultWriteQuorum=2
managedLedgerDefaultAckQuorum=2

2. Check the runtime config:

[root@pulsar1 apache-pulsar-2.8.0]# pulsar-admin brokers get-runtime-config |grep managedLedgerDefaultEnsembleSize
"managedLedgerDefaultEnsembleSize 2"

3. Produce a message:

pulsar-client produce bsc/k8s/test -n 1 -m "hello"

14:29:47.029 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ConnectionPool - [[id: 0x3d78fba3, L:/127.0.0.1:40608 - R:localhost/127.0.0.1:6660]] Connected to server
14:29:47.136 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ProducerStatsRecorderImpl - Starting Pulsar producer perf with config: { "topicName" : "bsc/k8s/test", "producerName" : null, "sendTimeoutMs" : 30000, "blockIfQueueFull" : false, "maxPendingMessages" : 1000, "maxPendingMessagesAcrossPartitions" : 50000, "messageRoutingMode" : "RoundRobinPartition", "hashingScheme" : "JavaStringHash", "cryptoFailureAction" : "FAIL", "batchingMaxPublishDelayMicros" : 1000, "batchingPartitionSwitchFrequencyByPublishDelay" : 10, "batchingMaxMessages" : 1000, "batchingMaxBytes" : 131072, "batchingEnabled" : true, "chunkingEnabled" : false, "compressionType" : "NONE", "initialSequenceId" : null, "autoUpdatePartitions" : true, "autoUpdatePartitionsIntervalSeconds" : 60, "multiSchema" : true, "accessMode" : "Shared", "properties" : { } }
14:29:47.146 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ProducerStatsRecorderImpl - Pulsar client config: { "serviceUrl" : "pulsar://localhost:6660/", "authPluginClassName" : null, "authParams" : null, "authParamMap" : null, "operationTimeoutMs" : 30000, "statsIntervalSeconds" : 60, "numIoThreads" : 1, "numListenerThreads" : 1, "connectionsPerBroker" : 1, "useTcpNoDelay" : true, "useTls" : false, "tlsTrustCertsFilePath" : "", "tlsAllowInsecureConnection" : false, "tlsHostnameVerificationEnable" : false, "concurrentLookupRequest" : 5000, "maxLookupRequest" : 50000, "maxLookupRedirects" : 20, "maxNumberOfRejectedRequestPerConnection" : 50, "keepAliveIntervalSeconds" : 30, "connectionTimeoutMs" : 10000, "requestTimeoutMs" : 60000, "initialBackoffIntervalNanos" : 100000000, "maxBackoffIntervalNanos" : 60000000000, "enableBusyWait" : false, "listenerName" : null, "useKeyStoreTls" : false, "sslProvider" : null, "tlsTrustStoreType" : "JKS", "tlsTrustStorePath" : "", "tlsTrustStorePassword" : "", "tlsCiphers" : [ ], "tlsProtocols" : [ ], "memoryLimitBytes" : 0, "proxyServiceUrl" : null, "proxyProtocol" : null, "enableTransaction" : false }
14:29:47.201 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ConnectionPool - [[id: 0xd4f73691, L:/127.0.0.1:40614 - R:localhost/127.0.0.1:6660]] Connected to server
14:29:47.201 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ClientCnx - [id: 0xd4f73691, L:/127.0.0.1:40614 - R:localhost/127.0.0.1:6660] Connected through proxy to target broker at pulsar3:6650
14:29:47.208 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ProducerImpl - [bsc/k8s/test] [null] Creating producer on cnx [id: 0xd4f73691, L:/127.0.0.1:40614 - R:localhost/127.0.0.1:6660]
14:29:47.230 [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ClientCnx - [id: 0xd4f73691, L:/127.0.0.1:40614 - R:localhost/127.0.0.1:6660] Received error from server: org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available
14:29:47.231 [pulsar-client-io-1-1] ERROR org.apache.pulsar.client.impl.ProducerImpl - [bsc/k8s/test] [null] Failed to create producer:
org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available
14:29:47.232 [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ConnectionHandler - [bsc/k8s/test] [null] Could not get connection to broker: org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available -- Will try again in 0.1 s
14:29:47.333 [pulsar-timer-5-1] INFO org.apache.pulsar.client.impl.ConnectionHandler - [bsc/k8s/test] [null] Reconnecting after connection was closed

4. Broker logs:

.PerChannelBookieClient - Disconnected from bookie channel [id: 0xaff4064a, L:/192.168.209.83:54132 ! R:192.168.209.81/192.168.209.81:3181]
14:29:19.727 [pulsar-io-4-7] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0xe474d3ad, L:/192.168.209.83:54119 ! R:192.168.209.81/192.168.209.81:3181]
14:29:19.727 [pulsar-io-4-2] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Exception caught on:[id: 0x74fdd840, L:/192.168.209.83:54112 - R:192.168.209.81/192.168.209.81:3181] cause: readAddress(..) failed: Connection reset by peer
14:29:19.727 [pulsar-io-4-2] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x74fdd840, L:/192.168.209.83:54112 ! R:192.168.209.81/192.168.209.81:3181]
14:29:19.727 [pulsar-io-4-1] WARN org.apache.bookkeeper.proto.PerChannelBookieClient - Exception caught on:[id: 0x320e056f, L:/192.168.209.83:54128 - R:192.168.209.81/192.168.209.81:3181] cause: readAddress(..) failed: Connection reset by peer
14:29:19.727 [pulsar-io-4-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0x320e056f, L:/192.168.209.83:54128 ! R:192.168.209.81/192.168.209.81:3181]
14:29:19.779 [main-EventThread] INFO org.apache.bookkeeper.discover.ZKRegistrationClient - Invalidate cache for 192.168.209.81:3181
14:29:19.779 [main-EventThread] INFO org.apache.bookkeeper.discover.ZKRegistrationClient - Invalidate cache for 192.168.209.81:3181
14:29:19.783 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/192.168.209.81:3181
14:29:19.783 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/192.168.209.81:3181
14:29:20.897 [pulsar-web-40-3] INFO org.eclipse.jetty.server.RequestLog - 192.168.209.81 - - [23/九月/2022:14:29:20 +0800] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.29.1" 0
14:29:20.900 [prometheus-stats-41-1] INFO org.eclipse.jetty.server.RequestLog - 192.168.209.81 - - [23/九月/2022:14:29:20 +0800] "GET /metrics/ HTTP/1.1" 200 28244 "http://192.168.209.83:8080/metrics" "Prometheus/2.29.1" 2
14:29:21.884 [pulsar-web-40-8] INFO org.eclipse.jetty.server.RequestLog - 192.168.209.81 - - [23/九月/2022:14:29:21 +0800] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.29.1" 1
14:29:21.887 [prometheus-stats-41-1] INFO org.eclipse.jetty.server.RequestLog - 192.168.209.81 - - [23/九月/2022:14:29:21 +0800] "GET /metrics/ HTTP/1.1" 200 28244 "http://192.168.209.83:8080/metrics" "Prometheus/2.29.1" 3
14:29:35.897 [pulsar-web-40-1] INFO org.eclipse.jetty.server.RequestLog - 192.168.209.81 - - [23/九月/2022:14:29:35 +0800] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.29.1" 0
14:29:35.900 [prometheus-stats-41-1] INFO org.eclipse.jetty.server.RequestLog - 192.168.209.81 - - [23/九月/2022:14:29:35 +0800] "GET /metrics/ HTTP/1.1" 200 28244 "http://192.168.209.83:8080/metrics" "Prometheus/2.29.1" 3
14:29:36.884 [pulsar-web-40-4] INFO org.eclipse.jetty.server.RequestLog - 192.168.209.81 - - [23/九月/2022:14:29:36 +0800] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.29.1" 1
14:29:36.891 [prometheus-stats-41-1] INFO org.eclipse.jetty.server.RequestLog - 192.168.209.81 - - [23/九月/2022:14:29:36 +0800] "GET /metrics/ HTTP/1.1" 200 28245 "http://192.168.209.83:8080/metrics" "Prometheus/2.29.1" 6
14:29:47.058 [pulsar-io-4-5] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /192.168.209.81:33458
14:29:47.208 [pulsar-io-4-6] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /192.168.209.81:33464
14:29:47.215 [pulsar-io-4-6] INFO org.apache.pulsar.broker.service.ServerCnx - [/192.168.209.81:33464][persistent://bsc/k8s/test] Creating producer. producerId=0
14:29:47.216 [pulsar-ordered-OrderedExecutor-3-0] INFO org.apache.pulsar.broker.PulsarService - No ledger offloader configured, using NULL instance
14:29:47.216 [pulsar-ordered-OrderedExecutor-3-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger bsc/k8s/persistent/test
14:29:47.217 [bookkeeper-ml-scheduler-OrderedScheduler-4-0] INFO org.apache.bookkeeper.mledger.impl.MetaStoreImpl - Creating '/managed-ledgers/bsc/k8s/persistent/test'
14:29:47.220 [pulsar-ordered-OrderedExecutor-1-0-EventThread] INFO org.apache.pulsar.zookeeper.ZooKeeperCache - [State:CONNECTED Timeout:30000 sessionid:0x100004f5f9b000b local:/192.168.209.83:36584 remoteserver:pulsar1/192.168.209.81:2181 lastZxid:30064773910 xid:251 sent:251 recv:254 queuedpkts:0 pendingresp:0 queuedevents:1] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeCreated path:/managed-ledgers/bsc/k8s/persistent/test
14:29:47.221 [metadata-store-6-1] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [bsc/k8s/persistent/test] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[98, 115, 99, 47, 107, 56, 115, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 116, 101, 115, 116], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds
14:29:47.221 [metadata-store-6-1] WARN org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:192.168.209.82:3181>, <Bookie:192.168.209.83:3181>], allBookies [<Bookie:192.168.209.82:3181>, <Bookie:192.168.209.83:3181>].
14:29:47.221 [metadata-store-6-1] WARN org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:192.168.209.82:3181>, <Bookie:192.168.209.83:3181>], allBookies [<Bookie:192.168.209.82:3181>, <Bookie:192.168.209.83:3181>].
14:29:47.221 [metadata-store-6-1] ERROR org.apache.bookkeeper.client.LedgerCreateOp - Not enough bookies to create ledger with ensembleSize=3, writeQuorumSize=2 and ackQuorumSize=2
14:29:47.221 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [bsc/k8s/persistent/test] Failed to initialize managed ledger: Not enough non-faulty bookies available
14:29:47.221 [pulsar-ordered-OrderedExecutor-1-0-EventThread] INFO org.apache.pulsar.zookeeper.ZooKeeperManagedLedgerCache - [State:CONNECTED Timeout:30000 sessionid:0x100004f5f9b000b local:/192.168.209.83:36584 remoteserver:pulsar1/192.168.209.81:2181 lastZxid:30064773910 xid:251 sent:251 recv:254 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/managed-ledgers/bsc/k8s/persistent
14:29:47.221 [pulsar-ordered-OrderedExecutor-1-0-EventThread] INFO org.apache.pulsar.zookeeper.ZooKeeperManagedLedgerCache - invalidate called in zookeeperChildrenCache for path /managed-ledgers/bsc/k8s/persistent
14:29:47.222 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [bsc/k8s/persistent/test] Closing managed ledger
14:29:47.222 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.pulsar.broker.service.BrokerService - Failed to create topic persistent://bsc/k8s/test org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available
14:29:47.350 [pulsar-io-4-6] INFO org.apache.pulsar.broker.service.ServerCnx - [/192.168.209.81:33464][persistent://bsc/k8s/test] Creating producer. producerId=0
14:29:47.351 [pulsar-ordered-OrderedExecutor-3-0] INFO org.apache.pulsar.broker.PulsarService - No ledger offloader configured, using NULL instance
14:29:47.351 [pulsar-ordered-OrderedExecutor-3-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger bsc/k8s/persistent/test
14:29:47.352 [bookkeeper-ml-scheduler-OrderedScheduler-4-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [bsc/k8s/persistent/test] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[98, 115, 99, 47, 107, 56, 115, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 116, 101, 115, 116], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds
14:29:47.353 [bookkeeper-ml-scheduler-OrderedScheduler-4-0] WARN org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:192.168.209.82:3181>, <Bookie:192.168.209.83:3181>], allBookies [<Bookie:192.168.209.82:3181>, <Bookie:192.168.209.83:3181>].
14:29:47.353 [bookkeeper-ml-scheduler-OrderedScheduler-4-0] WARN org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:192.168.209.82:3181>, <Bookie:192.168.209.83:3181>], allBookies [<Bookie:192.168.209.82:3181>, <Bookie:192.168.209.83:3181>].
14:29:47.353 [bookkeeper-ml-scheduler-OrderedScheduler-4-0] ERROR org.apache.bookkeeper.client.LedgerCreateOp - Not enough bookies to create ledger with ensembleSize=3, writeQuorumSize=2 and ackQuorumSize=2
14:29:47.353 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [bsc/k8s/persistent/test] Failed to initialize managed ledger: Not enough non-faulty bookies available
14:29:47.353 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [bsc/k8s/persistent/test] Closing managed ledger
14:29:47.353 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.pulsar.broker.service.BrokerService - Failed to create topic persistent://bsc/k8s/test org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available

### What did you expect to see?

The cluster can still produce and consume normally after one bookie service is stopped.

### What did you see instead?

The Pulsar cluster cannot produce or consume normally when one bookie service is stopped. The steps were: change the settings in broker.conf from 3/2/2 to 2/2/2, restart the broker, and stop one bookie server.

Before:

managedLedgerDefaultEnsembleSize=3
managedLedgerDefaultWriteQuorum=2
managedLedgerDefaultAckQuorum=2

After:

managedLedgerDefaultEnsembleSize=2
managedLedgerDefaultWriteQuorum=2
managedLedgerDefaultAckQuorum=2

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!
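
To make the mismatch in the logs above easier to see, here is a minimal Python sketch of the quorum arithmetic. It is not Pulsar or BookKeeper code: `LedgerQuorum` and `can_create_ledger` are hypothetical names, and the check assumes only the usual BookKeeper rule that ensemble size >= write quorum >= ack quorum and that a new ledger needs at least as many available bookies as its ensemble size. The numbers come from this report: two bookies remain (192.168.209.82 and 192.168.209.83), `get-runtime-config` shows an ensemble size of 2, and `LedgerCreateOp` reports `ensembleSize=3`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerQuorum:
    """Hypothetical holder for the three managedLedgerDefault* values."""
    ensemble_size: int  # managedLedgerDefaultEnsembleSize
    write_quorum: int   # managedLedgerDefaultWriteQuorum
    ack_quorum: int     # managedLedgerDefaultAckQuorum


def can_create_ledger(q: LedgerQuorum, available_bookies: int) -> bool:
    """Return True when the quorum values are consistent and enough
    bookies are available to form the ensemble."""
    consistent = q.ensemble_size >= q.write_quorum >= q.ack_quorum >= 1
    return consistent and available_bookies >= q.ensemble_size


# Two bookies remain after 192.168.209.81 is stopped.
available = 2

configured = LedgerQuorum(2, 2, 2)  # what get-runtime-config shows
requested = LedgerQuorum(3, 2, 2)   # what the LedgerCreateOp error shows

print(can_create_ledger(configured, available))  # True
print(can_create_ledger(requested, available))   # False
```

With the configured 2/2/2 values, two bookies would be enough, so the failure is consistent with the ledger still being created with an ensemble size of 3.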
