RaulGracia opened a new issue, #3466:
URL: https://github.com/apache/bookkeeper/issues/3466

   **BUG REPORT**
   
   ***Describe the bug***
   
   During our tests of Pravega (https://cncf.pravega.io/), we have detected a 
regression after upgrading to Bookkeeper 4.15 
(https://github.com/pravega/pravega/pull/6676). The regression is consistently 
reproducible when doing a fault injection test (30% network packet loss). Note 
that this kind of test consistently pass using Bookkeeper 4.14, for which we 
believe that is a regression or a change in the behavior of the new Bookkeeper 
release.
   
   A reproduction of this issue goes as follows:
   
   1. Pravega creates a ledger, in this case ledger `3246`:
   ```
   2022-07-20T14:45:25.706422066+00:00 stdout F 2022-07-20 14:45:25,706 7043990 
[ZKC-connect-executor-0-SendThread(zookeeper-client:2181)] DEBUG 
org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x2016bc13e0a001c, 
packet:: clientPath:/pravega/pravega/bookkeeper/ledgers/00/0000/L3246 
serverPath:/pravega/pravega/bookkeeper/ledgers/00/0000/L3246 finished:false 
header:: 127,1  replyHeader:: 127,38654707364,0  request:: 
'/pravega/pravega/bookkeeper/ledgers/00/0000/L3246,#426f6f6b69654d65746164617461466f726d617456657273696f6e933affffff8c18210218020ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff12813238a19626f6f6b6b65657065722d626f6f6b69652d322d3131353830a19626f6f6b6b65657065722d626f6f6b69652d302d32383831361003834204825a16ab6170706c69636174696f6e127507261766567615a15af426f6f6b4b65657065724c6f674964122313360ffffffe2ffffffe5ffffffcdffffff8bffffff81ffffff93ffffffd8ffffff9c10,v{s{31,s{'world,'anyone}}},0
  response:: '/pravega/pravega/bookkeeper/ledgers/00/0000
 /L3246
   2022-07-20T14:45:25.706687124+00:00 stdout F 2022-07-20 14:45:25,706 7043990 
[ZKC-connect-executor-0-EventThread] INFO  o.a.b.client.LedgerCreateOp - 
Ensemble: [bookkeeper-bookie-2-11580, bookkeeper-bookie-0-28816] for ledger: 
3246
   2022-07-20T14:45:25.706725535+00:00 stdout F 2022-07-20 14:45:25,706 7043990 
[core-13] INFO  i.p.s.s.i.bookkeeper.BookKeeperLog - Log[13]: Created Ledger 
3246.
   2022-07-20T14:45:47.426112317+00:00 stdout F 2022-07-20 14:45:47,426 7065710 
[ZKC-connect-executor-0-SendThread(zookeeper-client:2181)] DEBUG 
org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x2016bc13e0a001c, 
packet:: clientPath:/pravega/pravega/bookkeeper/ledgers/00/0000/L3246 
serverPath:/pravega/pravega/bookkeeper/ledgers/00/0000/L3246 finished:false 
header:: 145,4  replyHeader:: 145,38654707423,0  request:: 
'/pravega/pravega/bookkeeper/ledgers/00/0000/L3246,F  response:: 
#426f6f6b69654d65746164617461466f726d617456657273696f6e933affffff8c18210218020ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff12813238a19626f6f6b6b65657065722d626f6f6b69652d322d3131353830a19626f6f6b6b65657065722d626f6f6b69652d302d32383831361003834204825a16ab6170706c69636174696f6e127507261766567615a15af426f6f6b4b65657065724c6f674964122313360ffffffe2ffffffe5ffffffcdffffff8bffffff81ffffff93ffffffd8ffffff9c10,s{38654707364,38654707364,1658328325290,1658328325290,0,0,0,0,172,0,386
 54707364}
   2022-07-20T14:45:47.426207948+00:00 stdout F 2022-07-20 14:45:47,426 7065710 
[ZKC-connect-executor-0-EventThread] DEBUG o.a.b.meta.AbstractZkLedgerManager - 
Registered ledger metadata listener ReadOnlyLedgerHandle(lid = 3246, id = 
672259987) on ledger 3246.
   ```
   
   2. Probably after some network failures induced by the test, we see that 
this ledger is unregistered (has it been actually removed from Zookeeper?): 
   ```
   2022-07-20T14:45:47.426475848+00:00 stdout F 2022-07-20 14:45:47,426 7065710 
[ZKC-connect-executor-0-SendThread(zookeeper-client:2181)] DEBUG 
org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x2016bc13e0a001c, 
packet:: clientPath:/pravega/pravega/bookkeeper/ledgers/00/0000/L3246 
serverPath:/pravega/pravega/bookkeeper/ledgers/00/0000/L3246 finished:false 
header:: 146,4  replyHeader:: 146,38654707423,0  request:: 
'/pravega/pravega/bookkeeper/ledgers/00/0000/L3246,T  response:: 
#426f6f6b69654d65746164617461466f726d617456657273696f6e933affffff8c18210218020ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff12813238a19626f6f6b6b65657065722d626f6f6b69652d322d3131353830a19626f6f6b6b65657065722d626f6f6b69652d302d32383831361003834204825a16ab6170706c69636174696f6e127507261766567615a15af426f6f6b4b65657065724c6f674964122313360ffffffe2ffffffe5ffffffcdffffff8bffffff81ffffff93ffffffd8ffffff9c10,s{38654707364,38654707364,1658328325290,1658328325290,0,0,0,0,172,0,386
 54707364}
   2022-07-20T14:45:47.426637438+00:00 stdout F 2022-07-20 14:45:47,426 7065710 
[BookKeeperClientWorker-OrderedExecutor-7-0] DEBUG 
o.a.b.client.PendingReadLacOp - Read LAC complete with enough validResponse for 
ledger: 3246 LAC: -1
   2022-07-20T14:45:47.426737033+00:00 stdout F 2022-07-20 14:45:47,426 7065710 
[core-13] DEBUG o.a.b.meta.AbstractZkLedgerManager - Unregistered ledger 
metadata listener ReadOnlyLedgerHandle(lid = 3246, id = 672259987) on ledger 
3246.
   2022-07-20T14:46:16.612224510+00:00 stdout F 2022-07-20 14:46:16,612 7094896 
[BookKeeperClientWorker-OrderedExecutor-7-0] DEBUG 
o.a.b.proto.PerChannelBookieClient - Could not write Add request to bookie 
192.169.1.60/192.169.1.60:3181 for ledger 3246, entry 0
   2022-07-20T14:46:16.612271701+00:00 stdout F 2022-07-20 14:46:16,612 7094896 
[BookKeeperClientWorker-OrderedExecutor-7-0] ERROR 
o.a.bookkeeper.client.PendingAddOp - Write of ledger entry to quorum failed: 
L3246 E0
   2022-07-20T14:46:16.616440402+00:00 stdout F 2022-07-20 14:46:16,616 7094900 
[BookKeeperClientWorker-OrderedExecutor-7-0] DEBUG 
o.a.b.proto.PerChannelBookieClient - Could not write ReadLAC request to bookie 
null for ledger 3246, entry -1
   ```
   
   3. After that, Pravega still assumes that the ledger is there (as Pravega 
itself has not deleted it), but any further operation on this ledger fails: 
   ```
   2022-07-20T14:46:16.630253410+00:00 stdout F 2022-07-20 14:46:16,629 7094913 
[core-30] DEBUG o.a.bookkeeper.client.LedgerHandle - Failed to close ledger 
3246 :
   2022-07-20T14:46:16.630795088+00:00 stdout F 2022-07-20 14:46:16,630 7094914 
[core-30] ERROR i.p.s.s.i.bookkeeper.BookKeeperLog - Log[13]: Unable to close 
LedgerHandle for Ledger 3246.
   2022-07-20T14:46:16.630795088+00:00 stdout F 
io.pravega.segmentstore.storage.DurableDataLogException: Unable to close ledger 
3246.
   ```
   
   4. This makes that, when finding the ledger missing, that Segment Container 
in Pravega is stuck and cannot recover:
   ```
   2022-07-20T15:59:48.793802385+00:00 stdout F 2022-07-20 15:59:48,793 
11507077 [BookKeeperClientWorker-OrderedExecutor-7-0] DEBUG 
o.a.bookkeeper.client.PendingReadOp - No such entry found on bookie.  L3246 E0 
bookie: bookkeeper-bookie-0-28816
   2022-07-20T15:59:48.793815178+00:00 stdout F 2022-07-20 15:59:48,793 
11507077 [BookKeeperClientWorker-OrderedExecutor-7-0] ERROR 
o.a.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L3246 E0-E0, 
Sent to [bookkeeper-bookie-0-28816, bookkeeper-bookie-2-11580], Heard from [] : 
bitset = {}, Error = 'No such ledger exists on Bookies'. First unread entry is 
(-1, rc = null)
   2022-07-20T15:59:48.793906376+00:00 stdout F 2022-07-20 15:59:48,793 
11507077 [core-1] INFO  i.p.s.s.c.StreamSegmentContainerMetadata - 
SegmentContainer[13]: Exit RecoveryMode.
   ...
   2022-07-20T15:59:48.794030332+00:00 stdout F 2022-07-20 15:59:48,794 
11507078 [core-8] INFO  i.p.s.s.r.StreamSegmentReadIndex - ReadIndex[13-123]: 
Closed.
   2022-07-20T15:59:48.794605613+00:00 stdout F 2022-07-20 15:59:48,794 
11507078 [core-1] ERROR i.p.s.server.logs.DurableLog - DurableLog[13] Recovery 
FAILED.
   2022-07-20T15:59:48.794605613+00:00 stdout F 
org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such 
ledger exists on Bookies
   2022-07-20T15:59:48.794605613+00:00 stdout F         at 
org.apache.bookkeeper.client.BKException.create(BKException.java:74)
   2022-07-20T15:59:48.794605613+00:00 stdout F         at 
org.apache.bookkeeper.client.PendingReadOp.submitCallback(PendingReadOp.java:654)
   2022-07-20T15:59:48.794605613+00:00 stdout F         at 
org.apache.bookkeeper.client.PendingReadOp$LedgerEntryRequest.fail(PendingReadOp.java:176)
   2022-07-20T15:59:48.794605613+00:00 stdout F         at 
org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.sendNextRead(PendingReadOp.java:397)
   2022-07-20T15:59:48.794605613+00:00 stdout F         at 
org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.logErrorAndReattemptRead(PendingReadOp.java:440)
   ```
   
   5. At the Bookies side, we can see that the fence request is received by one 
Bookie (bookie-2) at `2022-07-20T14:48:11`:
   ```
   
2-bookie-9c88965c-2022-07-20T14-49-58Z.log.gz:2022-07-20T14:48:11.157496389+00:00
 stdout F 14:48:11,157 INFO  Ledger fence request received for ledger: 3246 
from address: /192.168.0.228:57782
   
2-bookie-9c88965c-2022-07-20T14-49-58Z.log.gz:2022-07-20T14:48:11.160872045+00:00
 stdout F 14:48:11,160 INFO  Ledger fence request received for ledger: 3246 
from address: /192.168.0.228:57782
   ```
   
   6. And that ledger seems to be present in the metadata of ledgers, at least 
for some time (look for `L3246`):
   ```
   
2-bookie-9c88965c-2022-07-20T14-55-24Z.log.gz:2022-07-20T14:51:48.716729331+00:00
 stdout F 14:51:48,716 DEBUG Reading reply session id: 0x3016baeab7f000f, 
packet:: clientPath:/pravega/pravega/bookkeeper/ledgers/00/0000 
serverPath:/pravega/pravega/bookkeeper/ledgers/00/0000 finished:false header:: 
36287,8  replyHeader:: 36287,38654709412,0  request:: 
'/pravega/pravega/bookkeeper/ledgers/00/0000,F  response:: 
v{'L3195,'L3070,'L3071,'L3192,'L2784,'L3078,'L3199,'L3079,'L3076,'L3197,'L3077,'L3198,'L3081,'L0377,'L3089,'L3087,'L0372,'L3052,'L3173,'L3053,'L3295,'L3050,'L3171,'L3172,'L3293,'L3290,'L3170,'L3291,'L0463,'L3058,'L0464,'L3059,'L3056,'L3177,'L3057,'L3178,'L0461,'L3054,'L3175,'L3055,'L3176,'L3297,'L3063,'L3184,'L3064,'L3061,'L3182,'L3062,'L3183,'L3181,'L3069,'L3067,'L3188,'L3068,'L3065,'L3066,'L3310,'L1814,'L3317,'L3203,'L3324,'L3322,'L3202,'L3323,'L3200,'L0734,'L1705,'L0737,'L3209,'L0736,'L3207,'L3328,'L3208,'L3329,'L3205,'L0732,'L3327,'L0739,'L0738,'L3096,'L3094,'L3095,'L3092,'
 
L3093,'L3091,'L0388,'L0702,'L0704,'L0703,'L2327,'L0709,'L0706,'L0705,'L0708,'L0707,'L0390,'L3302,'L2452,'L0713,'L0712,'L3308,'L0714,'L3306,'L3307,'L0711,'L0710,'L3305,'L3115,'L3236,'L3116,'L3237,'L0762,'L3113,'L3234,'L3114,'L3233,'L3230,'L3110,'L3231,'L0768,'L0404,'L0767,'L0769,'L0764,'L3119,'L0763,'L3238,'L0765,'L3118,'L0771,'L3126,'L1983,'L3127,'L3248,'L3245,'L3125,'L3246,'L3122,'L3243,'L3123,'L3244,'L3120,'L3000,'L0418,'L2957,'L3009,'L2955,'L3007,'L3214,'L0740,'L2485,'L3211,'L3330,'L0745,'L3218,'L0741,'L0744,'L3104,'L3105,'L3226,'L3102,'L3103,'L3224,'L3101,'L2130,'L0757,'L0756,'L2816,'L0759,'L0758,'L3108,'L3107,'L3030,'L3151,'L3031,'L3152,'L3273,'L3150,'L3038,'L3159,'L2984,'L3278,'L2982,'L3037,'L3279,'L3155,'L2980,'L3156,'L3277,'L2064,'L3032,'L3153,'L3274,'L3033,'L3154,'L3275,'L2989,'L0444,'L2988,'L0447,'L2985,'L2986,'L2073,'L3041,'L3162,'L3283,'L3042,'L3163,'L3160,'L3281,'L3040,'L3161,'L3280,'L0694,'L0696,'L2992,'L3047,'L3048,'L3169,'L3166,'L3167,'L0692,'L3043,'L3164,'L3165,'L29
 
99,'L2996,'L3130,'L3137,'L3017,'L3014,'L3135,'L3015,'L3136,'L3012,'L3134,'L3010,'L3131,'L3252,'L3132,'L2969,'L2966,'L3018,'L3139,'L3019,'L3140,'L3020,'L3141,'L3148,'L3269,'L0671,'L2610,'L2973,'L3149,'L3025,'L2971,'L3147,'L3024,'L3145,'L3142,'L3143,'L2979,'L2735,'L2974,'L3029}
   
2-bookie-9c88965c-2022-07-20T14-55-24Z.log.gz:2022-07-20T14:51:48.736027964+00:00
 stdout F 14:51:48,735 DEBUG Reading reply session id: 0x3016baeab7f000f, 
packet:: clientPath:/pravega/pravega/bookkeeper/ledgers/00/0000/L3246 
serverPath:/pravega/pravega/bookkeeper/ledgers/00/0000/L3246 finished:false 
header:: 36521,4  replyHeader:: 36521,38654709412,0  request:: 
'/pravega/pravega/bookkeeper/ledgers/00/0000/L3246,F  response:: 
#426f6f6b69654d65746164617461466f726d617456657273696f6e933a7c8210218ffffff9f92002833238a19626f6f6b6b65657065722d626f6f6b69652d322d3131353830a19626f6f6b6b65657065722d626f6f6b69652d302d32383831361003834204825a16ab6170706c69636174696f6e127507261766567615a15af426f6f6b4b65657065724c6f6749641223133600,s{38654707364,38654708391,1658328325290,1658328491164,2,0,0,0,155,0,38654707364}
   
   
2-bookie-9c88965c-2022-07-20T14-55-52Z.log.gz:2022-07-20T14:55:21.070507213+00:00
 stdout F 14:55:21,070 DEBUG Reading reply session id: 0x3016baeab7f000f, 
packet:: clientPath:/pravega/pravega/bookkeeper/ledgers/00/0000 
serverPath:/pravega/pravega/bookkeeper/ledgers/00/0000 finished:false header:: 
36739,8  replyHeader:: 36739,38654709965,0  request:: 
'/pravega/pravega/bookkeeper/ledgers/00/0000,F  response:: 
v{'L3195,'L3070,'L3071,'L3192,'L2784,'L3078,'L3199,'L3079,'L3076,'L3197,'L3077,'L3198,'L3081,'L0377,'L3089,'L3087,'L0372,'L3052,'L3053,'L3295,'L3050,'L3171,'L3172,'L3293,'L3170,'L3291,'L0463,'L3058,'L0464,'L3059,'L3056,'L3177,'L3057,'L3178,'L0461,'L3054,'L3175,'L3055,'L3176,'L3063,'L3184,'L3064,'L3061,'L3182,'L3062,'L3183,'L3181,'L3069,'L3067,'L3188,'L3068,'L3065,'L3066,'L1814,'L3317,'L3203,'L3324,'L3322,'L3202,'L3323,'L3200,'L0734,'L1705,'L0737,'L3209,'L0736,'L3207,'L3208,'L3329,'L3205,'L0732,'L3327,'L0739,'L0738,'L3096,'L3094,'L3095,'L3092,'L3093,'L3091,'L0388,'L0702,'L0704,'
 
L0703,'L2327,'L0709,'L0706,'L0705,'L0708,'L0707,'L0390,'L3302,'L2452,'L0713,'L0712,'L3308,'L0714,'L3306,'L3307,'L0711,'L0710,'L3305,'L3350,'L3115,'L3236,'L3116,'L3237,'L3358,'L0762,'L3113,'L3234,'L3114,'L3353,'L3233,'L3354,'L3230,'L3351,'L3110,'L3231,'L3352,'L0768,'L0404,'L0767,'L0769,'L0764,'L3119,'L0763,'L3238,'L0765,'L3118,'L3360,'L0771,'L3126,'L3368,'L1983,'L3248,'L3245,'L3366,'L3125,'L3246,'L3122,'L3243,'L3364,'L3123,'L3244,'L3365,'L3120,'L3000,'L3363,'L0418,'L2957,'L3009,'L2955,'L3007,'L3214,'L0740,'L2485,'L3211,'L3330,'L0745,'L3218,'L0741,'L0744,'L3338,'L3104,'L3346,'L3105,'L3226,'L3347,'L3102,'L3103,'L3224,'L3345,'L3101,'L2130,'L3341,'L0757,'L0756,'L2816,'L0759,'L0758,'L3108,'L3348,'L3107,'L3349,'L3030,'L3151,'L3031,'L3152,'L3273,'L3150,'L3390,'L3038,'L3159,'L2984,'L3278,'L2982,'L3037,'L3279,'L3155,'L2980,'L3156,'L3277,'L2064,'L3032,'L3153,'L3274,'L3033,'L3154,'L3275,'L2989,'L0444,'L2988,'L0447,'L2985,'L2986,'L2073,'L3041,'L3162,'L3042,'L3163,'L3160,'L3281,'L3040,'L3161,'L32
 
80,'L0694,'L0696,'L2992,'L3047,'L3048,'L3169,'L3166,'L3167,'L0692,'L3043,'L3164,'L3165,'L2999,'L2996,'L3130,'L3372,'L3370,'L3137,'L3379,'L3017,'L3014,'L3135,'L3377,'L3015,'L3136,'L3378,'L3012,'L3375,'L3134,'L3010,'L3131,'L3252,'L3373,'L3132,'L3374,'L2969,'L2966,'L3018,'L3139,'L3019,'L3140,'L3382,'L3020,'L3141,'L3383,'L3380,'L3381,'L3148,'L3269,'L0671,'L2610,'L2973,'L3149,'L3025,'L3388,'L2971,'L3147,'L3389,'L3024,'L3387,'L3142,'L3384,'L3143,'L3385,'L2979,'L2735,'L2974,'L3029}
   
2-bookie-9c88965c-2022-07-20T14-55-52Z.log.gz:2022-07-20T14:55:21.090055084+00:00
 stdout F 14:55:21,090 DEBUG Reading reply session id: 0x3016baeab7f000f, 
packet:: clientPath:/pravega/pravega/bookkeeper/ledgers/00/0000/L3246 
serverPath:/pravega/pravega/bookkeeper/ledgers/00/0000/L3246 finished:false 
header:: 36970,4  replyHeader:: 36970,38654709965,0  request:: 
'/pravega/pravega/bookkeeper/ledgers/00/0000/L3246,F  response:: 
#426f6f6b69654d65746164617461466f726d617456657273696f6e933a7c8210218ffffff9f92002833238a19626f6f6b6b65657065722d626f6f6b69652d322d3131353830a19626f6f6b6b65657065722d626f6f6b69652d302d32383831361003834204825a16ab6170706c69636174696f6e127507261766567615a15af426f6f6b4b65657065724c6f6749641223133600,s{38654707364,38654708391,1658328325290,1658328491164,2,0,0,0,155,0,38654707364}
   
2-bookie-9c88965c-2022-07-20T14-55-52Z.log.gz:2022-07-20T14:55:21.095978533+00:00
 stdout F 14:55:21,095 INFO  Following ledgers: [3329, 771, 3076, 3077, 3081, 
3338, 2064, 3089, 3346, 3091, 3092, 3348, 1814, 3094, 3350, 2327, 3095, 3351, 
3096, 3352, 2073, 3354, 3101, 3102, 3104, 3105, 3107, 3108, 3114, 3115, 3116, 
3118, 3119, 2610, 3122, 3123, 3125, 3130, 3131, 3132, 3134, 3135, 3136, 3139, 
3140, 3141, 3143, 3147, 3148, 3150, 3151, 3152, 3153, 3155, 3156, 3159, 3160, 
3162, 3163, 3164, 3165, 3166, 3167, 3169, 3171, 3172, 3175, 3176, 3177, 3182, 
3183, 3184, 372, 3199, 3202, 388, 3205, 390, 3208, 3209, 3211, 2957, 3214, 
2452, 3224, 3226, 2971, 3230, 671, 3231, 3233, 3236, 1705, 3243, 2988, 3244, 
3245, 3246, 2735, 692, 2485, 694, 2999, 3000, 702, 447, 703, 704, 707, 708, 
710, 3014, 712, 713, 3273, 714, 3274, 3277, 3278, 463, 3025, 3281, 3029, 3031, 
3032, 3033, 3291, 732, 3037, 734, 3038, 3295, 736, 3040, 737, 3041, 738, 3042, 
739, 3043, 740, 741, 3047, 744, 745, 3305, 3050, 3306, 3307,
  3052, 3308, 3053, 3054, 3055, 3057, 3058, 756, 3061, 3317, 758, 759, 3063, 
3064, 3065, 3066, 763, 3068, 765, 3069, 767, 3071, 3327] of bookie: 
[bookkeeper-bookie-0-28816] are identified as underreplicated
   
   
2-bookie-9c88965c-2022-07-20T15-00-53Z.log.gz:2022-07-20T14:56:19.912265378+00:00
 stdout F 14:56:19,912 DEBUG Reading reply session id: 0x1016bac9b630005, 
packet:: clientPath:/pravega/pravega/bookkeeper/ledgers/00/0000 
serverPath:/pravega/pravega/bookkeeper/ledgers/00/0000 finished:false header:: 
6,8  replyHeader:: 6,38654710371,0  request:: 
'/pravega/pravega/bookkeeper/ledgers/00/0000,F  response:: 
v{'L3195,'L3070,'L3071,'L3192,'L2784,'L3078,'L3199,'L3079,'L3076,'L3197,'L3077,'L3198,'L3081,'L3401,'L0377,'L3400,'L3089,'L3087,'L0372,'L3407,'L3408,'L3405,'L3406,'L3403,'L3404,'L3052,'L3053,'L3295,'L3050,'L3171,'L3172,'L3293,'L3170,'L3291,'L0463,'L3058,'L0464,'L3059,'L3056,'L3177,'L3057,'L3178,'L0461,'L3054,'L3175,'L3055,'L3176,'L3063,'L3184,'L3064,'L3061,'L3182,'L3062,'L3183,'L3181,'L3069,'L3067,'L3188,'L3068,'L3065,'L3066,'L1814,'L3317,'L3203,'L3324,'L3322,'L3202,'L3323,'L3200,'L0734,'L1705,'L0737,'L3209,'L0736,'L3207,'L3208,'L3205,'L0732,'L3327,'L0739,'L0738,'L3096,'L3094,'L3095,'L
 
3092,'L3093,'L3091,'L3412,'L3413,'L0388,'L3410,'L3411,'L0702,'L0704,'L3418,'L0703,'L3419,'L2327,'L3416,'L3417,'L3414,'L3415,'L0709,'L0706,'L0705,'L0708,'L0707,'L0390,'L3302,'L3423,'L3424,'L3421,'L3422,'L2452,'L3420,'L0713,'L0712,'L3308,'L0714,'L3306,'L3307,'L0711,'L3425,'L0710,'L3305,'L3426,'L3350,'L3115,'L3236,'L3116,'L3237,'L3358,'L0762,'L3113,'L3234,'L3114,'L3353,'L3233,'L3354,'L3230,'L3351,'L3110,'L3231,'L3352,'L0768,'L0404,'L0767,'L0769,'L0764,'L3119,'L0763,'L3238,'L0765,'L3118,'L3360,'L0771,'L3126,'L3368,'L1983,'L3248,'L3245,'L3366,'L3125,'L3246,'L3122,'L3243,'L3364,'L3123,'L3244,'L3365,'L3120,'L3000,'L3363,'L0418,'L2957,'L3009,'L2955,'L3007,'L3214,'L0740,'L2485,'L3211,'L3330,'L0745,'L3218,'L0741,'L0744,'L3338,'L3104,'L3346,'L3105,'L3226,'L3347,'L3102,'L3103,'L3224,'L3345,'L3101,'L2130,'L3341,'L0757,'L0756,'L2816,'L0759,'L0758,'L3108,'L3348,'L3107,'L3349,'L3030,'L3151,'L3393,'L3031,'L3152,'L3273,'L3391,'L3150,'L3392,'L3038,'L3159,'L2984,'L3278,'L3399,'L2982,'L3037,'L3279,'L315
 
5,'L2980,'L3156,'L3277,'L3398,'L2064,'L3032,'L3153,'L3274,'L3395,'L3033,'L3154,'L3275,'L3396,'L2989,'L0444,'L2988,'L0447,'L2985,'L2986,'L2073,'L3041,'L3162,'L3042,'L3163,'L3160,'L3281,'L3040,'L3161,'L3280,'L0694,'L0696,'L2992,'L3047,'L3048,'L3169,'L3166,'L3167,'L0692,'L3043,'L3164,'L3165,'L2999,'L2996,'L3130,'L3372,'L3137,'L3017,'L3014,'L3135,'L3015,'L3136,'L3378,'L3012,'L3134,'L3010,'L3131,'L3252,'L3132,'L3374,'L2969,'L2966,'L3018,'L3139,'L3019,'L3140,'L3382,'L3020,'L3141,'L3148,'L3269,'L0671,'L2610,'L2973,'L3149,'L3025,'L3388,'L2971,'L3147,'L3389,'L3024,'L3142,'L3384,'L3143,'L3385,'L2979,'L2735,'L2974,'L3029}
   
2-bookie-9c88965c-2022-07-20T15-00-53Z.log.gz:2022-07-20T14:56:19.932194546+00:00
 stdout F 14:56:19,932 DEBUG Reading reply session id: 0x1016bac9b630005, 
packet:: clientPath:/pravega/pravega/bookkeeper/ledgers/00/0000 
serverPath:/pravega/pravega/bookkeeper/ledgers/00/0000 finished:false header:: 
12,8  replyHeader:: 12,38654710371,0  request:: 
'/pravega/pravega/bookkeeper/ledgers/00/0000,F  response:: 
v{'L3195,'L3070,'L3071,'L3192,'L2784,'L3078,'L3199,'L3079,'L3076,'L3197,'L3077,'L3198,'L3081,'L3401,'L0377,'L3400,'L3089,'L3087,'L0372,'L3407,'L3408,'L3405,'L3406,'L3403,'L3404,'L3052,'L3053,'L3295,'L3050,'L3171,'L3172,'L3293,'L3170,'L3291,'L0463,'L3058,'L0464,'L3059,'L3056,'L3177,'L3057,'L3178,'L0461,'L3054,'L3175,'L3055,'L3176,'L3063,'L3184,'L3064,'L3061,'L3182,'L3062,'L3183,'L3181,'L3069,'L3067,'L3188,'L3068,'L3065,'L3066,'L1814,'L3317,'L3203,'L3324,'L3322,'L3202,'L3323,'L3200,'L0734,'L1705,'L0737,'L3209,'L0736,'L3207,'L3208,'L3205,'L0732,'L3327,'L0739,'L0738,'L3096,'L3094,'L3095,
 
'L3092,'L3093,'L3091,'L3412,'L3413,'L0388,'L3410,'L3411,'L0702,'L0704,'L3418,'L0703,'L3419,'L2327,'L3416,'L3417,'L3414,'L3415,'L0709,'L0706,'L0705,'L0708,'L0707,'L0390,'L3302,'L3423,'L3424,'L3421,'L3422,'L2452,'L3420,'L0713,'L0712,'L3308,'L0714,'L3306,'L3307,'L0711,'L3425,'L0710,'L3305,'L3426,'L3350,'L3115,'L3236,'L3116,'L3237,'L3358,'L0762,'L3113,'L3234,'L3114,'L3353,'L3233,'L3354,'L3230,'L3351,'L3110,'L3231,'L3352,'L0768,'L0404,'L0767,'L0769,'L0764,'L3119,'L0763,'L3238,'L0765,'L3118,'L3360,'L0771,'L3126,'L3368,'L1983,'L3248,'L3245,'L3366,'L3125,'L3246,'L3122,'L3243,'L3364,'L3123,'L3244,'L3365,'L3120,'L3000,'L3363,'L0418,'L2957,'L3009,'L2955,'L3007,'L3214,'L0740,'L2485,'L3211,'L3330,'L0745,'L3218,'L0741,'L0744,'L3338,'L3104,'L3346,'L3105,'L3226,'L3347,'L3102,'L3103,'L3224,'L3345,'L3101,'L2130,'L3341,'L0757,'L0756,'L2816,'L0759,'L0758,'L3108,'L3348,'L3107,'L3349,'L3030,'L3151,'L3393,'L3031,'L3152,'L3273,'L3391,'L3150,'L3392,'L3038,'L3159,'L2984,'L3278,'L3399,'L2982,'L3037,'L3279,'L3
 
155,'L2980,'L3156,'L3277,'L3398,'L2064,'L3032,'L3153,'L3274,'L3395,'L3033,'L3154,'L3275,'L3396,'L2989,'L0444,'L2988,'L0447,'L2985,'L2986,'L2073,'L3041,'L3162,'L3042,'L3163,'L3160,'L3281,'L3040,'L3161,'L3280,'L0694,'L0696,'L2992,'L3047,'L3048,'L3169,'L3166,'L3167,'L0692,'L3043,'L3164,'L3165,'L2999,'L2996,'L3130,'L3372,'L3137,'L3017,'L3014,'L3135,'L3015,'L3136,'L3378,'L3012,'L3134,'L3010,'L3131,'L3252,'L3132,'L3374,'L2969,'L2966,'L3018,'L3139,'L3019,'L3140,'L3382,'L3020,'L3141,'L3148,'L3269,'L0671,'L2610,'L2973,'L3149,'L3025,'L3388,'L2971,'L3147,'L3389,'L3024,'L3142,'L3384,'L3143,'L3385,'L2979,'L2735,'L2974,'L3029}
   
   
2-bookie-9c88965c-2022-07-20T15-00-53Z.log.gz:2022-07-20T14:56:32.051382550+00:00
 stdout F 14:56:32,051 DEBUG Reading reply session id: 0x3016baeab7f000f, 
packet:: clientPath:/pravega/pravega/bookkeeper/ledgers/00/0000/L3246 
serverPath:/pravega/pravega/bookkeeper/ledgers/00/0000/L3246 finished:false 
header:: 40548,4  replyHeader:: 40548,38654710417,0  request:: 
'/pravega/pravega/bookkeeper/ledgers/00/0000/L3246,F  response:: 
#426f6f6b69654d65746164617461466f726d617456657273696f6e933a7c8210218ffffff9f92002833238a19626f6f6b6b65657065722d626f6f6b69652d322d3131353830a19626f6f6b6b65657065722d626f6f6b69652d302d32383831361003834204825a16ab6170706c69636174696f6e127507261766567615a15af426f6f6b4b65657065724c6f6749641223133600,s{38654707364,38654708391,1658328325290,1658328491164,2,0,0,0,155,0,38654707364}
   ```
   
   7. But the ledger does not exist when we attempt to read from it:
   ```
   
0-bookie-adfe5832-2022-07-20T18-23-58Z.log.gz:2022-07-20T15:59:48.625362817+00:00
 stdout F 15:59:48,625 INFO  No ledger found while reading entry: 0 from 
ledger: 3246
   
2-bookie-9c88965c-2022-07-20T17-26-08Z.log.gz:2022-07-20T15:59:48.624934442+00:00
 stdout F 15:59:48,624 INFO  No ledger found while reading entry: 0 from 
ledger: 3246
   ...
   
2-bookie-9c88965c-2022-07-20T17-26-08Z.log.gz:2022-07-20T16:39:42.753841016+00:00
 stdout F 16:39:42,753 INFO  No ledger found while reading entry: 0 from 
ledger: 3246
   ```
   
   In my understanding, from the Pravega point of view ledger `3246` is created 
and not explicitly deleted by Pravega itself, which means that there has been a 
data loss, most likely induced by some corner case exposed via network packet 
loss. In normal conditions, where there are no induced packet loss, things work 
fine. But it does not mean that, in a real cluster, a similar network failure 
could happen and the same issue could be reproduced. Again, we have been doing 
fault injection tests with Bookkeeper 4.14 and probably previous versions as 
well (e.g., Bookkeeper 4.11), and this is the first time we encounter this 
behavior.
   
   ***To Reproduce***
   
   Steps to reproduce the behavior:
   1. Write to Bookkeeper using Pravega (https://github.com/pravega/pravega).
   2. Introduce 30% network packet loss with LitmusChaos (see 
https://www.youtube.com/watch?v=EU_g8cKa1G8 for a demo on how to reproduce this 
experiment)
   5. Consistently, Pravega gets a `No Such Ledger Exception` in some 
containers using Bookkeeper 4.15 (with Bookkeeper 4.14, the system is able to 
work under such conditions)
   
   ***Expected behavior***
   
   Bookkeeper 4.15 should offer a similar behavior in how ledgers are managed 
compared to Bookkeeper 4.14 (and previous versions).
   
   ***Screenshots***
   
   n/a
   
   ***Additional context***
   
   At the moment, there are 2 changes that could be related to this issue (but 
I cannot confirm that they actually cause the problem):
   1. Upgrade of Zookeeper library from 3.6.x to 3.8.x.
   2. This PR in 4.15 and not in 4.14 seems related to how ledgers are managed: 
https://github.com/apache/bookkeeper/pull/3361


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to