RaulGracia commented on issue #3466:
URL: https://github.com/apache/bookkeeper/issues/3466#issuecomment-1235540195

   Thanks @eolivelli, reposting my answer from Slack here for better visibility: 
https://pravega-io.slack.com/archives/C0151LSM46L/p1662115749173509 
   
   > is the ledger znode present on ZK ?
   
   The cluster is no longer in that state, but if the BookKeeper client gets a 
`BKException$BKNoSuchLedgerExistsException`, my understanding is that the 
znode should not be there. However, I see that ledger `3246` was created at 
`2022-07-20 14:45:25,706`, and the Bookie logs then show some queries on this 
ledger's metadata znodes:
   ```
   
2-bookie-9c88965c-2022-07-20T14-55-24Z.log.gz:2022-07-20T14:51:48.716729331+00:00
 stdout F 14:51:48,716 DEBUG Reading reply session id: 0x3016baeab7f000f, 
packet:: clientPath:/pravega/pravega/bookkeeper/ledgers/00/0000 
serverPath:/pravega/pravega/bookkeeper/ledgers/00/0000 finished:false header:: 
36287,8  replyHeader:: 36287,38654709412,0  request:: 
'/pravega/pravega/bookkeeper/ledgers/00/0000,F  response:: 
v{'L3195,'L3070,'L3071,'L3192,'L2784,'L3078,'L3199,'L3079,'L3076,'L3197,'L3077,'L3198,'L3081,'L0377,'L3089,'L3087,'L0372,'L3052,'L3173,'L3053,'L3295,'L3050,'L3171,'L3172,'L3293,'L3290,'L3170,'L3291,'L0463,'L3058,'L0464,'L3059,'L3056,'L3177,'L3057,'L3178,'L0461,'L3054,'L3175,'L3055,'L3176,'L3297,'L3063,'L3184,'L3064,'L3061,'L3182,'L3062,'L3183,'L3181,'L3069,'L3067,'L3188,'L3068,'L3065,'L3066,'L3310,'L1814,'L3317,'L3203,'L3324,'L3322,'L3202,'L3323,'L3200,'L0734,'L1705,'L0737,'L3209,'L0736,'L3207,'L3328,'L3208,'L3329,'L3205,'L0732,'L3327,'L0739,'L0738,'L3096,'L3094,'L3095,'L3092,'
 
L3093,'L3091,'L0388,'L0702,'L0704,'L0703,'L2327,'L0709,'L0706,'L0705,'L0708,'L0707,'L0390,'L3302,'L2452,'L0713,'L0712,'L3308,'L0714,'L3306,'L3307,'L0711,'L0710,'L3305,'L3115,'L3236,'L3116,'L3237,'L0762,'L3113,'L3234,'L3114,'L3233,'L3230,'L3110,'L3231,'L0768,'L0404,'L0767,'L0769,'L0764,'L3119,'L0763,'L3238,'L0765,'L3118,'L0771,'L3126,'L1983,'L3127,'L3248,'L3245,'L3125,'L3246,'L3122,'L3243,'L3123,'L3244,'L3120,'L3000,'L0418,'L2957,'L3009,'L2955,'L3007,'L3214,'L0740,'L2485,'L3211,'L3330,'L0745,'L3218,'L0741,'L0744,'L3104,'L3105,'L3226,'L3102,'L3103,'L3224,'L3101,'L2130,'L0757,'L0756,'L2816,'L0759,'L0758,'L3108,'L3107,'L3030,'L3151,'L3031,'L3152,'L3273,'L3150,'L3038,'L3159,'L2984,'L3278,'L2982,'L3037,'L3279,'L3155,'L2980,'L3156,'L3277,'L2064,'L3032,'L3153,'L3274,'L3033,'L3154,'L3275,'L2989,'L0444,'L2988,'L0447,'L2985,'L2986,'L2073,'L3041,'L3162,'L3283,'L3042,'L3163,'L3160,'L3281,'L3040,'L3161,'L3280,'L0694,'L0696,'L2992,'L3047,'L3048,'L3169,'L3166,'L3167,'L0692,'L3043,'L3164,'L3165,'L29
 
99,'L2996,'L3130,'L3137,'L3017,'L3014,'L3135,'L3015,'L3136,'L3012,'L3134,'L3010,'L3131,'L3252,'L3132,'L2969,'L2966,'L3018,'L3139,'L3019,'L3140,'L3020,'L3141,'L3148,'L3269,'L0671,'L2610,'L2973,'L3149,'L3025,'L2971,'L3147,'L3024,'L3145,'L3142,'L3143,'L2979,'L2735,'L2974,'L3029}
   
2-bookie-9c88965c-2022-07-20T14-55-24Z.log.gz:2022-07-20T14:51:48.736027964+00:00
 stdout F 14:51:48,735 DEBUG Reading reply session id: 0x3016baeab7f000f, 
packet:: clientPath:/pravega/pravega/bookkeeper/ledgers/00/0000/L3246 
serverPath:/pravega/pravega/bookkeeper/ledgers/00/0000/L3246 finished:false 
header:: 36521,4  replyHeader:: 36521,38654709412,0  request:: 
'/pravega/pravega/bookkeeper/ledgers/00/0000/L3246,F  response:: 
#426f6f6b69654d65746164617461466f726d617456657273696f6e933a7c8210218ffffff9f92002833238a19626f6f6b6b65657065722d626f6f6b69652d322d3131353830a19626f6f6b6b65657065722d626f6f6b69652d302d32383831361003834204825a16ab6170706c69636174696f6e127507261766567615a15af426f6f6b4b65657065724c6f6749641223133600,s{38654707364,38654708391,1658328325290,1658328491164,2,0,0,0,155,0,38654707364}
   ```
   I don't know whether the fact that this request queries 
`/pravega/pravega/bookkeeper/ledgers/00/0000/L3246` and gets an actual 
payload in response means that the znode actually exists in ZooKeeper.
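
   One way to check what that reply carries is to decode it. The sketch below 
is only an illustration: the helper names are hypothetical, the `Stat` field 
order is the one ZooKeeper's `Stat` record prints, and it assumes the DEBUG 
output does not zero-pad bytes below `0x10`, so it searches only for printable 
ASCII runs instead of decoding the whole payload.

```python
from datetime import datetime, timedelta, timezone

# Payload and Stat copied from the second log line above.
PAYLOAD_HEX = (
    "426f6f6b69654d65746164617461466f726d617456657273696f6e"
    "933a7c8210218ffffff9f92002833238a19"
    "626f6f6b6b65657065722d626f6f6b69652d322d3131353830"
    "a19"
    "626f6f6b6b65657065722d626f6f6b69652d302d3238383136"
    "1003834204825a16ab6170706c69636174696f6e127507261766567615a15af"
    "426f6f6b4b65657065724c6f674964"
    "1223133600"
)
STAT_TEXT = ("s{38654707364,38654708391,1658328325290,1658328491164,"
             "2,0,0,0,155,0,38654707364}")

# Field order of ZooKeeper's Stat record.
STAT_FIELDS = ("czxid", "mzxid", "ctime", "mtime", "version", "cversion",
               "aversion", "ephemeralOwner", "dataLength", "numChildren",
               "pzxid")


def contains_ascii(payload_hex, text):
    """Return True if the hex encoding of `text` occurs in the payload.

    Assumption: printable ASCII bytes are always logged as two hex digits,
    so a plain substring search is enough for long strings like these.
    """
    return text.encode("ascii").hex() in payload_hex


def parse_stat(stat_text):
    """Turn an 's{...}' block from the DEBUG reply into a dict of ints."""
    stat_text = stat_text.strip()
    if not (stat_text.startswith("s{") and stat_text.endswith("}")):
        raise ValueError("not a Stat block: %r" % stat_text)
    values = [int(v) for v in stat_text[2:-1].split(",")]
    if len(values) != len(STAT_FIELDS):
        raise ValueError("unexpected number of Stat fields")
    return dict(zip(STAT_FIELDS, values))


def millis_to_utc(millis):
    """Convert epoch milliseconds to a UTC datetime without float rounding."""
    seconds, rest = divmod(millis, 1000)
    return (datetime.fromtimestamp(seconds, tz=timezone.utc)
            + timedelta(milliseconds=rest))


if __name__ == "__main__":
    for name in ("BookieMetadataFormatVersion",
                 "bookkeeper-bookie-2-11580",
                 "bookkeeper-bookie-0-28816"):
        print(name, contains_ascii(PAYLOAD_HEX, name))
    stat = parse_stat(STAT_TEXT)
    print("ctime:", millis_to_utc(stat["ctime"]).isoformat())
    print("mtime:", millis_to_utc(stat["mtime"]).isoformat())
    print("version:", stat["version"], "dataLength:", stat["dataLength"])
```

   Under these assumptions, the payload contains the BookKeeper metadata 
header and both bookie names from the `LedgerCreateOp` ensemble, and the 
`ctime` in the `Stat` falls in the same second as the ledger creation time 
(14:45:25 UTC). That suggests the znode existed when this request was served 
at 14:51:48.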
   
   > which are your replication parameters: ES, WQ, AQ ?
   
   In this experiment, the configuration is `ensembleSize=2, writeQuorumSize=2, 
ackQuorumSize=2`.
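
   To make the implication of these values concrete, here is a minimal sketch 
that assumes round-robin striping of entries across the ensemble (which, as 
far as I know, is BookKeeper's default distribution schedule). The function 
names are hypothetical and this is not BookKeeper client code.

```python
def write_set(entry_id, ensemble_size, write_quorum_size):
    """Ensemble indexes that receive a given entry under round-robin striping."""
    return [(entry_id + i) % ensemble_size for i in range(write_quorum_size)]


def tolerated_missing_acks(write_quorum_size, ack_quorum_size):
    """How many bookies in the write set may fail to ack a write."""
    return write_quorum_size - ack_quorum_size


if __name__ == "__main__":
    es, wq, aq = 2, 2, 2  # values from this experiment
    for entry_id in range(3):
        print(entry_id, write_set(entry_id, es, wq))
    print("tolerated missing acks:", tolerated_missing_acks(wq, aq))
```

   With `ensembleSize=2, writeQuorumSize=2, ackQuorumSize=2`, every entry goes 
to both bookies of the ensemble and both must acknowledge it, so a single 
unreachable bookie is enough to stall a write on that ensemble.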
   
   > how many bookies do you have ?
   
   The Bookkeeper service is configured to keep 4 Bookies.
   
   > how many entries are supposed to be in the ledger ? only 1 ?
   
   Pravega rolls over ledgers when they reach their max size (1GB by default) 
or when there is a container recovery (due to fencing). This particular 
ledger had been created but contained 0 entries at the moment the issue 
happened. It would have contained many more if the issue had not 
occurred.
   
   > at a first glance it looks to me that N bookies are answering that they do 
not know the ledger
   
   Could be. According to this log, the ledger should have been created using 
2 Bookies:
   `2022-07-20T14:45:25.706687124+00:00 stdout F 2022-07-20 14:45:25,706 
7043990 [ZKC-connect-executor-0-EventThread] INFO  o.a.b.client.LedgerCreateOp 
- Ensemble: [bookkeeper-bookie-2-11580, bookkeeper-bookie-0-28816] for ledger: 
3246`
   But in the Bookie logs, it is mostly `2-bookie-9c88965c` that has log lines 
related to ledger `3246`. The test, which induces network drops, may be 
exposing a potential issue that is otherwise hard to discover.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

