RaulGracia opened a new issue #2485:
URL: https://github.com/apache/bookkeeper/issues/2485


   **QUESTION**
   
   We ran a set of experiments in which we restarted a Kubernetes node with 
Bookies running on it. In one of the experiments, one Bookie was not able to 
restart and failed with the following error:
   ```
   2020-11-11 07:43:42,429 - INFO - [main:JournalChannel@154] - Opening journal /bk/journal/j1/current/175b5dae741.txn
   2020-11-11 07:43:42,475 - ERROR - [main:Bookie@924] - Exception while replaying journals, shutting down
   java.io.IOException: Missing ledger signature while reading header for /bk/index/current/1/9/109.idx
    at org.apache.bookkeeper.bookie.FileInfo.readHeader(FileInfo.java:224)
    at org.apache.bookkeeper.bookie.FileInfo.checkOpen(FileInfo.java:310)
    at org.apache.bookkeeper.bookie.FileInfo.checkOpen(FileInfo.java:278)
    at org.apache.bookkeeper.bookie.FileInfo.size(FileInfo.java:388)
    at org.apache.bookkeeper.bookie.IndexPersistenceMgr.updatePage(IndexPersistenceMgr.java:643)
    at org.apache.bookkeeper.bookie.IndexInMemPageMgr.grabLedgerEntryPage(IndexInMemPageMgr.java:447)
    at org.apache.bookkeeper.bookie.IndexInMemPageMgr.getLedgerEntryPage(IndexInMemPageMgr.java:412)
    at org.apache.bookkeeper.bookie.IndexInMemPageMgr.putEntryOffset(IndexInMemPageMgr.java:571)
    at org.apache.bookkeeper.bookie.LedgerCacheImpl.putEntryOffset(LedgerCacheImpl.java:103)
    at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.processEntry(InterleavedLedgerStorage.java:530)
    at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.processEntry(InterleavedLedgerStorage.java:512)
    at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.addEntry(InterleavedLedgerStorage.java:366)
    at org.apache.bookkeeper.bookie.LedgerDescriptorImpl.addEntry(LedgerDescriptorImpl.java:153)
    at org.apache.bookkeeper.bookie.Bookie$4.process(Bookie.java:888)
    at org.apache.bookkeeper.bookie.Journal.scanJournal(Journal.java:821)
    at org.apache.bookkeeper.bookie.Journal.replay(Journal.java:866)
    at org.apache.bookkeeper.bookie.Bookie.readJournal(Bookie.java:901)
    at org.apache.bookkeeper.bookie.Bookie.start(Bookie.java:922)
    at org.apache.bookkeeper.proto.BookieServer.start(BookieServer.java:141)
    at org.apache.bookkeeper.server.service.BookieService.doStart(BookieService.java:58)
    at org.apache.bookkeeper.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:78)
    at org.apache.bookkeeper.common.component.LifecycleComponentStack.lambda$start$2(LifecycleComponentStack.java:113)
    at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:408)
    at org.apache.bookkeeper.common.component.LifecycleComponentStack.start(LifecycleComponentStack.java:113)
    at org.apache.bookkeeper.common.component.ComponentStarter.startComponent(ComponentStarter.java:80)
    at org.apache.bookkeeper.server.Main.doMain(Main.java:229)
    at org.apache.bookkeeper.server.Main.main(Main.java:203)
   2020-11-11 07:43:42,489 - INFO - [main:ZooKeeper@693] - Session: 0x10007c8a7a803d8 closed
   2020-11-11 07:43:42,490 - INFO - [main-EventThread:ClientCnxn$EventThread@522] - EventThread shut down for session: 0x10007c8a7a803d8
   2020-11-11 07:43:42,546 - INFO - [vert.x-eventloop-thread-0:VertxHttpServer$2@79] - Starting Vertx HTTP server on port 8080
   ```
   After this, the Bookie process stays up but cannot perform any IO, 
including answering the readiness probes of the [Bookkeeper 
Operator](https://github.com/pravega/bookkeeper-operator). My questions 
about this problem are the following:
   - Is this `Missing ledger signature` error expected when rebooting the node 
on which a Bookie runs? It looks like a form of data corruption/loss, but I 
would like confirmation from the Bookkeeper community.
   - Once this error (or a similar one) occurs, what is the best course of 
action? Should we decommission the Bookie and create a new one?
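
To narrow down which index files are affected, something like the following could be run against the Bookie's index directory. It is only a rough diagnostic sketch: it assumes that index files begin with the 4-byte ASCII signature `BKLE` (my reading of `FileInfo`, which should be checked against the source of the BookKeeper version in use), and the function names and the default path are just illustrative.

```python
import sys
from pathlib import Path

# Assumption: index files written by FileInfo start with this 4-byte magic.
# Verify against FileInfo in the BookKeeper version you are running.
EXPECTED_SIGNATURE = b"BKLE"


def has_ledger_signature(index_file):
    """Return True if the file starts with the expected 4-byte signature."""
    with open(index_file, "rb") as f:
        return f.read(len(EXPECTED_SIGNATURE)) == EXPECTED_SIGNATURE


def find_suspect_index_files(index_dir):
    """Return the sorted .idx paths under index_dir whose header lacks the signature."""
    return sorted(
        p for p in Path(index_dir).rglob("*.idx")
        if not has_ledger_signature(p)
    )


if __name__ == "__main__":
    # Hypothetical default, taken from the path that appears in the log above.
    root = sys.argv[1] if len(sys.argv) > 1 else "/bk/index/current"
    for path in find_suspect_index_files(root):
        print(f"{path}: size={path.stat().st_size} bytes, signature missing")
```

The script only reads the files, so it should be safe to run while the Bookie is stopped. Printing the size alongside each path helps tell an empty or truncated file apart from one with a full-length but unexpected header.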
   
   Thanks in advance.



