Bookie stops responding when creating and deleting many ledgers
---------------------------------------------------------------

                 Key: BOOKKEEPER-212
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-212
             Project: Bookkeeper
          Issue Type: Bug
            Reporter: Flavio Junqueira
            Priority: Blocker


I have written down a short app to try to reproduce one problematic case 
reported on the user list. The app does the following:

# It creates initially a number of ledgers, say 2000;
# Once it reaches 2000, for each new ledger it creates, it deletes the one at 
the head of the list;
# Before closing the ledger, it adds 5 entries of 1k, just to generate some 
traffic for any given ledger.

What I tried to achieve is to have thousands of active ledgers and delete new 
ledgers as I create new ones. I'll post a link to my test code later. 

At some point, one bookie stops responding. The bookie seems to be up, but it 
is not responsive. Looking at the logs, this is what I see:

{noformat}
2012-04-06 12:22:05,765 - INFO  [SyncThread:LedgerCacheImpl@682] - Ledger 1726 
is evicted from file info cache.
2012-04-06 12:22:05,769 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1727 is evicted from file info cache.
2012-04-06 12:22:05,772 - INFO  [SyncThread:LedgerCacheImpl@682] - Ledger 1728 
is evicted from file info cache.
2012-04-06 12:22:05,780 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1729 is evicted from file info cache.
2012-04-06 12:22:05,787 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1730 is evicted from file info cache.
2012-04-06 12:22:05,794 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1731 is evicted from file info cache.
2012-04-06 12:22:05,801 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1732 is evicted from file info cache.
2012-04-06 12:22:05,807 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1733 is evicted from file info cache.
2012-04-06 12:22:05,822 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1734 is evicted from file info cache.
2012-04-06 12:22:05,828 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1735 is evicted from file info cache.
2012-04-06 12:22:05,842 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1736 is evicted from file info cache.
2012-04-06 12:22:05,851 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1737 is evicted from file info cache.
2012-04-06 12:22:05,856 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1738 is evicted from file info cache.
2012-04-06 12:22:05,864 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1739 is evicted from file info cache.
2012-04-06 12:22:05,874 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1740 is evicted from file info cache.
2012-04-06 12:22:05,885 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1741 is evicted from file info cache.
2012-04-06 12:22:05,894 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1742 is evicted from file info cache.
2012-04-06 12:22:05,902 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
Ledger 1743 is evicted from file info cache.
2012-04-06 12:22:05,987 - INFO  [GarbageCollectorThread:LedgerCacheImpl@682] - 
Ledger 1744 is evicted from file info cache.
2012-04-06 12:22:05,987 - ERROR 
[GarbageCollectorThread:GarbageCollectorThread$1@244] - Exception when deleting 
the ledger index file on the Bookie: 
java.io.IOException: /home/fpj/bk/current/1/b/10b.idx not found
        at org.apache.bookkeeper.bookie.FileInfo.checkOpen(FileInfo.java:118)
        at org.apache.bookkeeper.bookie.FileInfo.close(FileInfo.java:194) 
        at 
org.apache.bookkeeper.bookie.LedgerCacheImpl.deleteLedger(LedgerCacheImpl.java:619)
 
        at 
org.apache.bookkeeper.bookie.GarbageCollectorThread$1.gc(GarbageCollectorThread.java:242)
        at 
org.apache.bookkeeper.meta.AbstractZkLedgerManager.doGc(AbstractZkLedgerManager.java:274)
        at 
org.apache.bookkeeper.meta.FlatLedgerManager.garbageCollectLedgers(FlatLedgerManager.java:168)
        at 
org.apache.bookkeeper.bookie.GarbageCollectorThread.doGcLedgers(GarbageCollectorThread.java:237)
        at 
org.apache.bookkeeper.bookie.GarbageCollectorThread.run(GarbageCollectorThread.java:206)
 
2012-04-06 12:22:05,987 - INFO  [GarbageCollectorThread:LedgerCacheImpl@682] - 
Ledger 1745 is evicted from file info cache.
{noformat}

There are lots of exceptions like that, but otherwise I don't see anything that 
could make the bookie unresponsive. I'll upload the logs as well.   

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to