[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249680#comment-13249680
 ] 

Sijie Guo commented on BOOKKEEPER-212:
--------------------------------------

@Flavio

checkOpen is introduced in BOOKKEEPER-137. The case that masterKey is null and 
the index file is not existed, which means we don't have the master key to 
create index file, so we have to throw an exception. 

the exception you found is a weird case, may be introduced by BOOKKEEPER-190, 
where we forced the index creation when a ledger is evicted from ledger cache.

from the exception stack, it seems that the exception is thrown when deleting 
ledger 3. but ledger 3 was evicted from ledger cache before, and its ledger 
index file should be forced to create when the eviction happened. it is so 
strange why the index file didn't exist. 
                
> Bookie stops responding when creating and deleting many ledgers
> ---------------------------------------------------------------
>
>                 Key: BOOKKEEPER-212
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-212
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Flavio Junqueira
>            Priority: Blocker
>             Fix For: 4.1.0
>
>         Attachments: bookkeeper-server.log
>
>
> I have written down a short app to try to reproduce one problematic case 
> reported on the user list. The app does the following:
> # It creates initially a number of ledgers, say 2000;
> # Once it reaches 2000, for each new ledger it creates, it deletes the one at 
> the head of the list;
> # Before closing the ledger, it adds 5 entries of 1k, just to generate some 
> traffic for any given ledger.
> What I tried to achieve is to have thousands of active ledgers and delete new 
> ledgers as I create new ones. I'll post a link to my test code later. 
> At some point, one bookie stops responding. The bookie seems to be up, but it 
> is not responsive. Looking at the logs, this is what I see:
> {noformat}
> 2012-04-06 12:22:05,765 - INFO  [SyncThread:LedgerCacheImpl@682] - Ledger 
> 1726 is evicted from file info cache.
> 2012-04-06 12:22:05,769 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1727 is evicted from file info cache.
> 2012-04-06 12:22:05,772 - INFO  [SyncThread:LedgerCacheImpl@682] - Ledger 
> 1728 is evicted from file info cache.
> 2012-04-06 12:22:05,780 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1729 is evicted from file info cache.
> 2012-04-06 12:22:05,787 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1730 is evicted from file info cache.
> 2012-04-06 12:22:05,794 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1731 is evicted from file info cache.
> 2012-04-06 12:22:05,801 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1732 is evicted from file info cache.
> 2012-04-06 12:22:05,807 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1733 is evicted from file info cache.
> 2012-04-06 12:22:05,822 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1734 is evicted from file info cache.
> 2012-04-06 12:22:05,828 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1735 is evicted from file info cache.
> 2012-04-06 12:22:05,842 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1736 is evicted from file info cache.
> 2012-04-06 12:22:05,851 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1737 is evicted from file info cache.
> 2012-04-06 12:22:05,856 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1738 is evicted from file info cache.
> 2012-04-06 12:22:05,864 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1739 is evicted from file info cache.
> 2012-04-06 12:22:05,874 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1740 is evicted from file info cache.
> 2012-04-06 12:22:05,885 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1741 is evicted from file info cache.
> 2012-04-06 12:22:05,894 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1742 is evicted from file info cache.
> 2012-04-06 12:22:05,902 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1743 is evicted from file info cache.
> 2012-04-06 12:22:05,987 - INFO  [GarbageCollectorThread:LedgerCacheImpl@682] 
> - Ledger 1744 is evicted from file info cache.
> 2012-04-06 12:22:05,987 - ERROR 
> [GarbageCollectorThread:GarbageCollectorThread$1@244] - Exception when 
> deleting the ledger index file on the Bookie: 
> java.io.IOException: /home/fpj/bk/current/1/b/10b.idx not found
>         at org.apache.bookkeeper.bookie.FileInfo.checkOpen(FileInfo.java:118)
>         at org.apache.bookkeeper.bookie.FileInfo.close(FileInfo.java:194) 
>         at 
> org.apache.bookkeeper.bookie.LedgerCacheImpl.deleteLedger(LedgerCacheImpl.java:619)
>  
>         at 
> org.apache.bookkeeper.bookie.GarbageCollectorThread$1.gc(GarbageCollectorThread.java:242)
>         at 
> org.apache.bookkeeper.meta.AbstractZkLedgerManager.doGc(AbstractZkLedgerManager.java:274)
>         at 
> org.apache.bookkeeper.meta.FlatLedgerManager.garbageCollectLedgers(FlatLedgerManager.java:168)
>         at 
> org.apache.bookkeeper.bookie.GarbageCollectorThread.doGcLedgers(GarbageCollectorThread.java:237)
>         at 
> org.apache.bookkeeper.bookie.GarbageCollectorThread.run(GarbageCollectorThread.java:206)
>  
> 2012-04-06 12:22:05,987 - INFO  [GarbageCollectorThread:LedgerCacheImpl@682] 
> - Ledger 1745 is evicted from file info cache.
> {noformat}
> There are lots of exceptions like that, but otherwise I don't see anything 
> that could make the bookie unresponsive. I'll upload the logs as well.   

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to