[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated BOOKKEEPER-212:
----------------------------------------

    Attachment: bookkeeper-server.log

I'm attaching the log file of the bookie that became unresponsive. In this run, 
the app stopped with these error messages:

{noformat}
2012-04-06 12:22:07,565 - ERROR - [pool-3-thread-1:PerChannelBookieClient@550] 
- Add for ledger: 2382, entry: 0 failed on bookie: /IP2:3181 with code: 101
2012-04-06 12:22:07,566 - WARN  - [pool-3-thread-1:PendingAddOp@146] - Write 
did not succeed: 2382, 0
2012-04-06 12:22:07,572 - ERROR - [pool-3-thread-1:LedgerHandle@659] - Could 
not get additional bookie to remake ensemble, closing ledger: 2382
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException
        at org.apache.bookkeeper.client.BKException.create(BKException.java:58)
        at 
org.apache.bookkeeper.client.LedgerHandle.addEntry(LedgerHandle.java:410)
        at 
org.apache.bookkeeper.client.LedgerHandle.addEntry(LedgerHandle.java:387)
        at org.apache.bookkeeper.loader.LedgerLoader.main(LedgerLoader.java:38)
2012-04-06 12:22:12,588 - INFO  - [Hashed wheel timer 
#1:PerChannelBookieClient@447] - Disconnected from bookie: /IP2:3181
2012-04-06 12:22:12,600 - INFO  - [Hashed wheel timer 
#2:PerChannelBookieClient@447] - Disconnected from bookie: /IP1:3181
2012-04-06 12:22:12,627 - INFO  - [Hashed wheel timer 
#3:PerChannelBookieClient@447] - Disconnected from bookie: /IP3:3181
{noformat}
                
> Bookie stops responding when creating and deleting many ledgers
> ---------------------------------------------------------------
>
>                 Key: BOOKKEEPER-212
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-212
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Flavio Junqueira
>            Priority: Blocker
>             Fix For: 4.1.0
>
>         Attachments: bookkeeper-server.log
>
>
> I have written down a short app to try to reproduce one problematic case 
> reported on the user list. The app does the following:
> # It creates initially a number of ledgers, say 2000;
> # Once it reaches 2000, for each new ledger it creates, it deletes the one at 
> the head of the list;
> # Before closing the ledger, it adds 5 entries of 1k, just to generate some 
> traffic for any given ledger.
> What I tried to achieve is to have thousands of active ledgers and delete new 
> ledgers as I create new ones. I'll post a link to my test code later. 
> At some point, one bookie stops responding. The bookie seems to be up, but it 
> is not responsive. Looking at the logs, this is what I see:
> {noformat}
> 2012-04-06 12:22:05,765 - INFO  [SyncThread:LedgerCacheImpl@682] - Ledger 
> 1726 is evicted from file info cache.
> 2012-04-06 12:22:05,769 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1727 is evicted from file info cache.
> 2012-04-06 12:22:05,772 - INFO  [SyncThread:LedgerCacheImpl@682] - Ledger 
> 1728 is evicted from file info cache.
> 2012-04-06 12:22:05,780 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1729 is evicted from file info cache.
> 2012-04-06 12:22:05,787 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1730 is evicted from file info cache.
> 2012-04-06 12:22:05,794 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1731 is evicted from file info cache.
> 2012-04-06 12:22:05,801 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1732 is evicted from file info cache.
> 2012-04-06 12:22:05,807 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1733 is evicted from file info cache.
> 2012-04-06 12:22:05,822 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1734 is evicted from file info cache.
> 2012-04-06 12:22:05,828 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1735 is evicted from file info cache.
> 2012-04-06 12:22:05,842 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1736 is evicted from file info cache.
> 2012-04-06 12:22:05,851 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1737 is evicted from file info cache.
> 2012-04-06 12:22:05,856 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1738 is evicted from file info cache.
> 2012-04-06 12:22:05,864 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1739 is evicted from file info cache.
> 2012-04-06 12:22:05,874 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1740 is evicted from file info cache.
> 2012-04-06 12:22:05,885 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1741 is evicted from file info cache.
> 2012-04-06 12:22:05,894 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1742 is evicted from file info cache.
> 2012-04-06 12:22:05,902 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - 
> Ledger 1743 is evicted from file info cache.
> 2012-04-06 12:22:05,987 - INFO  [GarbageCollectorThread:LedgerCacheImpl@682] 
> - Ledger 1744 is evicted from file info cache.
> 2012-04-06 12:22:05,987 - ERROR 
> [GarbageCollectorThread:GarbageCollectorThread$1@244] - Exception when 
> deleting the ledger index file on the Bookie: 
> java.io.IOException: /home/fpj/bk/current/1/b/10b.idx not found
>         at org.apache.bookkeeper.bookie.FileInfo.checkOpen(FileInfo.java:118)
>         at org.apache.bookkeeper.bookie.FileInfo.close(FileInfo.java:194) 
>         at 
> org.apache.bookkeeper.bookie.LedgerCacheImpl.deleteLedger(LedgerCacheImpl.java:619)
>  
>         at 
> org.apache.bookkeeper.bookie.GarbageCollectorThread$1.gc(GarbageCollectorThread.java:242)
>         at 
> org.apache.bookkeeper.meta.AbstractZkLedgerManager.doGc(AbstractZkLedgerManager.java:274)
>         at 
> org.apache.bookkeeper.meta.FlatLedgerManager.garbageCollectLedgers(FlatLedgerManager.java:168)
>         at 
> org.apache.bookkeeper.bookie.GarbageCollectorThread.doGcLedgers(GarbageCollectorThread.java:237)
>         at 
> org.apache.bookkeeper.bookie.GarbageCollectorThread.run(GarbageCollectorThread.java:206)
>  
> 2012-04-06 12:22:05,987 - INFO  [GarbageCollectorThread:LedgerCacheImpl@682] 
> - Ledger 1745 is evicted from file info cache.
> {noformat}
> There are lots of exceptions like that, but otherwise I don't see anything 
> that could make the bookie unresponsive. I'll upload the logs as well.   

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to