[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-39?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104157#comment-13104157
 ] 

Sijie Guo commented on BOOKKEEPER-39:
-------------------------------------

2-level hash mechanism:

All ledgers are organized under 2-level hash nodes, whose ZooKeeper node path 
looks like /hashed_ledgers/{hash1}/{hash2}/L0000000000.
The 1st level contains 256 nodes, from 0 to 0xff. Each 1st-level node contains 
256 sub-nodes, also from 0 to 0xff, so we have 256 * 256 = 65,536 hashed 
prefix nodes.
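
To make the layout concrete, here is a minimal sketch of how such paths could 
be built. The class and method names are hypothetical, and naming each hash 
node as two lowercase hex digits is an assumption; the comment only says the 
nodes run from 0 to 0xff. The 10-digit zero-padded suffix follows the way 
ZooKeeper names sequential nodes.

```java
// Hypothetical helper, not part of the patch: builds paths for the
// 2-level hashed layout described above.
final class HashedLedgerPaths {
    static final String ROOT = "/hashed_ledgers";
    static final int FANOUT = 256; // nodes per level, 0x00..0xff

    // Path of one hashed prefix node: /hashed_ledgers/{hash1}/{hash2}.
    // Assumption: each level is rendered as two lowercase hex digits.
    static String prefixPath(int hash1, int hash2) {
        if (hash1 < 0 || hash1 >= FANOUT || hash2 < 0 || hash2 >= FANOUT) {
            throw new IllegalArgumentException("hash values must be in 0..0xff");
        }
        return String.format("%s/%02x/%02x", ROOT, hash1, hash2);
    }

    // Path of a ledger node under a prefix. ZooKeeper appends a 10-digit,
    // zero-padded counter to the name of a sequential node.
    static String ledgerNodePath(int hash1, int hash2, long nodeId) {
        return String.format("%s/L%010d", prefixPath(hash1, hash2), nodeId);
    }
}
```

With these assumptions, ledgerNodePath(0x1a, 0x2b, 0) yields 
/hashed_ledgers/1a/2b/L0000000000, matching the example path above.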

Create Ledger:
(1) Get the hashed node prefix.
    i) The first time, select hash1 and hash2 randomly.
    ii) For subsequent creations, select the next hashed prefix in 
round-robin order.
(2) Create a sequential node under the selected hashed prefix: 
/hashed_ledgers/{hash1}/{hash2}/L
    i) The sequential node id is available once the node has been created.
(3) The ledger id is formed as 
((long)((hash1 & 0xff) << 8) | (hash2 & 0xff)) | 
((nodeid & 0x000000ffffffffffL) << 16)
    i) Putting the hash part in the lower bits helps avoid hash collisions, 
because:
        a) we use the ledger id as the map key when storing data in the 
bookie, and as the submit key in OrderedExecutor.
        b) Java's HashMap uses the lower bits of the key's hash code to pick 
a bucket.
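
The id layout in step (3) can be sketched as follows. This is only an 
illustration: the class and method names are hypothetical, and the 
round-robin order shown (treating {hash1}{hash2} as one 16-bit counter that 
wraps around) is an assumption, since the order is not specified above.

```java
// Hypothetical sketch of the ledger id layout from step (3).
final class LedgerIdSketch {
    // Mask applied to the sequential node id, as given in step (3).
    static final long NODE_ID_MASK = 0x000000ffffffffffL;

    // Low 16 bits: hash part (hash1 in bits 8..15, hash2 in bits 0..7).
    // Bits 16 and up: the sequential node id returned by ZooKeeper.
    static long toLedgerId(int hash1, int hash2, long nodeId) {
        long hashPart = ((long) (hash1 & 0xff) << 8) | (hash2 & 0xff);
        return hashPart | ((nodeId & NODE_ID_MASK) << 16);
    }

    // Inverse operations: recover each component from a ledger id.
    static int hash1Of(long ledgerId) { return (int) ((ledgerId >>> 8) & 0xff); }
    static int hash2Of(long ledgerId) { return (int) (ledgerId & 0xff); }
    static long nodeIdOf(long ledgerId) { return (ledgerId >>> 16) & NODE_ID_MASK; }

    // Parses the counter out of a sequential node name such as "L0000000005".
    static long nodeIdFromName(String nodeName) {
        return Long.parseLong(nodeName.substring(1));
    }

    // Assumed round-robin order: advance the combined 16-bit prefix
    // (hash1 << 8 | hash2) by one and wrap after 0xffff.
    static int nextPrefix(int prefix) { return (prefix + 1) & 0xffff; }
}
```

Because consecutive creations move to the next prefix, the low 16 bits of 
consecutive ledger ids differ, which is what spreads them across HashMap 
buckets.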

Garbage Collection:

(1) Garbage collect one hash node at a time.
    i) get all the children of the given hash node.
    ii) get all ledgers hosted on the bookie server that belong to the given 
hash node.
        a) for convenience, we store the reversed ledger id 
({hash_part}{node_id}) in a sorted map: LedgerCache#activeLedgers
        b) call LedgerCache#activeLedgers.subMap("{hash_part}{00...0}", 
"{hash_part}{ff...f}") to retrieve all ledger ids that belong to the given 
hash node.
    iii) compare these two ledger id sets to find the non-active ledgers, 
i.e. those hosted on the bookie that no longer exist under the hash node.
    iv) delete those non-active ledgers.
(2) After all the hash nodes have been garbage collected, garbage collect the 
entry logs that don't contain any active ledger data.
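
Step (1) for a single hash node might look like the sketch below. It is an 
illustration under assumptions, not the patch itself: the class and method 
names are hypothetical, activeLedgers is modelled as a NavigableMap keyed by 
the reversed id, and the reversed id is assumed to be the 16-bit hash part 
shifted above a 40-bit node id. The inclusive NavigableMap.subMap overload is 
used because the two-argument SortedMap.subMap excludes its upper bound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;

// Hypothetical sketch of garbage collecting one hash node.
final class HashNodeGcSketch {
    static final int NODE_ID_BITS = 40; // assumption: matches the 40-bit node id mask
    static final long NODE_ID_MASK = 0x000000ffffffffffL;

    // Reversed ledger id: {hash_part}{node_id}, hash part in the high bits,
    // so that all ledgers of one hash node are contiguous in a sorted map.
    static long reversedKey(int hashPart, long nodeId) {
        return ((long) (hashPart & 0xffff) << NODE_ID_BITS) | (nodeId & NODE_ID_MASK);
    }

    // Returns the reversed ids hosted on this bookie for the given hash node
    // whose node ids are no longer among the children listed in ZooKeeper.
    static List<Long> findInactive(NavigableMap<Long, Boolean> activeLedgers,
                                   int hashPart, Set<Long> nodeIdsInZk) {
        long lo = reversedKey(hashPart, 0L);           // {hash_part}{00...0}
        long hi = reversedKey(hashPart, NODE_ID_MASK); // {hash_part}{ff...f}
        List<Long> inactive = new ArrayList<>();
        for (Long key : activeLedgers.subMap(lo, true, hi, true).keySet()) {
            if (!nodeIdsInZk.contains(key & NODE_ID_MASK)) {
                inactive.add(key);
            }
        }
        return inactive;
    }
}
```

Each call only needs the children of one hash node, so no single getChildren 
response has to carry the full ledger list.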

> Bookie server failed to restart because of too many ledgers (more than 
> ~50,000 ledgers)
> ---------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-39
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-39
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 3.4.0
>            Reporter: Sijie Guo
>         Attachments: bookkeeper-39.patch
>
>
> If we have ~500,000 topics in Hedwig, we might have more than ~500,000 
> ledgers in BookKeeper (a topic has more than 1 ledger). So when the bookie 
> server restarts, a logfile GC thread is started, which calls 
> zk.getChildren to fetch all ledgers, and the call fails because of the 
> packet length limitation.
> 2011-08-01 01:18:46,373 - ERROR [main-EventThread:EntryLogger$GarbageCollectorThread$1@164] - Error polling ZK for the available ledger nodes:
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ledgers
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1519)
>         at org.apache.bookkeeper.bookie.EntryLogger$GarbageCollectorThread$1.processResult(EntryLogger.java:162)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:592)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:481)
> 2011-08-01 01:18:46,373 - WARN  [main-EventThread:Bookie$1@242] - ZK client has been disconnected to the ZK server!
> 2011-08-01 01:18:47,278 - WARN  [main-SendThread(perf13.platform.mobile.sp2.yahoo.com:2181):ClientCnxn$SendThread@980] - Session 0x131833dec850034 for server perf13.platform.mobile.sp2.yahoo.com/98.139.43.86:2181, unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Packet len9976413 is out of range!
>         at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:78)
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:264)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:958)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
