[
https://issues.apache.org/jira/browse/BOOKKEEPER-39?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112466#comment-13112466
]
Ivan Kelly commented on BOOKKEEPER-39:
--------------------------------------
Good code, I like how you've cleaned up GC, its much more readable now.
General comments.
Instead of system properties, we should use a proper configuration object.
(commons configuration).
For example, you have Constants.hashLedgerId in places, and
StringUtils.createHashNodeIfMissing. The usage of these would be much easier to
read if you had, conf.getBoolean("hash_ledger_ids"); and
conf.getBoolean("create_missing_hashnodes"), etc. I have thought of adding
proper configuration support to BK in the past. I think now is the time.
Why is createHashNodeIfMissing even needed? Shouldn't it always be the case.
For that matter, im wondering if we even need to preserve the old way of naming
ledgers, I don't think there's any overhead from using the hashing, and it will
scale better. It will also keep the code cleaner if we only use one. The only
problem really is that we would need an upgrade path, so that if people update
bookkeeper, the ledger format will be updated also.
This means we will also need to add a metadata version. I've started this in
some other work. I'll try to break out a patch soon.
Do we need to use hashing? The problem here is that ZK can't return more than a
certain number of ledgers. Can't we make it so that we split the ledger id into
smaller parts. So for ledger 0x1234567891234567 We could have a path
/ledgers/D12345/D67891/D23456. We could make the split smaller depending on
what ZooKeeper's max package size is.
Code comments.
L1192 is very verbose, you should use,
node2 = (node2+1) % NUM_HASH_DIRS;
L271
I think this method would fit better into one of the other classes, such as
LedgerUtils.
L450
The initial check here is redundant.
L490
Should just be Set, not compareAndSet.
Also, its best to keep lines under 120, the anything over that tends to wrap on
split screen.
L500
Why not make both get active ledgers methods use the same callback?
> Bookie server failed to restart because of too many ledgers (more than
> ~50,000 ledgers)
> ---------------------------------------------------------------------------------------
>
> Key: BOOKKEEPER-39
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-39
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-server
> Affects Versions: 3.4.0
> Reporter: Sijie Guo
> Assignee: Sijie Guo
> Attachments: bookkeeper-39.patch
>
>
> If we have ~500,000 topics in hedwig, we might have more than ~500,000
> ledgers in bookkeeper (a topic has more than 1 ledger). So when the bookie
> server restarted, a logfile GC thread is started, which will call
> zk.getChildren to fetch all ledgers, and it failed because of package length
> limitation.
> 2011-08-01 01:18:46,373 - ERROR
> [main-EventThread:EntryLogger$GarbageCollectorThread$1@164] - Error polling
> ZK for the available ledger nodes:
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode
> = ConnectionLoss for /ledgers
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1519)
> at
> org.apache.bookkeeper.bookie.EntryLogger$GarbageCollectorThread$1.processResult(EntryLogger.java:162)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:592)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:481)
> 2011-08-01 01:18:46,373 - WARN [main-EventThread:Bookie$1@242] - ZK client
> has been disconnected to the ZK server!
> 2011-08-01 01:18:47,278 - WARN
> [main-SendThread(perf13.platform.mobile.sp2.yahoo.com:2181):ClientCnxn$SendThread@980]
> - Session 0x131833dec850034 for server
> perf13.platform.mobile.sp2.yahoo.com/98.139.43.86:2181, unexpected error,
> closing socket connection and attempting reconnect
> java.io.IOException: Packet len9976413 is out of range!
> at
> org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:78)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:264)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:958)
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira