[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973395#action_12973395
 ] 

Flavio Junqueira commented on ZOOKEEPER-465:
--------------------------------------------

Hi Dhruba, when I wrote the description, I think I was referring to writing to 
ZooKeeper once we close a ledger, so that we wouldn't have to pay the price of 
a ZooKeeper update upon each addEntry. However, thinking again about the 
problem, this approach is not fault tolerant. If the client writer crashes 
before closing the ledger and the byte count is kept only in volatile memory, 
then we will lose it. 

One way I see to overcome this problem is to have each bookie keep the byte 
count for its ledger fragment. Given the byte count B for a ledger fragment, we 
can obtain an estimate of the total number of bytes in the ledger by computing 
(B * n/r), where n is the number of bookies storing the ledger and r is the 
replication factor of each entry. This formula comes from the observation that 
each bookie stores a fraction r/n of the entries of a ledger. 
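
As a rough sketch of the (B * n/r) estimate, assuming entries of equal length 
spread evenly over the bookies (the function name and the numbers are made up 
for illustration and are not part of the BookKeeper API):

```python
def estimate_ledger_bytes(fragment_bytes: int, n: int, r: int) -> float:
    """Estimate the ledger size from one bookie's fragment byte count B.

    n is the number of bookies storing the ledger and r is the replication
    factor of each entry, so one bookie holds about r/n of the ledger.
    """
    if not 0 < r <= n:
        raise ValueError("need 0 < r <= n")
    return fragment_bytes * n / r


# Hypothetical ledger: 300 entries of 100 bytes, n = 3, r = 2.
# Each bookie stores 2/3 of the entries, i.e. 200 entries = 20000 bytes.
print(estimate_ledger_bytes(20000, n=3, r=2))  # 30000.0
```

With equal-length entries the estimate matches the true size of 30000 bytes. 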

This approach, however, does not provide a good estimate if the length of 
entries varies significantly. A less efficient approach that doesn't have the 
imbalance problem is reading the byte counts from all bookies, adding them up, 
and dividing by the replication factor. This operation only completes if no 
bookie is faulty. If a bookie is faulty, we have a procedure to recover its 
ledger fragments.
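
A minimal sketch of the sum-and-divide approach, with a small made-up example 
of uneven entry lengths. The bookie names, the entry placement, and the use of 
None for an unreachable bookie are assumptions for illustration only:

```python
from typing import Dict, Optional


def ledger_bytes_from_all(counts: Dict[str, Optional[int]], r: int) -> int:
    """Sum the byte counts of all bookies and divide by the replication factor.

    A count of None stands for a bookie that could not be read; in that case
    the operation cannot complete.
    """
    missing = [bookie for bookie, count in counts.items() if count is None]
    if missing:
        raise RuntimeError(f"cannot read byte count from: {missing}")
    # Every entry is stored r times, so the sum is r times the ledger size.
    return sum(counts.values()) // r


# Hypothetical ledger with entries of 10, 1000 and 20 bytes, n = 3, r = 2:
# entry 0 on b0,b1; entry 1 on b1,b2; entry 2 on b2,b0.
counts = {"b0": 10 + 20, "b1": 10 + 1000, "b2": 1000 + 20}
print(ledger_bytes_from_all(counts, r=2))  # 1030
```

In this example the true size is 10 + 1000 + 20 = 1030 bytes, while the 
single-fragment estimate from b0 alone would be 30 * 3/2 = 45 bytes.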

Assuming that there are bookies that have crashed and their fragments haven't 
been replicated to new bookies, the best I can think of at this point is taking 
the average of the byte counts over the bookies that are up and performing the 
same computation as above with that average.
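
One possible reading of this fallback, sketched below, is to use the average 
byte count of the live bookies as B in (B * n/r). That reading, the function 
name, and the numbers are assumptions for illustration:

```python
from typing import Sequence


def estimate_from_live_bookies(live_counts: Sequence[int], n: int, r: int) -> float:
    """Average the byte counts of the bookies that are up, then scale by n/r.

    live_counts holds one byte count per reachable bookie; n is the number of
    bookies storing the ledger and r is the replication factor.
    """
    if not live_counts:
        raise ValueError("no live bookie to read from")
    if not 0 < r <= n:
        raise ValueError("need 0 < r <= n")
    # (sum / len) * n / r, written as a single division.
    return sum(live_counts) * n / (len(live_counts) * r)


# Hypothetical fragments of 30, 1010 and 1020 bytes with n = 3, r = 2,
# where the bookie holding 1010 bytes is down.
print(estimate_from_live_bookies([30, 1020], n=3, r=2))  # 787.5
```

When all bookies are up, this reduces to adding up all byte counts and 
dividing by the replication factor.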

Any other option I'm missing? 

> Ledger size in bytes
> --------------------
>
>                 Key: ZOOKEEPER-465
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-465
>             Project: ZooKeeper
>          Issue Type: New Feature
>          Components: contrib-bookkeeper
>            Reporter: Flavio Junqueira
>
> It is currently easy to know how many entries a ledger has, but there is no 
> easy way to know the total number of bytes in a ledger. The idea of this jira 
> is to add a method that gives the number of bytes in a closed ledger. My 
> current idea is to simply have the writer count the number of bytes written 
> and store it in ZooKeeper.  

