[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529398#comment-13529398
 ] 

Sijie Guo commented on BOOKKEEPER-365:
--------------------------------------

Rethinking BOOKKEEPER-365 and BOOKKEEPER-355 a bit: they are largely 
unrelated.

NoSuchEntry and NoSuchLedger are treated as the termination condition for 
LedgerRecovery, so it is important to decide correctly whether a response 
really means NoSuchEntry/NoSuchLedger and not be misled.

Currently we use a kind of read-all-write-quorum approach to make sure it 
really is NoSuchLedger/NoSuchEntry. That isn't necessary: for the missing-entry 
case we only need at least (write_quorum_size - ack_quorum_size + 1) bookies 
to report it. So when write_quorum_size and ack_quorum_size are the same, a 
single bookie is enough to confirm NoSuchLedger/NoSuchEntry (which is what we 
do currently). That needs to be fixed now that ack_quorum_size is separated 
from write_quorum_size, and this jira is meant to fix it.
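
The bound above can be worked through as a small sketch. The class and method 
names (MissingEntryQuorum, minNegativeResponses, isConfirmedMissing) are 
hypothetical and not part of the BookKeeper code base. The reasoning assumed 
here is that an acknowledged entry is stored on at least ack_quorum_size of 
its write_quorum_size bookies, so at most (write_quorum_size - ack_quorum_size) 
of them can legitimately lack it.

```java
// Hypothetical helper, not BookKeeper API: illustrates the
// (write_quorum_size - ack_quorum_size + 1) bound discussed above.
final class MissingEntryQuorum {
    private MissingEntryQuorum() {}

    // Minimum number of NoSuchEntry/NoSuchLedger responses needed before
    // recovery may conclude that the entry was never acknowledged.
    static int minNegativeResponses(int writeQuorumSize, int ackQuorumSize) {
        if (ackQuorumSize < 1 || ackQuorumSize > writeQuorumSize) {
            throw new IllegalArgumentException(
                "need 1 <= ackQuorumSize <= writeQuorumSize");
        }
        return writeQuorumSize - ackQuorumSize + 1;
    }

    // True once enough bookies of the write quorum have reported the entry missing.
    static boolean isConfirmedMissing(int negativeResponses,
                                      int writeQuorumSize, int ackQuorumSize) {
        return negativeResponses >= minNegativeResponses(writeQuorumSize, ackQuorumSize);
    }
}
```

With write quorum 2 and ack quorum 2, one negative response is enough; with 
write quorum 2 and ack quorum 1, both bookies must report the entry missing.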

Back to the missing entries/ledgers case. How can entries or ledgers go missing?

1) a ledger disk (ledger directory) isn't mounted correctly. All the ledgers on 
that disk are lost and the bookie responds NoSuchLedger.
2) an entry is corrupted.
3) a brand new bookie is swapped in when the ensemble changes.
4) a ledger index file is removed or truncated by mistake.
5) some other bug causes entries to go missing.

1) is already addressed by Cookie. 

for 2), the bookie should respond ReadException rather than NoSuchLedger or 
NoSuchEntry, so the response is treated as a valid read and does not affect 
how we decide whether it was NoSuchEntry.
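
As a rough illustration of that rule, the sketch below separates responses 
that count as evidence of a missing entry from those that do not. The enum and 
class names are hypothetical stand-ins, not the actual BookKeeper response 
codes.

```java
// Hypothetical classification for recovery reads; not BookKeeper's own types.
final class RecoveryReadClassifier {
    private RecoveryReadClassifier() {}

    enum Response { OK, READ_EXCEPTION, NO_SUCH_ENTRY, NO_SUCH_LEDGER }

    // Only explicit "not found" answers count towards the missing-entry bound.
    // A READ_EXCEPTION (for example a corrupted entry) is not counted, because
    // the bookie did know about the entry.
    static boolean countsAsMissing(Response response) {
        return response == Response.NO_SUCH_ENTRY
            || response == Response.NO_SUCH_LEDGER;
    }
}
```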

for 3), this is actually the root cause of BOOKKEEPER-355. A brand new bookie 
is introduced into the last ensemble, which affects how we decide whether it 
was NoSuchEntry/NoSuchLedger. Let's take an example from BOOKKEEPER-355.

Suppose there are 3 bookies A, B, C, with ensemble size = 2, write quorum 
size = 2, ack quorum size = 1.

1) Ledger L is created with A, B.
2) Add entry 0. Entry 0 is written to A but fails to be added to B.
3) A client comes in and fences ledger L.
4) The client reads entry 0 and tries to add it back to A, B.
5) A network partition happens and the client cannot connect to A.
6) C is picked to replace A.
7) B, C are not connected, so the client fails the recovery.
8) Another client tries to recover the ledger.
9) It connects to B, C. Both respond NoSuchLedger, so the ledger is closed 
with zero entries and entry 0 is lost.

The problem is that we replaced A with C before writing the recovered entries, 
so the information held by A was lost, which leads to a wrong decision during 
ledger recovery. Even worse, if the whole ensemble is replaced, we lose all 
the information needed to recover the ledger, which would be pretty bad.

So fixing BOOKKEEPER-355 doesn't depend on how we read from the quorum, but on 
how we avoid such a mistaken replacement.
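
The scenario can be replayed with a toy model to see why the read quorum is 
not the issue. This is only a sketch: the class, its methods, and the map-based 
storage are invented for illustration and are not BookKeeper classes. A bookie 
with no entry in the map stands for one that answers NoSuchLedger.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the scenario above; these are not BookKeeper classes.
final class EnsembleReplacementDemo {
    private EnsembleReplacementDemo() {}

    // Entries of ledger L held by each bookie after step 2. A bookie with no
    // key in the map answers NoSuchLedger.
    static Map<String, Set<Long>> storageAfterStep2() {
        Map<String, Set<Long>> storage = new HashMap<>();
        storage.put("A", Collections.singleton(0L)); // entry 0 reached A only
        return storage;
    }

    // Recovery read: true if any bookie in the ensemble still holds the entry.
    static boolean recoveryFindsEntry(Map<String, Set<Long>> storage,
                                      List<String> ensemble, long entryId) {
        for (String bookie : ensemble) {
            Set<Long> entries = storage.get(bookie);
            if (entries != null && entries.contains(entryId)) {
                return true;
            }
        }
        return false; // every bookie answered NoSuchLedger/NoSuchEntry
    }
}
```

Reading from the original ensemble (A, B) finds entry 0. Reading from the 
replaced ensemble (B, C) finds nothing, even though both bookies are asked, 
which satisfies the (write_quorum_size - ack_quorum_size + 1) = 2 bound and 
still loses the entry.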

for 4), I think it should not be addressed by the project itself, should it?

for 5), I have to say we need to avoid such bugs as best we can.

I think we have to fix BOOKKEEPER-355 in 4.2.0; otherwise it looks bad. For 
BOOKKEEPER-365, we need to add some code to cover the case where 
write_quorum_size and ack_quorum_size differ, so it should go into 4.2.0 
too.
                
> Ledger will never recover if one of the quorum bookie is down forever and 
> others dont have entry
> ------------------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-365
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-365
>             Project: Bookkeeper
>          Issue Type: Bug
>    Affects Versions: 4.0.0, 4.1.0
>            Reporter: Sijie Guo
>            Assignee: Yixue (Andrew) Zhu
>             Fix For: 4.2.0
>
>
> As discussed in BOOKKEEPER-355, the current fix for the issue below is not 
> correct. We need to find a new solution.
> If some bookies of a quorum are gone forever and the remaining bookies of 
> that quorum are still alive but don't have the entry (NoSuchEntry or 
> NoSuchLedger), then there is no evidence with which to recover/close the ledger.

