[ https://issues.apache.org/jira/browse/BOOKKEEPER-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232011#comment-13232011 ]

Sijie Guo commented on BOOKKEEPER-126:
--------------------------------------

> Can we narrow down to cases where IOException occurs on flushing ledger 
> entries and bookie is still running. Only those entries would be selected as 
> under-replicated, 

I think a flush failure will not cause any entry to become under-replicated, 
because journal replay will recover it. The case we need to consider is entries 
before lastLogMark: if corruption happens in these entries, they are 
under-replicated. Your proposal-2 and proposal-3 could be used for 
detecting and re-replicating these entries.
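A minimal sketch of the distinction above, assuming a journal position is modeled as (journal file id, byte offset). JournalPos and RecoveryCheck are hypothetical names for illustration only, not BookKeeper classes; the idea is that replay starts from lastLogMark, so only entries before the mark depend solely on the entry log.

```java
// Hypothetical model of a journal position: (journal file id, byte offset).
// Illustrative only; not BookKeeper's actual log mark class.
final class JournalPos implements Comparable<JournalPos> {
    final long logId;
    final long offset;

    JournalPos(long logId, long offset) {
        this.logId = logId;
        this.offset = offset;
    }

    @Override
    public int compareTo(JournalPos o) {
        int c = Long.compare(logId, o.logId);
        return c != 0 ? c : Long.compare(offset, o.offset);
    }
}

final class RecoveryCheck {
    /**
     * Entries journaled at or after lastLogMark are re-applied by journal
     * replay, so a failed flush of them is recoverable locally. Entries
     * before the mark rely only on the entry log (their journal may already
     * be garbage-collected), so corruption there means they must be
     * re-replicated from other bookies.
     */
    static boolean recoverableByReplay(JournalPos entryPos, JournalPos lastLogMark) {
        return entryPos.compareTo(lastLogMark) >= 0;
    }
}
```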

The only side effect of a flush failure is that all subsequent writes may fail. 
Reads can still succeed: the data that failed to flush is still buffered in 
EntryLogger, so it can be read.
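To illustrate why reads survive a failed flush, here is a simplified sketch assuming a log that keeps unflushed data in memory until the disk write succeeds. BufferedLogSketch and its Sink interface are hypothetical stand-ins, not the real EntryLogger API.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a buffered log; not the real EntryLogger API.
final class BufferedLogSketch {
    /** Stand-in for the on-disk file; may throw to simulate a failed flush. */
    interface Sink {
        void write(long key, byte[] data) throws IOException;
    }

    private final Map<Long, byte[]> buffer = new HashMap<>();  // not yet on disk
    private final Map<Long, byte[]> flushed = new HashMap<>(); // written to the sink
    private final Sink sink;

    BufferedLogSketch(Sink sink) {
        this.sink = sink;
    }

    void add(long key, byte[] data) {
        buffer.put(key, data);
    }

    /** If the sink throws, the buffer is left intact, so the data stays readable. */
    void flush() throws IOException {
        for (Map.Entry<Long, byte[]> e : buffer.entrySet()) {
            sink.write(e.getKey(), e.getValue());
            flushed.put(e.getKey(), e.getValue());
        }
        buffer.clear();
    }

    /** Reads check the unflushed buffer first, then flushed data. */
    byte[] read(long key) {
        byte[] b = buffer.get(key);
        return b != null ? b : flushed.get(key);
    }
}
```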

If we don't shut down the bookie server, it will remain in the available 
list. Write requests can still be sent to this bookie, but they will fail, and 
the client will have to choose a new ensemble to write to, which increases write 
latency (as we found in BOOKKEEPER-180).
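A rough sketch of the client-side cost, assuming bookies are identified by plain strings and the client swaps failed bookies for unused candidates from the available list. EnsembleSketch.replaceFailed is a hypothetical helper, not the BookKeeper client API; the point is that a bookie which keeps failing writes but stays in the available list keeps forcing this extra replacement step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not the BookKeeper client's ensemble placement API.
final class EnsembleSketch {
    /**
     * Returns a copy of the ensemble where each failed bookie is replaced by
     * the first available bookie that is neither failed nor already in the
     * ensemble. A failed bookie is left in place if no candidate exists.
     */
    static List<String> replaceFailed(List<String> ensemble, Set<String> failed,
                                      List<String> available) {
        List<String> result = new ArrayList<>(ensemble);
        for (int i = 0; i < result.size(); i++) {
            if (!failed.contains(result.get(i))) {
                continue;
            }
            for (String candidate : available) {
                if (!failed.contains(candidate) && !result.contains(candidate)) {
                    result.set(i, candidate);
                    break;
                }
            }
        }
        return result;
    }
}
```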

For some IOExceptions, such as running out of disk space, we should shut down the 
bookie server immediately to remove it from the available list. I am not sure 
whether there are any recoverable IOExceptions (i.e. the first flush fails with 
an IOException, but a second attempt succeeds). If not, I think we could shut 
down the bookie server whenever an IOException occurs while flushing data.
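A minimal sketch of the proposed policy, assuming every IOException during a flush is treated as non-recoverable. FlushGuard and the shutdown hook are hypothetical names; only java.io.Flushable is a real JDK interface.

```java
import java.io.Flushable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: FlushGuard is an illustrative name, not a BookKeeper class.
final class FlushGuard {
    private final Flushable target;   // whatever writes entry data to disk
    private final Runnable shutdown;  // hypothetical hook that stops the bookie server
    private final AtomicBoolean shutDown = new AtomicBoolean(false);

    FlushGuard(Flushable target, Runnable shutdown) {
        this.target = target;
        this.shutdown = shutdown;
    }

    /**
     * Flushes the target and returns true on success. Any IOException is
     * treated as fatal: the shutdown hook runs exactly once, so the bookie
     * leaves the available list instead of accepting writes it cannot persist.
     */
    boolean flushOrShutdown() {
        if (shutDown.get()) {
            return false; // already shutting down; refuse further flushes
        }
        try {
            target.flush();
            return true;
        } catch (IOException e) {
            if (shutDown.compareAndSet(false, true)) {
                shutdown.run();
            }
            return false;
        }
    }

    boolean isShutDown() {
        return shutDown.get();
    }
}
```

If some IOExceptions do turn out to be recoverable, the catch block is the natural place to retry before giving up and shutting down.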
                
> EntryLogger doesn't detect when one of it's logfiles is corrupt
> ---------------------------------------------------------------
>
>                 Key: BOOKKEEPER-126
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-126
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Ivan Kelly
>            Priority: Blocker
>             Fix For: 4.1.0
>
>
> If an entry log is corrupt, the bookie will ignore any entries past the 
> corruption. Quorum writes stop this from being a problem at the moment, but we 
> should detect corruptions like this and re-replicate if necessary.
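One way to detect such corruption instead of silently stopping is to scan the log and report the offset of the first bad record. This sketch assumes a hypothetical record layout of [int length][long CRC32 of payload][payload]; that layout is an assumption for illustration, not the actual BookKeeper entry log format. java.nio.ByteBuffer and java.util.zip.CRC32 are standard JDK classes.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical record layout (NOT the real BookKeeper entry log format):
//   [int payloadLength][long crc32OfPayload][payload bytes]
final class EntryLogScanner {
    /** Returns the offset of the first corrupt or truncated record, or -1 if every record is intact. */
    static long firstCorruptOffset(ByteBuffer log) {
        ByteBuffer buf = log.duplicate(); // leave the caller's position untouched
        while (buf.hasRemaining()) {
            int start = buf.position();
            if (buf.remaining() < Integer.BYTES + Long.BYTES) {
                return start; // truncated header
            }
            int len = buf.getInt();
            long expected = buf.getLong();
            if (len < 0 || len > buf.remaining()) {
                return start; // impossible or truncated length
            }
            byte[] payload = new byte[len];
            buf.get(payload);
            CRC32 crc = new CRC32();
            crc.update(payload, 0, len);
            if (crc.getValue() != expected) {
                return start; // checksum mismatch
            }
        }
        return -1;
    }

    /** Appends one well-formed record in the layout above. */
    static void append(ByteBuffer out, byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        out.putInt(payload.length).putLong(crc.getValue()).put(payload);
    }
}
```

A bookie could run such a scan and, when it finds a bad offset, flag the entries from that point on as candidates for re-replication rather than dropping them silently.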

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
