[ 
https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914658#comment-16914658
 ] 

Ankit Singhal commented on RATIS-556:
-------------------------------------

Thanks [~rajeshbabu] for working on it.

bq. If we just create an inverted index then a log will be served by 3 peers 
and when any of the peers goes down we need to close the log. But if we take 
HBase use case a log will be created by a server and from the logname, we can 
detect the server but such functionality cannot be done in this generic log 
service. When the log replicated until unless the main server created the log 
won't be recovered. 
bq. We can handle such use case at least by passing the peer and we can close 
the log only when the peer goes down.

Oh, I see where the confusion is. I think we agreed that if one or more peers 
of the quorum go down and the quorum size of a log is reduced to 2 (or some 
other unsafe number, which could be configurable), we should automatically 
close those logs and start the archival process, so that we don't keep unsafe 
quorums/logs lingering for long.
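
To make the rule above concrete, here is a rough sketch of the check I have in mind. It is only an illustration: the class UnsafeQuorumDetector, its methods, and the threshold parameter are hypothetical names and are not part of Ratis or the attached patches. It assumes we keep a map from each log to its live peers and that "unsafe" means the live peer count has dropped to the configured number or below.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch only; none of these names exist in Ratis. */
class UnsafeQuorumDetector {
    // Live peer count at or below which a log is considered unsafe (configurable).
    private final int unsafeQuorumSize;
    // Log name -> peers currently believed to be alive for that log.
    private final Map<String, Set<String>> livePeersByLog = new TreeMap<>();

    UnsafeQuorumDetector(int unsafeQuorumSize) {
        this.unsafeQuorumSize = unsafeQuorumSize;
    }

    void registerLog(String logName, Set<String> peers) {
        livePeersByLog.put(logName, new HashSet<>(peers));
    }

    /**
     * Records that a peer went down and returns the logs that should now be
     * closed and archived, i.e. logs that were served by this peer and whose
     * live quorum has shrunk to the unsafe size or below.
     */
    List<String> onPeerDown(String peerId) {
        List<String> logsToClose = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : livePeersByLog.entrySet()) {
            Set<String> livePeers = entry.getValue();
            // remove() returns true only if this peer was serving the log.
            if (livePeers.remove(peerId) && livePeers.size() <= unsafeQuorumSize) {
                logsToClose.add(entry.getKey());
            }
        }
        // Closed logs no longer need to be tracked.
        livePeersByLog.keySet().removeAll(logsToClose);
        return logsToClose;
    }
}
```

With a threshold of 2 and a log served by three peers, the first peer failure already returns that log for closing; logs that did not include the failed peer are left alone. The caller would then close each returned log and trigger archival.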

In the case of HBase, if the regionserver goes down, fencing will be taken 
care of by the master: it will be responsible for getting and closing the WAL 
currently being written by that regionserver (in case Ratis has not closed it 
because the peers of the WAL are on other servers which are alive). The 
regionserver still needs to ensure that it will not create a new log if it is 
deemed dead.



> Detect node failures and close the log to prevent additional writes
> -------------------------------------------------------------------
>
>                 Key: RATIS-556
>                 URL: https://issues.apache.org/jira/browse/RATIS-556
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Major
>         Attachments: RATIS-556-wip.patch, RATIS-556_v1.patch
>
>
> Currently there is no way to detect the node failures at master log servers 
> and add new nodes to the group serving the log. We need to analyze how Ozone 
> is working in this case.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
