[
https://issues.apache.org/jira/browse/RATIS-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914658#comment-16914658
]
Ankit Singhal commented on RATIS-556:
-------------------------------------
Thanks [~rajeshbabu] for working on it.
bq. If we just create an inverted index then a log will be served by 3 peers
and when any of the peers goes down we need to close the log. But if we take
HBase use case a log will be created by a server and from the logname, we can
detect the server but such functionality cannot be done in this generic log
service. When the log replicated until unless the main server created the log
wont be recovered.
bq. We can handle such use case at least by passing the peer and we can close
the log only when the peer goes down.
Oh, I see where the confusion is. I think we agreed that if one or more peers
of the quorum go down and the quorum size of a log is reduced to 2 (or some
other unsafe number, which could be configurable), we should automatically
close those logs and start the archival process, so that we don't keep unsafe
quorums/logs lingering for long.
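To make the rule concrete, here is a rough sketch of the check I have in mind. It is only an illustration: the class, method names and the threshold parameter are hypothetical and are not taken from Ratis or from the attached patches. It keeps an inverted index from peer to logs and, when a peer is reported down, returns the logs whose live quorum has dropped to the configured unsafe size so they can be closed and archived.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names exist in Ratis.
public class UnsafeQuorumDetector {
    private final Map<String, Set<String>> peersByLog = new HashMap<>();
    // Inverted index: peer -> logs served by that peer.
    private final Map<String, Set<String>> logsByPeer = new HashMap<>();
    private final Set<String> deadPeers = new HashSet<>();
    private final Set<String> closedLogs = new HashSet<>();
    // Assumed configurable: a log is unsafe once its live peers are <= this value.
    private final int unsafeQuorumSize;

    public UnsafeQuorumDetector(int unsafeQuorumSize) {
        this.unsafeQuorumSize = unsafeQuorumSize;
    }

    public void registerLog(String logName, Collection<String> peers) {
        peersByLog.put(logName, new HashSet<>(peers));
        for (String peer : peers) {
            logsByPeer.computeIfAbsent(peer, p -> new TreeSet<>()).add(logName);
        }
    }

    // Marks the peer as dead and returns the logs that should now be closed
    // and handed to archival. Each log is returned at most once.
    public List<String> onPeerDown(String peer) {
        deadPeers.add(peer);
        List<String> toClose = new ArrayList<>();
        for (String log : logsByPeer.getOrDefault(peer, Collections.emptySet())) {
            if (closedLogs.contains(log)) {
                continue;
            }
            long live = peersByLog.get(log).stream()
                    .filter(p -> !deadPeers.contains(p))
                    .count();
            if (live <= unsafeQuorumSize) {
                closedLogs.add(log);
                toClose.add(log);
            }
        }
        return toClose;
    }
}
```

With a threshold of 2 and 3 peers per log, losing any single peer closes every log that peer was serving, which matches the behaviour described above.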
In the case of HBase, if the regionserver goes down, fencing will be taken care
of by the master: it will be responsible for getting and closing the WAL
currently being written by that regionserver (in case Ratis has not closed it
because the peers of the WAL are on other servers which are alive), though the
regionserver still needs to ensure that it does not create a new log once it is
deemed dead.
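The division of responsibility on the HBase side could look roughly like the sketch below. Again, this is only an assumption-laden illustration, not HBase or Ratis API: the class and method names are made up, and real fencing in HBase involves more than a set of server names. The point is just the ordering: the master fences the server first, then collects the WALs it still has to close, and any later attempt by the fenced server to create a log is refused.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not HBase or Ratis API.
public class WalFencing {
    private final Set<String> fencedServers = new HashSet<>();
    private final Map<String, Set<String>> openLogsByServer = new HashMap<>();

    // Called on behalf of a regionserver; refused once the server is fenced.
    public synchronized String createLog(String server, String logName) {
        if (fencedServers.contains(server)) {
            throw new IllegalStateException(
                    server + " is fenced; refusing to create " + logName);
        }
        openLogsByServer.computeIfAbsent(server, s -> new TreeSet<>()).add(logName);
        return logName;
    }

    // Called by the master: fence first, then return the logs the master
    // still needs to close for that server.
    public synchronized List<String> fenceAndCollectOpenLogs(String server) {
        fencedServers.add(server);
        Set<String> open = openLogsByServer.remove(server);
        return open == null ? new ArrayList<>() : new ArrayList<>(open);
    }
}
```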
> Detect node failures and close the log to prevent additional writes
> -------------------------------------------------------------------
>
> Key: RATIS-556
> URL: https://issues.apache.org/jira/browse/RATIS-556
> Project: Ratis
> Issue Type: Improvement
> Reporter: Rajeshbabu Chintaguntla
> Assignee: Rajeshbabu Chintaguntla
> Priority: Major
> Attachments: RATIS-556-wip.patch, RATIS-556_v1.patch
>
>
> Currently there is no way to detect the node failures at master log servers
> and add new nodes to the group serving the log. We need to analyze how Ozone
> is working in this case.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)