[ https://issues.apache.org/jira/browse/HBASE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833198#action_12833198 ]
Andrew Purtell commented on HBASE-2223:
---------------------------------------

{quote}
bq. Or just restart replication after the slave is back online and interleave edits from the queue with new ones as necessary?

[...] this could cause application problems if they need to assume a single arrow of time of edits, and not wanting to see a partial world view
{quote}

Ok, makes sense for the first cut. Especially if replication logic is pluggable and subclassable. Applications can plug in their own policies to do what makes the most sense for them.

> Handle 10min+ network partitions between clusters
> -------------------------------------------------
>
>                 Key: HBASE-2223
>                 URL: https://issues.apache.org/jira/browse/HBASE-2223
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>
> We need a nice way of handling long network partitions without impacting a master cluster (which pushes the data). Currently it will just retry over and over again.
> I think we could:
>  - Stop replication to a slave cluster if it didn't respond for more than 10 minutes
>  - Keep track of the duration of the partition
>  - When the slave cluster comes back, initiate a MR job like HBASE-2221
> Maybe we want less than 10 minutes, maybe we want this to be all automatic or just the first 2 parts. Discuss.
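
For discussion, here is a rough sketch of how the three steps in the description and the "pluggable and subclassable" idea from the comment might fit together. It is written in Python only for brevity. Every name in it (`PartitionPolicy`, `Action`, `on_recovered`, `InterleavePolicy`) is hypothetical and none of it is existing HBase API. The 600-second default is the 10 minutes proposed in the issue, and `CATCH_UP` stands in for whatever bulk catch-up is chosen, for example a MR job like HBASE-2221.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    SHIP = "ship"          # keep pushing edits to the slave
    RETRY = "retry"        # slave unresponsive, still inside the timeout
    SUSPEND = "suspend"    # replication to this slave is stopped
    CATCH_UP = "catch_up"  # slave is back; run a bulk catch-up first


@dataclass
class PartitionPolicy:
    """Hypothetical policy object, not an HBase class."""
    timeout_s: float = 600.0                   # assumed default: 10 minutes
    last_ok: float = 0.0                       # time of the last good response
    suspended_at: Optional[float] = None       # set while replication is stopped
    last_partition_s: Optional[float] = None   # duration of the last partition

    def on_success(self, now: float) -> Action:
        """Call when the slave cluster answered a replication request."""
        if self.suspended_at is not None:
            # Slave came back after a suspension: record how long it was gone.
            self.last_partition_s = now - self.last_ok
            self.suspended_at = None
            self.last_ok = now
            return self.on_recovered(self.last_partition_s)
        self.last_ok = now
        return Action.SHIP

    def on_failure(self, now: float) -> Action:
        """Call when the slave cluster did not respond."""
        if self.suspended_at is not None:
            return Action.SUSPEND
        if now - self.last_ok > self.timeout_s:
            self.suspended_at = now
            return Action.SUSPEND
        return Action.RETRY

    def on_recovered(self, partition_s: float) -> Action:
        """Override point for application-specific recovery."""
        return Action.CATCH_UP


class InterleavePolicy(PartitionPolicy):
    """Alternative from the quoted comment: resume and interleave queued edits."""

    def on_recovered(self, partition_s: float) -> Action:
        return Action.SHIP
```

The default policy keeps a single arrow of time for applications that need it, since nothing is shipped between suspension and catch-up. An application that can tolerate a partial view would plug in something like `InterleavePolicy` instead.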