[
https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452496#comment-13452496
]
stack commented on HBASE-6752:
------------------------------
Makes sense. Sounds great. How do we know which regionserver to give a log
split to when the log has edits for all regions that were on the crashed
regionserver? Are you thinking we could give all regions on the crashed
regionserver to one particular regionserver?
> On region server failure, serve writes and timeranged reads during the log
> split
> --------------------------------------------------------------------------------
>
> Key: HBASE-6752
> URL: https://issues.apache.org/jira/browse/HBASE-6752
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.96.0
> Reporter: nkeywal
> Priority: Minor
>
> Opening for write on failure would mean:
> - Assign the region to a new regionserver. It marks the region as recovering
> -- specific exception returned to the client when we cannot serve.
> -- allow them to know where they stand. The exception can include some time
> information (failure started on: ...)
> -- allow them to go immediately on the right regionserver, instead of
> retrying or calling the region holding meta to get the new address
> => save network calls, lower the load on meta.
> - Do the split as today. Priority is given to the region server holding the
> new regions
> -- help to share the load balancing code: the split is done by a region
> server considered as available for new regions
> -- help locality (the recovered edits are available on the region server)
> => lower the network usage
> - When the split is finished, we're done as of today
> - while the split is progressing, the region server can
> -- serve writes
> --- that's useful for all applications that need to write but not read
> immediately:
> --- whatever logs events to analyze them later
> --- opentsdb is a perfect example.
> -- serve reads if they have a compatible time range. For heavily used
> tables, it could be a help, because:
> --- we can expect to have a few minutes of data only (as it's loaded)
> --- the heaviest queries often accept a delay of a few minutes or more.
> Some "What if":
> 1) the split fails
> => Retry until it works. As today. Just that we serve writes. We need to
> know (as today) that the region has not recovered if we fail again.
> 2) the regionserver fails during the split
> => As 1 and as of today.
> 3) the regionserver fails after the split but before the state change to
> fully available.
> => New assign. More logs to split (the ones already done and the new ones).
> 4) the assignment fails
> => Retry until it works. As today.
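
To make the "recovering" state in the description above concrete, here is a minimal Java sketch of how a region server might gate requests while the split is in progress. It is only an illustration under stated assumptions: RecoveringRegionGate, RegionRecoveringException and oldestUnrecoveredTsMs are hypothetical names, not HBase APIs. It also assumes that edit timestamps are comparable across servers and that a read over [minTsMs, maxTsMs) is "compatible" when it ends at or before the oldest edit that may still be in the logs being split.

```java
/** Hypothetical sketch of a "recovering" region gate; none of these names are HBase APIs. */
final class RecoveringRegionGate {

    enum State { RECOVERING, AVAILABLE }

    /** Hypothetical exception carrying the time information mentioned in the description. */
    static final class RegionRecoveringException extends RuntimeException {
        final long failureDetectedAtMs;

        RegionRecoveringException(long failureDetectedAtMs) {
            super("Region is recovering; failure detected at " + failureDetectedAtMs);
            this.failureDetectedAtMs = failureDetectedAtMs;
        }
    }

    private final long failureDetectedAtMs;
    // Assumption: no edit older than this timestamp is still waiting in the logs being split.
    private final long oldestUnrecoveredTsMs;
    private volatile State state = State.RECOVERING;

    RecoveringRegionGate(long failureDetectedAtMs, long oldestUnrecoveredTsMs) {
        this.failureDetectedAtMs = failureDetectedAtMs;
        this.oldestUnrecoveredTsMs = oldestUnrecoveredTsMs;
    }

    /** Writes are accepted in both states. */
    boolean canServeWrite() {
        return true;
    }

    /**
     * Reads over [minTsMs, maxTsMs) are served while recovering only if the
     * range ends at or before the oldest possibly unrecovered edit.
     */
    boolean canServeRead(long minTsMs, long maxTsMs) {
        return state == State.AVAILABLE || maxTsMs <= oldestUnrecoveredTsMs;
    }

    /** Throws the specific exception instead of returning false. */
    void checkRead(long minTsMs, long maxTsMs) {
        if (!canServeRead(minTsMs, maxTsMs)) {
            throw new RegionRecoveringException(failureDetectedAtMs);
        }
    }

    /** Called once the split is finished and the recovered edits are replayed. */
    void markSplitFinished() {
        state = State.AVAILABLE;
    }
}
```

With a gate built as new RecoveringRegionGate(1000L, 900L), a read over [0, 900) would be served during the split, a read over [850, 950) would be rejected with the exception, and every read would be served after markSplitFinished(). How to compute the oldest unrecovered timestamp, and what to do about client-supplied timestamps, is left open here.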