[ https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gregory Chanan reassigned HBASE-6752:
-------------------------------------

    Assignee:     (was: Gregory Chanan)
    
> On region server failure, serve writes and timeranged reads during the log 
> split
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-6752
>                 URL: https://issues.apache.org/jira/browse/HBASE-6752
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Priority: Minor
>
> Opening for write on failure would mean:
> - Assign the region to a new regionserver. It marks the region as recovering
>   -- specific exception returned to the client when we cannot serve.
>   -- allow them to know where they stand. The exception can include some time 
> information (failure started on: ...)
>   -- allow them to go immediately to the right regionserver, instead of 
> retrying or calling the region holding meta to get the new address
>      => save network calls, lower the load on meta.
> - Do the split as today. Priority is given to the region servers holding 
> the new regions
>   -- helps to share the load balancing code: the split is done by a region 
> server considered available for new regions
>   -- helps locality (the recovered edits are available on the region server) 
> => lower the network usage
> - When the split is finished, we're done as of today
> - while the split is progressing, the region server can
>  -- serve writes
>    --- that's useful for all applications that need to write but not read 
> immediately:
>    --- anything that logs events to analyze them later
>    --- opentsdb is a perfect example.
>  -- serve reads if they have a compatible time range. For heavily used 
> tables, it could be a help, because:
>    --- we can expect to have a few minutes of data only (as it's loaded)
>    --- the heaviest queries often accept a delay of a few minutes or more.
> Some "What if":
> 1) the split fails
> => Retry until it works. As today. The only difference is that we serve 
> writes. We need to know (as today) that the region has not recovered if we 
> fail again.
> 2) the regionserver fails during the split
> => As 1 and as of today.
> 3) the regionserver fails after the split but before the state change to 
> fully available.
> => New assign. More logs to split (the ones already done and the new ones).
> 4) the assignment fails
> => Retry until it works. As today.
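
The behavior proposed in the description could be sketched as follows. This is only an illustration under stated assumptions, not HBase code: RegionRecoveringException, RecoveringRegion, the half-open time range [minTs, maxTs) and the oldestUnrecoveredEditMs cutoff are all hypothetical names chosen for the example. The sketch assumes that a read is "compatible" when its time range ends at or before the oldest edit that may still be sitting in the logs being split.

```java
// Hypothetical sketch; none of these types exist in HBase.

/** Returned to the client when a recovering region cannot serve a request. */
final class RegionRecoveringException extends RuntimeException {
    final long failureStartedAtMs; // time information: when the failure started
    final String newServer;        // where the region is assigned now

    RegionRecoveringException(long failureStartedAtMs, String newServer) {
        super("Region recovering on " + newServer
            + " (failure started on: " + failureStartedAtMs + ")");
        this.failureStartedAtMs = failureStartedAtMs;
        this.newServer = newServer;
    }
}

/** A region that has been reassigned while its logs are still being split. */
final class RecoveringRegion {
    private final String server;
    private final long failureStartedAtMs;
    // Assumption: no edit older than this timestamp is still in the logs being split.
    private final long oldestUnrecoveredEditMs;
    private volatile boolean recovering = true;

    RecoveringRegion(String server, long failureStartedAtMs, long oldestUnrecoveredEditMs) {
        this.server = server;
        this.failureStartedAtMs = failureStartedAtMs;
        this.oldestUnrecoveredEditMs = oldestUnrecoveredEditMs;
    }

    /** Writes are served during the split, so nothing is rejected here. */
    void checkWrite() {
        // Accepted whether or not the region is still recovering.
    }

    /** Reads over [minTs, maxTs) are served only if they end before the unrecovered edits. */
    void checkRead(long minTs, long maxTs) {
        if (recovering && maxTs > oldestUnrecoveredEditMs) {
            throw new RegionRecoveringException(failureStartedAtMs, server);
        }
    }

    /** Called once the log split is finished and the recovered edits are replayed. */
    void splitFinished() {
        recovering = false;
    }

    boolean isRecovering() {
        return recovering;
    }
}
```

A client that catches the exception could read newServer and send later requests straight to that region server instead of asking meta, which is the saving in network calls mentioned above.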

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
