[jira] Commented: (HBASE-1609) [part of hbase-1583] We wait on leases to expire before regionserver goes down. Rather, just let client fail

stack (JIRA) Thu, 16 Jul 2009 21:27:42 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732334#action_12732334
 ]


stack commented on HBASE-1609:
------------------------------

Testing puts, I see the following in the client log when we shut down in the 
middle of an upload:

{code}
2009-07-17 04:10:04,645 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading 
region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 
location because regionserver didn't accept updates; tries=0 of max=10, 
waiting=2000ms
2009-07-17 04:10:06,904 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading 
region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 
location because regionserver didn't accept updates; tries=1 of max=10, 
waiting=2000ms
2009-07-17 04:10:09,015 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading 
region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 
location because regionserver didn't accept updates; tries=2 of max=10, 
waiting=2000ms
2009-07-17 04:10:11,068 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading 
region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 
location because regionserver didn't accept updates; tries=3 of max=10, 
waiting=4000ms
2009-07-17 04:10:15,107 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading 
region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 
location because regionserver didn't accept updates; tries=4 of max=10, 
waiting=4000ms
2009-07-17 04:10:19,216 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading 
region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 
location because regionserver didn't accept updates; tries=5 of max=10, 
waiting=8000ms
2009-07-17 04:10:27,490 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading 
region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 
location because regionserver didn't accept updates; tries=6 of max=10, 
waiting=8000ms
2009-07-17 04:10:35,534 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading 
region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 
location because regionserver didn't accept updates; tries=7 of max=10, 
waiting=16000ms
2009-07-17 04:10:52,446 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading 
region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 
location because regionserver didn't accept updates; tries=8 of max=10, 
waiting=32000ms
2009-07-17 04:11:24,514 WARN org.apache.hadoop.mapred.TaskTracker: Error 
running child
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact 
region server Some server, retryOnlyOne=true, index=0, islastrow=false, 
tries=9, numtries=10, i=0, listsize=8643, location=address: X.X.X.141:60020, 
regioninfo: REGION => {NAME => 
'TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498', STARTKEY => 
'\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03', ENDKEY => 
'\x00\x03\x02\x04\x06\x07\x07\x00\x06\x06', ENCODED => 1615573, TABLE => {{NAME 
=> 'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS 
=> '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', 
BLOCKCACHE => 'true'}]}}, 
region=TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 for 
region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498, row 
'\x00\x03\x02\x04\x06\x03\x09\x01\x02\x06', but failed after 10 attempts.
{code}
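
For reference, the waits in the log above follow a stepped backoff on a 2000ms base pause. A minimal sketch that reproduces those numbers is below. The multiplier table is read off the log lines (tries 0 through 8), not copied from the HBase source, and the class and method names are hypothetical.

```java
// Sketch only: reproduces the "waiting=...ms" values seen in the client log.
// The multipliers are inferred from the log, NOT taken from HBase source code.
final class RetryBackoffSketch {
    // Assumed multiplier for tries 0..8, as observed in the log above.
    private static final int[] MULTIPLIERS = {1, 1, 1, 2, 2, 4, 4, 8, 16};

    // Wait before the next attempt, given the current try count and base pause.
    static long waitMillis(int tries, long basePauseMillis) {
        if (tries < 0) {
            throw new IllegalArgumentException("tries must be >= 0");
        }
        // Past the end of the observed table, keep using the last multiplier.
        int idx = Math.min(tries, MULTIPLIERS.length - 1);
        return basePauseMillis * MULTIPLIERS[idx];
    }

    // Sum of the waits for tries 0 .. loggedTries-1.
    static long totalWaitMillis(int loggedTries, long basePauseMillis) {
        long total = 0;
        for (int i = 0; i < loggedTries; i++) {
            total += waitMillis(i, basePauseMillis);
        }
        return total;
    }

    public static void main(String[] args) {
        long basePause = 2000L; // base pause as seen in the log (waiting=2000ms at tries=0)
        for (int tries = 0; tries < 9; tries++) {
            System.out.println("tries=" + tries + " waiting=" + waitMillis(tries, basePause) + "ms");
        }
        System.out.println("total=" + totalWaitMillis(9, basePause) + "ms");
    }
}
```

The nine logged waits sum to 78000ms, which lines up with the roughly 80 seconds between the first DEBUG line (04:10:04) and the RetriesExhaustedException (04:11:24). So a client writing to a region whose server has been shut down gives up after about a minute and twenty seconds with these settings.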

I think I see connection refused too.

That ain't bad I'd say.

This is with zk not managed by hbase.  If I shut down a cluster where hbase is 
managing the zk quorum -- i.e. it's shut down as part of hbase shutdown -- then I 
see the client log filled with zk complaints, with the above intermixed.

When scanning, I see an EOFException, apparently because the server went down 
while returning a result.

Exceptions ain't pretty but I don't see anything inherently wrong.  Will go 
ahead and commit.

With this new commit, our new philosophy is: no more trying to be mr. nice guy 
to clients if the admin wants the cluster to go down.

> [part of hbase-1583] We wait on leases to expire before regionserver goes 
> down.  Rather, just let client fail
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1609
>                 URL: https://issues.apache.org/jira/browse/HBASE-1609
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.20.0
>
>         Attachments: 1609-v2.patch, 1609.patch
>
>
> Addressing this issue will help hbase-1583.  We should do for 0.20.0 and 
> perhaps for 0.19.x even.
> Currently, if there are outstanding leases at HRegion close, we'll hang until 
> the lease expires.  Could be a minute.  Could be worse: the client might come 
> in and renew the lease a few times, at least till it finishes out the region.  
> This gets in the way of the regionserver shutting down fast.  
> J-D suggests that regionserver should just go down and outstanding clients 
> should fail rather than try and be nice to outstanding clients (in his case, 
> his MR job had failed so no clients... but we insist on lease expiring).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
