[ https://issues.apache.org/jira/browse/HBASE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732334#action_12732334 ]
stack commented on HBASE-1609:
------------------------------
Testing puts, I see the following in the client when we shut down in the middle of an
upload:
{code}
2009-07-17 04:10:04,645 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location because regionserver didn't accept updates; tries=0 of max=10, waiting=2000ms
2009-07-17 04:10:06,904 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location because regionserver didn't accept updates; tries=1 of max=10, waiting=2000ms
2009-07-17 04:10:09,015 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location because regionserver didn't accept updates; tries=2 of max=10, waiting=2000ms
2009-07-17 04:10:11,068 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location because regionserver didn't accept updates; tries=3 of max=10, waiting=4000ms
2009-07-17 04:10:15,107 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location because regionserver didn't accept updates; tries=4 of max=10, waiting=4000ms
2009-07-17 04:10:19,216 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location because regionserver didn't accept updates; tries=5 of max=10, waiting=8000ms
2009-07-17 04:10:27,490 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location because regionserver didn't accept updates; tries=6 of max=10, waiting=8000ms
2009-07-17 04:10:35,534 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location because regionserver didn't accept updates; tries=7 of max=10, waiting=16000ms
2009-07-17 04:10:52,446 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location because regionserver didn't accept updates; tries=8 of max=10, waiting=32000ms
2009-07-17 04:11:24,514 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=false, tries=9, numtries=10, i=0, listsize=8643, location=address: X.X.X.141:60020, regioninfo: REGION => {NAME => 'TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498', STARTKEY => '\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03', ENDKEY => '\x00\x03\x02\x04\x06\x07\x07\x00\x06\x06', ENCODED => 1615573, TABLE => {{NAME => 'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}, region=TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 for region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498, row '\x00\x03\x02\x04\x06\x03\x09\x01\x02\x06', but failed after 10 attempts.
{code}
I think I also see "connection refused" errors.
That ain't bad, I'd say.
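The "waiting=" values in the log follow a stepped exponential backoff. The sketch below reproduces that schedule, assuming wait = basePause * BACKOFF[tries] with a 2000 ms base pause. The multiplier table and the names `RetryBackoff`, `pauseTime` and `schedule` are hypothetical, inferred from the logged values rather than quoted from the HBase source.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the client retry schedule seen in the log.
 * Assumption: wait = basePauseMs * BACKOFF[tries]; the table is inferred
 * from the logged "waiting=" values, not copied from HBase.
 */
public class RetryBackoff {
    // Multipliers for tries = 0..9 (inferred; tries=9 is never waited on here).
    static final int[] BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};

    static long pauseTime(long basePauseMs, int tries) {
        // Clamp so tries past the end of the table reuse the largest multiplier.
        int idx = Math.min(tries, BACKOFF.length - 1);
        return basePauseMs * BACKOFF[idx];
    }

    /** Waits taken before each retry; the final attempt throws instead of waiting. */
    static List<Long> schedule(long basePauseMs, int maxTries) {
        List<Long> waits = new ArrayList<>();
        for (int tries = 0; tries < maxTries - 1; tries++) {
            waits.add(pauseTime(basePauseMs, tries));
        }
        return waits;
    }

    public static void main(String[] args) {
        List<Long> waits = schedule(2000, 10);
        long total = 0;
        for (long w : waits) {
            total += w;
        }
        System.out.println(waits); // [2000, 2000, 2000, 4000, 4000, 8000, 8000, 16000, 32000]
        System.out.println(total); // 78000
    }
}
```

The roughly 78 s of cumulative backoff lines up with the gap of about 80 s between the first DEBUG line (04:10:04) and the RetriesExhaustedException (04:11:24); the remainder is presumably RPC and relocation overhead.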
This is with zk not managed by hbase. If I shut down a cluster where hbase is
managing the zk quorum (i.e. zk is shut down as part of the hbase shutdown), then I
see the client log filled with zk complaints, with the above intermixed.
Scanning, I see an EOFException; it looks like the server went down while returning
a result.
Exceptions ain't pretty, but I don't see anything inherently wrong. Will go
ahead and commit.
With this commit, our new philosophy is: no more trying to be Mr. Nice Guy
with regard to clients if the admin wants the cluster to go down.
> [part of hbase-1583] We wait on leases to expire before regionserver goes
> down. Rather, just let client fail
> -------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-1609
> URL: https://issues.apache.org/jira/browse/HBASE-1609
> Project: Hadoop HBase
> Issue Type: Improvement
> Reporter: stack
> Assignee: stack
> Fix For: 0.20.0
>
> Attachments: 1609-v2.patch, 1609.patch
>
>
> Addressing this issue will help hbase-1583. We should do this for 0.20.0, and
> perhaps even for 0.19.x.
> Currently, if there are outstanding leases when HRegion closes, we hang until the
> leases expire. That could be a minute. It could be worse: the client might come in
> and renew the lease a few times, at least until it finishes out the region. This
> gets in the way of the regionserver shutting down fast.
> J-D suggests that the regionserver should just go down and outstanding clients
> should fail, rather than trying to be nice to them (in his case, his MR job had
> failed, so there were no clients... but we still insisted on the lease expiring).
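The difference between the old and new close behaviour can be sketched as a toy lease table with simulated time. Everything here is hypothetical: `LeaseCloseSketch`, its methods, and the 60 s lease period (taken from "could be a minute") are illustrative assumptions, not HRegion's or HBase's actual lease code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model of HBASE-1609 (not HBase code). Time is simulated
 * in milliseconds; methods return times instead of actually sleeping.
 */
public class LeaseCloseSketch {
    static final long LEASE_PERIOD_MS = 60_000; // assumed: "could be a minute"

    private final Map<String, Long> leases = new HashMap<>(); // scanner id -> expiry
    private boolean closed = false;

    void grantLease(String scannerId, long now) {
        leases.put(scannerId, now + LEASE_PERIOD_MS);
    }

    /** A renewal pushes the expiry out again; refused once the region is closed. */
    boolean renewLease(String scannerId, long now) {
        if (closed || !leases.containsKey(scannerId)) {
            return false; // the client sees a failure and must handle it
        }
        leases.put(scannerId, now + LEASE_PERIOD_MS);
        return true;
    }

    /** Old behaviour: returns when close would finish, i.e. after the latest lease expires. */
    long closeWaitingOnLeases(long now) {
        long doneAt = now;
        for (long expiry : leases.values()) {
            doneAt = Math.max(doneAt, expiry);
        }
        leases.clear();
        closed = true;
        return doneAt;
    }

    /** New behaviour: drop outstanding leases and close immediately. */
    long closeFailingClients(long now) {
        leases.clear();
        closed = true;
        return now;
    }

    public static void main(String[] args) {
        LeaseCloseSketch old = new LeaseCloseSketch();
        old.grantLease("scanner-1", 0);
        old.renewLease("scanner-1", 50_000); // a renewal extends the wait
        System.out.println(old.closeWaitingOnLeases(55_000)); // 110000

        LeaseCloseSketch fast = new LeaseCloseSketch();
        fast.grantLease("scanner-1", 0);
        System.out.println(fast.closeFailingClients(55_000)); // 55000
        System.out.println(fast.renewLease("scanner-1", 56_000)); // false
    }
}
```

In the old mode a single renewal just before shutdown stretches the close from 55 s to 110 s of simulated time; in the new mode close completes at once and the late renewal simply fails, which is the "just let client fail" trade-off.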
--
This message is automatically generated by JIRA.