[ 
https://issues.apache.org/jira/browse/HBASE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751205#action_12751205
 ] 

stack commented on HBASE-1603:
------------------------------

Mikhail: I doubt very much its you.  Its a pity you don't have a timestamp for 
when you saw the RetriesExhaustedException issue.  It seems like you are having 
the issue originally described here and another, the missing .META. region.

I just saw this issue again myself with RC3.

First, the client-side up in the MR task.  The task fails because of 
TestTable,0254895141,1251965526935.  It says retries exhausted but it doesn't 
look like it was trying very had... we timed out too fast it would seem.

{code}
2009-09-03 09:04:05,509 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed 
.META.,,1 for tableName=.META. from cache because of 
TestTable,0256167115,99999999999999
2009-09-03 09:04:05,512 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cached location 
address: XX.XX.44.140:60020, regioninfo: REGION => {NAME => '.META.,,1', 
STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME => 
'.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384', FAMILIES => [{NAME 
=> 'historian', VERSIONS => '2147483647', COMPRESSION => 'NONE', TTL => 
'604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, 
{NAME => 'info', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647', 
BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
2009-09-03 09:04:07,724 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: 
locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 2000
org.apache.hadoop.hbase.client.NoServerForRegionException: No server address 
listed in .META. for region TestTable,0255035045,1251968451499
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:660)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:586)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:555)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:401)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:1036)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1122)
        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:586)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:390)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:413)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:639)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:194)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
2009-09-03 09:04:07,725 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed 
.META.,,1 for tableName=.META. from cache because of 
TestTable,0256167115,99999999999999
2009-09-03 09:04:07,728 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cached location 
address: XX.XX.44.140:60020, regioninfo: REGION => {NAME => '.META.,,1', 
STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME => 
'.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384', FAMILIES => [{NAME 
=> 'historian', VERSIONS => '2147483647', COMPRESSION => 'NONE', TTL => 
'604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, 
{NAME => 'info', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647', 
BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
2009-09-03 09:04:09,994 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: 
locateRegionInMeta attempt 2 of 10 failed; retrying after sleep of 2000
org.apache.hadoop.hbase.client.NoServerForRegionException: No server address 
listed in .META. for region TestTable,0255035045,1251968451499
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:660)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:586)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:555)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:401)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:1036)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1122)
        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:586)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:390)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:413)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:639)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:194)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
2009-09-03 09:04:09,995 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed 
.META.,,1 for tableName=.META. from cache because of 
TestTable,0256167115,99999999999999
2009-09-03 09:04:09,997 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cached location 
address: XX.XX.44.140:60020, regioninfo: REGION => {NAME => '.META.,,1', 
STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME => 
'.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384', FAMILIES => [{NAME 
=> 'historian', VERSIONS => '2147483647', COMPRESSION => 'NONE', TTL => 
'604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, 
{NAME => 'info', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647', 
BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
2009-09-03 09:04:12,050 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cached location 
address: XX.XX.44.142:60020, regioninfo: REGION => {NAME => 
'TestTable,0255035045,1251968451499', STARTKEY => '0255035045', ENDKEY => 
'0257612642', ENCODED => 2088805037, TABLE => {{NAME => 'TestTable', FAMILIES 
=> [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL => 
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 
'true'}]}}
2009-09-03 09:04:12,816 WARN org.apache.hadoop.mapred.TaskTracker: Error 
running child
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact 
region server Some server, retryOnlyOne=true, index=0, islastrow=false, 
tries=9, numtries=10, i=0, listsize=8643, location=address: XX.XX.44.142:60020, 
regioninfo: REGION => {NAME => 'TestTable,0254895141,1251965526935', STARTKEY 
=> '0254895141', ENDKEY => '0257612642', ENCODED => 1579891142, TABLE => {{NAME 
=> 'TestTable', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 
'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', 
BLOCKCACHE => 'true'}]}}, region=TestTable,0254895141,1251965526935 for region 
TestTable,0254895141,1251965526935, row '0256167115', but failed after 10 
attempts.
Exceptions:

        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1099)
        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:586)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:451)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest.testRow(PerformanceEvaluation.java:571)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:406)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:639)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:194)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
2009-09-03 09:04:12,819 INFO org.apache.hadoop.mapred.TaskRunner: Runnning 
cleanup for the task
{code}

At the time, the region in question was being split so I'd expect some timeout 
finding region:

{code}
2009-09-03 08:12:11,569 [IPC Server handler 15 on 60001] INFO 
org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_SPLIT: 
TestTable,0254895141,1251963952531: Daughters; 
TestTable,0254895141,1251965526935, TestTab
le,0257612642,1251965526935 from aa0-000-15.u.powerset.com,60020,1251958693334; 
5 of 7
2009-09-03 08:12:11,653 [IPC Server handler 15 on 60001] INFO 
org.apache.hadoop.hbase.master.RegionManager: Assigning region 
TestTable,0254895141,1251965526935 to 
aa0-000-15.u.powerset.com,60020,1251958693334
2009-09-03 08:12:14,691 [IPC Server handler 12 on 60001] INFO 
org.apache.hadoop.hbase.master.ServerManager: Processing 
MSG_REPORT_PROCESS_OPEN: TestTable,0254895141,1251965526935 from 
aa0-000-15.u.powerset.com,60020,12519586933
34; 2 of 4
2009-09-03 08:12:14,691 [IPC Server handler 12 on 60001] INFO 
org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_OPEN: 
TestTable,0254895141,1251965526935 from 
aa0-000-15.u.powerset.com,60020,1251958693334; 4 of
 4
2009-09-03 08:12:14,697 [HMaster] INFO 
org.apache.hadoop.hbase.master.RegionServerOperation: 
TestTable,0254895141,1251965526935 open on XX.XX.44.142:60020
2009-09-03 08:12:14,698 [HMaster] INFO 
org.apache.hadoop.hbase.master.RegionServerOperation: Updated row 
TestTable,0254895141,1251965526935 in region .META.,,1 with 
startcode=1251958693334, server=XX.XX.44.142:60020
2009-09-03 08:23:03,610 [RegionManager.metaScanner] DEBUG 
org.apache.hadoop.hbase.master.BaseScanner: 
TestTable,0254895141,1251965526935/1579891142 no longer has references to 
TestTable,0254895141,1251963952531
2009-09-03 09:04:01,712 [IPC Server handler 3 on 60001] INFO 
org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_SPLIT: 
TestTable,0254895141,1251965526935: Daughters; 
TestTable,0254895141,1251968451499, TestTabl
e,0255035045,1251968451499 from aa0-000-15.u.powerset.com,60020,1251958693334; 
1 of 3
2009-09-03 09:20:03,376 [RegionManager.metaScanner] DEBUG 
org.apache.hadoop.hbase.master.BaseScanner: 
TestTable,0254895141,1251968451499/906753129 no longer has references to 
TestTable,0254895141,1251965526935
2009-09-03 09:20:03,378 [RegionManager.metaScanner] DEBUG 
org.apache.hadoop.hbase.master.BaseScanner: 
TestTable,0255035045,1251968451499/2088805037 no longer has references to 
TestTable,0254895141,1251965526935
2009-09-03 09:20:03,380 [RegionManager.metaScanner] INFO 
org.apache.hadoop.hbase.master.BaseScanner: Deleting region 
TestTable,0254895141,1251965526935 (encoded=1579891142) because daughter splits 
no longer hold references
{code}

The daughters came on line promptly... 

Why is the client giving up so quickly?

I forgot to check why the other 3 attempts falied on this task.

I can get rows from the daughter regions without issue.

> MR failed "RetriesExhaustedException: Trying to contact region server Some 
> server for region TestTable..."
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1603
>                 URL: https://issues.apache.org/jira/browse/HBASE-1603
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>         Attachments: debugging-v4.patch
>
>
> Here is the master.  Region 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246462685358 was split at 
> 16:11:42,865.  My MR job failed at 18:12:26,462 with this:
> {code}
> 2009-07-01 18:12:26,462 WARN org.apache.hadoop.mapred.TaskTracker: Error 
> running child
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact 
> region server Some server for region TestTable,�,1246464670313, row '��       
>     ', but failed after 10 attempts.
> Exceptions:
> ...
> {code}
> Why after ten attempts did the client not find the region?
> {code}
> 2009-07-01 16:11:42,865 [IPC Server handler 2 on 60001] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246462685358: Daughters; 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313, 
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from 
> aa0-000-15.u.powerset.com,60020,1246461673026; 1 of 3
> 2009-07-01 16:11:42,866 [IPC Server handler 2 on 60001] INFO 
> org.apache.hadoop.hbase.master.RegionManager: Assigning region 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 to 
> aa0-000-15.u.powerset.com,60020,1246461673026
> 2009-07-01 16:11:42,866 [IPC Server handler 2 on 60001] INFO 
> org.apache.hadoop.hbase.master.RegionManager: Assigning region 
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 to 
> aa0-000-15.u.powerset.com,60020,1246461673026
> 2009-07-01 16:11:45,905 [IPC Server handler 8 on 60001] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received 
> MSG_REPORT_PROCESS_OPEN: 
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from 
> aa0-000-15.u.powerset.com,60020,1246461673026; 1 of 3
> 2009-07-01 16:11:45,905 [IPC Server handler 8 on 60001] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 from 
> aa0-000-15.u.powerset.com,60020,1246461673026; 2 of 3
> 2009-07-01 16:11:45,906 [IPC Server handler 8 on 60001] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: 
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from 
> aa0-000-15.u.powerset.com,60020,1246461673026; 3 of 3
> 2009-07-01 16:11:45,906 [HMaster] INFO 
> org.apache.hadoop.hbase.master.RegionServerOperation: 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 open on 
> 208.76.44.142:60020
> 2009-07-01 16:11:45,906 [HMaster] INFO 
> org.apache.hadoop.hbase.master.RegionServerOperation: updating row 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 in region 
> .META.,,1 with startcode 1246461673026 and server 208.76.44.142:60020
> 2009-07-01 16:11:45,908 [HMaster] INFO 
> org.apache.hadoop.hbase.master.RegionServerOperation: 
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 open on 
> 208.76.44.142:60020
> 2009-07-01 16:11:45,908 [HMaster] INFO 
> org.apache.hadoop.hbase.master.RegionServerOperation: updating row 
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 in region 
> .META.,,1 with startcode 1246461673026 and server 208.76.44.142:60020
> 2009-07-01 17:46:42,670 [IPC Server handler 0 on 60001] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313: Daughters; 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246470379467, 
> TestTable,\x00\x00\x08\x04\x05\x07\x02\x05\x04\x08,1246470379467 from 
> aa0-000-15.u.powerset.com,60020,1246461673026; 5 of 7
> {code}
> Here is over on the regionserver:
> {code}
> 2009-07-01 16:11:42,865 [IPC Server handler 2 on 60001] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246462685358: Daughters; 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313, 
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from 
> aa0-000-15.u.powerset.com,60020,1246461673026; 1 of 3
> 2009-07-01 16:11:42,866 [IPC Server handler 2 on 60001] INFO 
> org.apache.hadoop.hbase.master.RegionManager: Assigning region 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 to 
> aa0-000-15.u.powerset.com,60020,1246461673026
> 2009-07-01 16:11:42,866 [IPC Server handler 2 on 60001] INFO 
> org.apache.hadoop.hbase.master.RegionManager: Assigning region 
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 to 
> aa0-000-15.u.powerset.com,60020,1246461673026
> 2009-07-01 16:11:45,905 [IPC Server handler 8 on 60001] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received 
> MSG_REPORT_PROCESS_OPEN: 
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from 
> aa0-000-15.u.powerset.com,60020,1246461673026; 1 of 3
> 2009-07-01 16:11:45,905 [IPC Server handler 8 on 60001] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 from 
> aa0-000-15.u.powerset.com,60020,1246461673026; 2 of 3
> 2009-07-01 16:11:45,906 [IPC Server handler 8 on 60001] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: 
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from 
> aa0-000-15.u.powerset.com,60020,1246461673026; 3 of 3
> 2009-07-01 16:11:45,906 [HMaster] INFO 
> org.apache.hadoop.hbase.master.RegionServerOperation: 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 open on 
> X.X.X.142:60020
> 2009-07-01 16:11:45,906 [HMaster] INFO 
> org.apache.hadoop.hbase.master.RegionServerOperation: updating row 
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 in region 
> .META.,,1 with startcode 1246461673026 and server X.X.X4.142:60020
> 2009-07-01 16:11:45,908 [HMaster] INFO 
> org.apache.hadoop.hbase.master.RegionServerOperation: 
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 open on 
> X.X.X.142:60020
> 2009-07-01 16:11:45,908 [HMaster] INFO 
> org.apache.hadoop.hbase.master.RegionServerOperation: updating row 
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 in region 
> .META.,,1 with startcode 1246461673026 and server X.X.X.142:60020
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to