[ https://issues.apache.org/jira/browse/HBASE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751205#action_12751205 ]
stack commented on HBASE-1603: ------------------------------ Mikhail: I doubt very much its you. Its a pity you don't have a timestamp for when you saw the RetriesExhaustedException issue. It seems like you are having the issue originally described here and another, the missing .META. region. I just saw this issue again myself with RC3. First, the client-side up in the MR task. The task fails because of TestTable,0254895141,1251965526935. It says retries exhausted but it doesn't look like it was trying very had... we timed out too fast it would seem. {code} 2009-09-03 09:04:05,509 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0256167115,99999999999999 2009-09-03 09:04:05,512 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cached location address: XX.XX.44.140:60020, regioninfo: REGION => {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384', FAMILIES => [{NAME => 'historian', VERSIONS => '2147483647', COMPRESSION => 'NONE', TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME => 'info', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}} 2009-09-03 09:04:07,724 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 2000 org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in .META. for region TestTable,0255035045,1251968451499 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:660) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:586) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:555) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:401) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:1036) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1122) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:586) at org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:390) at org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:413) at org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:639) at org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:194) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) 2009-09-03 09:04:07,725 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0256167115,99999999999999 2009-09-03 09:04:07,728 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cached location address: XX.XX.44.140:60020, regioninfo: REGION => {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384', FAMILIES => [{NAME => 'historian', VERSIONS => '2147483647', COMPRESSION => 'NONE', TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME => 'info', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}} 2009-09-03 09:04:09,994 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: locateRegionInMeta attempt 2 of 10 failed; retrying after sleep of 2000 org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in .META. for region TestTable,0255035045,1251968451499 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:660) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:586) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:555) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:401) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:1036) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1122) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:586) at org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:390) at org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:413) at org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:639) at org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:194) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) 2009-09-03 09:04:09,995 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0256167115,99999999999999 2009-09-03 09:04:09,997 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cached location address: XX.XX.44.140:60020, regioninfo: REGION => {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384', FAMILIES => [{NAME => 'historian', VERSIONS => '2147483647', COMPRESSION => 'NONE', TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME => 'info', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}} 2009-09-03 09:04:12,050 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cached location address: XX.XX.44.142:60020, regioninfo: REGION => {NAME => 'TestTable,0255035045,1251968451499', STARTKEY => '0255035045', ENDKEY => '0257612642', ENCODED => 2088805037, TABLE => {{NAME => 'TestTable', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} 2009-09-03 09:04:12,816 WARN org.apache.hadoop.mapred.TaskTracker: Error running child org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=false, tries=9, numtries=10, i=0, listsize=8643, location=address: XX.XX.44.142:60020, regioninfo: REGION => {NAME => 'TestTable,0254895141,1251965526935', STARTKEY => '0254895141', ENDKEY => '0257612642', ENCODED => 1579891142, TABLE => {{NAME => 'TestTable', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}, region=TestTable,0254895141,1251965526935 for region TestTable,0254895141,1251965526935, row '0256167115', but failed after 10 attempts. Exceptions: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1099) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:586) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:451) at org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest.testRow(PerformanceEvaluation.java:571) at org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:406) at org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:639) at org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:194) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) 2009-09-03 09:04:12,819 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task {code} At the time, the region in question was being split so I'd expect some timeout finding region: {code} 2009-09-03 08:12:11,569 [IPC Server handler 15 on 60001] INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_SPLIT: TestTable,0254895141,1251963952531: Daughters; TestTable,0254895141,1251965526935, TestTab le,0257612642,1251965526935 from aa0-000-15.u.powerset.com,60020,1251958693334; 5 of 7 2009-09-03 08:12:11,653 [IPC Server handler 15 on 60001] INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region TestTable,0254895141,1251965526935 to aa0-000-15.u.powerset.com,60020,1251958693334 2009-09-03 08:12:14,691 [IPC Server handler 12 on 60001] INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_PROCESS_OPEN: TestTable,0254895141,1251965526935 from aa0-000-15.u.powerset.com,60020,12519586933 34; 2 of 4 2009-09-03 08:12:14,691 [IPC Server handler 12 on 60001] INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_OPEN: TestTable,0254895141,1251965526935 from aa0-000-15.u.powerset.com,60020,1251958693334; 4 of 4 2009-09-03 08:12:14,697 [HMaster] INFO org.apache.hadoop.hbase.master.RegionServerOperation: TestTable,0254895141,1251965526935 open on XX.XX.44.142:60020 2009-09-03 08:12:14,698 [HMaster] INFO org.apache.hadoop.hbase.master.RegionServerOperation: Updated row TestTable,0254895141,1251965526935 in region .META.,,1 with startcode=1251958693334, server=XX.XX.44.142:60020 2009-09-03 08:23:03,610 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0254895141,1251965526935/1579891142 no longer has references to TestTable,0254895141,1251963952531 2009-09-03 09:04:01,712 [IPC Server handler 3 on 60001] INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_SPLIT: TestTable,0254895141,1251965526935: Daughters; TestTable,0254895141,1251968451499, TestTabl e,0255035045,1251968451499 from aa0-000-15.u.powerset.com,60020,1251958693334; 1 of 3 2009-09-03 09:20:03,376 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0254895141,1251968451499/906753129 no longer has references to TestTable,0254895141,1251965526935 2009-09-03 09:20:03,378 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0255035045,1251968451499/2088805037 no longer has references to TestTable,0254895141,1251965526935 2009-09-03 09:20:03,380 [RegionManager.metaScanner] INFO org.apache.hadoop.hbase.master.BaseScanner: Deleting region TestTable,0254895141,1251965526935 (encoded=1579891142) because daughter splits no longer hold references {code} The daughters came on line promptly... Why is the client giving up so quickly? I forgot to check why the other 3 attempts falied on this task. I can get rows from the daughter regions without issue. > MR failed "RetriesExhaustedException: Trying to contact region server Some > server for region TestTable..." > ---------------------------------------------------------------------------------------------------------- > > Key: HBASE-1603 > URL: https://issues.apache.org/jira/browse/HBASE-1603 > Project: Hadoop HBase > Issue Type: Bug > Reporter: stack > Attachments: debugging-v4.patch > > > Here is the master. Region > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246462685358 was split at > 16:11:42,865. My MR job failed at 18:12:26,462 with this: > {code} > 2009-07-01 18:12:26,462 WARN org.apache.hadoop.mapred.TaskTracker: Error > running child > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact > region server Some server for region TestTable,�,1246464670313, row '�� > ', but failed after 10 attempts. > Exceptions: > ... > {code} > Why after ten attempts did the client not find the region? > {code} > 2009-07-01 16:11:42,865 [IPC Server handler 2 on 60001] INFO > org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246462685358: Daughters; > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313, > TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from > aa0-000-15.u.powerset.com,60020,1246461673026; 1 of 3 > 2009-07-01 16:11:42,866 [IPC Server handler 2 on 60001] INFO > org.apache.hadoop.hbase.master.RegionManager: Assigning region > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 to > aa0-000-15.u.powerset.com,60020,1246461673026 > 2009-07-01 16:11:42,866 [IPC Server handler 2 on 60001] INFO > org.apache.hadoop.hbase.master.RegionManager: Assigning region > TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 to > aa0-000-15.u.powerset.com,60020,1246461673026 > 2009-07-01 16:11:45,905 [IPC Server handler 8 on 60001] INFO > org.apache.hadoop.hbase.master.ServerManager: Received > MSG_REPORT_PROCESS_OPEN: > TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from > aa0-000-15.u.powerset.com,60020,1246461673026; 1 of 3 > 2009-07-01 16:11:45,905 [IPC Server handler 8 on 60001] INFO > org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 from > aa0-000-15.u.powerset.com,60020,1246461673026; 2 of 3 > 2009-07-01 16:11:45,906 [IPC Server handler 8 on 60001] INFO > org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: > TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from > aa0-000-15.u.powerset.com,60020,1246461673026; 3 of 3 > 2009-07-01 16:11:45,906 [HMaster] INFO > org.apache.hadoop.hbase.master.RegionServerOperation: > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 open on > 208.76.44.142:60020 > 2009-07-01 16:11:45,906 [HMaster] INFO > org.apache.hadoop.hbase.master.RegionServerOperation: updating row > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 in region > .META.,,1 with startcode 1246461673026 and server 208.76.44.142:60020 > 2009-07-01 16:11:45,908 [HMaster] INFO > org.apache.hadoop.hbase.master.RegionServerOperation: > TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 open on > 208.76.44.142:60020 > 2009-07-01 16:11:45,908 [HMaster] INFO > org.apache.hadoop.hbase.master.RegionServerOperation: updating row > TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 in region > .META.,,1 with startcode 1246461673026 and server 208.76.44.142:60020 > 2009-07-01 17:46:42,670 [IPC Server handler 0 on 60001] INFO > org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313: Daughters; > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246470379467, > TestTable,\x00\x00\x08\x04\x05\x07\x02\x05\x04\x08,1246470379467 from > aa0-000-15.u.powerset.com,60020,1246461673026; 5 of 7 > {code} > Here is over on the regionserver: > {code} > 2009-07-01 16:11:42,865 [IPC Server handler 2 on 60001] INFO > org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246462685358: Daughters; > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313, > TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from > aa0-000-15.u.powerset.com,60020,1246461673026; 1 of 3 > 2009-07-01 16:11:42,866 [IPC Server handler 2 on 60001] INFO > org.apache.hadoop.hbase.master.RegionManager: Assigning region > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 to > aa0-000-15.u.powerset.com,60020,1246461673026 > 2009-07-01 16:11:42,866 [IPC Server handler 2 on 60001] INFO > org.apache.hadoop.hbase.master.RegionManager: Assigning region > TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 to > aa0-000-15.u.powerset.com,60020,1246461673026 > 2009-07-01 16:11:45,905 [IPC Server handler 8 on 60001] INFO > org.apache.hadoop.hbase.master.ServerManager: Received > MSG_REPORT_PROCESS_OPEN: > TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from > aa0-000-15.u.powerset.com,60020,1246461673026; 1 of 3 > 2009-07-01 16:11:45,905 [IPC Server handler 8 on 60001] INFO > org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 from > aa0-000-15.u.powerset.com,60020,1246461673026; 2 of 3 > 2009-07-01 16:11:45,906 [IPC Server handler 8 on 60001] INFO > org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: > TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from > aa0-000-15.u.powerset.com,60020,1246461673026; 3 of 3 > 2009-07-01 16:11:45,906 [HMaster] INFO > org.apache.hadoop.hbase.master.RegionServerOperation: > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 open on > X.X.X.142:60020 > 2009-07-01 16:11:45,906 [HMaster] INFO > org.apache.hadoop.hbase.master.RegionServerOperation: updating row > TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 in region > .META.,,1 with startcode 1246461673026 and server X.X.X4.142:60020 > 2009-07-01 16:11:45,908 [HMaster] INFO > org.apache.hadoop.hbase.master.RegionServerOperation: > TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 open on > X.X.X.142:60020 > 2009-07-01 16:11:45,908 [HMaster] INFO > org.apache.hadoop.hbase.master.RegionServerOperation: updating row > TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 in region > .META.,,1 with startcode 1246461673026 and server X.X.X.142:60020 > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.