[ https://issues.apache.org/jira/browse/HBASE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu updated HBASE-13875: --------------------------- Attachment: 13875-branch-1.txt > Clock skew between master and region server may render restored region > without server address > --------------------------------------------------------------------------------------------- > > Key: HBASE-13875 > URL: https://issues.apache.org/jira/browse/HBASE-13875 > Project: HBase > Issue Type: Bug > Reporter: Ted Yu > Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.1.1 > > Attachments: 13875-branch-1.txt > > > We observed the following issue in cluster testing on a restored table > (table_gwbh9rxyz3). > {code} > 2015-06-08 > 14:29:47,313|beaver.component.hbase|INFO|6196|140144585275136|MainThread| > 'get 'table_gwbh9rxyz3','row1', {COLUMN => 'family1'}' > ... > 2015-06-08 > 14:31:38,203|beaver.machine|INFO|6196|140144585275136|MainThread|ERROR: No > server address listed in hbase:meta for region > table_gwbh9rxyz3,,1433773371699. > 48652273628a291653d8c43aaa02179a. containing row row1 > {code} > Here was related log snippet from master - part for > RestoreSnapshotHandler#handleTableOperation(): > {code} > 2015-06-08 14:28:41,968 DEBUG > [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] > snapshot.RestoreSnapshotHelper: starting restore > 2015-06-08 14:28:41,969 DEBUG > [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] > snapshot.RestoreSnapshotHelper: get table regions: > hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3 > 2015-06-08 14:28:41,984 DEBUG > [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] > snapshot.RestoreSnapshotHelper: found 1 regions for table=table_gwbh9rxyz3 > 2015-06-08 14:28:41,984 INFO > [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] > snapshot.RestoreSnapshotHelper: region to restore: > 48652273628a291653d8c43aaa02179a > 2015-06-08 14:28:42,001 DEBUG [RestoreSnapshot-pool584-t1] > backup.HFileArchiver: Finished archiving from class > org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, > file:hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66, > to > hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/archive/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66 > 2015-06-08 14:28:42,002 INFO > [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: > Deleted [] > 2015-06-08 14:28:42,002 INFO > [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: > Added 0 > 2015-06-08 14:28:42,014 INFO > [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: > Deleted [{ENCODED => 48652273628a291653d8c43aaa02179a, NAME => > 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY > => '', ENDKEY => ''}] > 2015-06-08 14:28:42,022 DEBUG > [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: > Verify snapshot=table_gwbh9rxyz3-ru-20150608 > against=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 > 2015-06-08 14:28:42,022 DEBUG > [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: > Sentinel is not yet finished with restoring snapshot={ > ss=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 type=FLUSH } > 2015-06-08 14:28:42,038 INFO > [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: > Added 2 > 2015-06-08 14:28:42,038 INFO > [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: > Overwritten [{ENCODED => 48652273628a291653d8c43aaa02179a, NAME => > 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY > => '', ENDKEY => ''}] > {code} > Here was log snippet from region server - corresponding to table being > enabled after snapshot restore: > {code} > 2015-06-08 14:28:41,914 DEBUG [RS_OPEN_REGION-ip-172-31-46-239:51852-2] > zookeeper.ZKAssign: regionserver:51852-0x24dd2833c34000b, > quorum=ip-172-31-46-239.ec2.internal:2181,ip-172-31-46-241.ec2.internal:2181,ip-172-31-46-242.ec2.internal:2181, > baseZNode=/services/slider/users/hbase/hbasesliderapp Attempting to > retransition opening state of node 48652273628a291653d8c43aaa02179a > 2015-06-08 14:28:41,916 INFO > [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] > regionserver.HRegionServer: Post open deploy tasks for > table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. > 2015-06-08 14:28:41,920 INFO > [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] > hbase.MetaTableAccessor: Updated row > table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. with > server=ip-172-31-46-239.ec2.internal,51852,1433758173941 > 2015-06-08 14:28:41,920 DEBUG > [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] > regionserver.HRegionServer: Finished post open deploy task for > table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a > {code} > What happened was that due to clock skew, server location > (ip-172-31-46-239.ec2.internal) for the region was eclipsed by the delete > marker put in by MetaTableAccessor#overwriteRegions() > Thanks for [~devaraj] for pair debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)