[
https://issues.apache.org/jira/browse/HBASE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579793#comment-14579793
]
Ted Yu commented on HBASE-13875:
--------------------------------
Integrated to branch-1.1
bq. -1 checkstyle. The applied patch generated 3801 checkstyle errors (more
than the master's current 3800 errors).
Addressed through addendum.
> Clock skew between master and region server may render restored region
> without server address
> ---------------------------------------------------------------------------------------------
>
> Key: HBASE-13875
> URL: https://issues.apache.org/jira/browse/HBASE-13875
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.1
>
> Attachments: 13875-0.98.txt, 13875-branch-1.0.txt,
> 13875-branch-1.txt, 13875-master.txt
>
>
> We observed the following issue in cluster testing on a restored table
> (table_gwbh9rxyz3).
> {code}
> 2015-06-08
> 14:29:47,313|beaver.component.hbase|INFO|6196|140144585275136|MainThread|
> 'get 'table_gwbh9rxyz3','row1', {COLUMN => 'family1'}'
> ...
> 2015-06-08
> 14:31:38,203|beaver.machine|INFO|6196|140144585275136|MainThread|ERROR: No
> server address listed in hbase:meta for region
> table_gwbh9rxyz3,,1433773371699.
> 48652273628a291653d8c43aaa02179a. containing row row1
> {code}
> Here was related log snippet from master - part for
> RestoreSnapshotHandler#handleTableOperation():
> {code}
> 2015-06-08 14:28:41,968 DEBUG
> [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0]
> snapshot.RestoreSnapshotHelper: starting restore
> 2015-06-08 14:28:41,969 DEBUG
> [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0]
> snapshot.RestoreSnapshotHelper: get table regions:
> hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3
> 2015-06-08 14:28:41,984 DEBUG
> [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0]
> snapshot.RestoreSnapshotHelper: found 1 regions for table=table_gwbh9rxyz3
> 2015-06-08 14:28:41,984 INFO
> [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0]
> snapshot.RestoreSnapshotHelper: region to restore:
> 48652273628a291653d8c43aaa02179a
> 2015-06-08 14:28:42,001 DEBUG [RestoreSnapshot-pool584-t1]
> backup.HFileArchiver: Finished archiving from class
> org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath,
> file:hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66,
> to
> hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/archive/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66
> 2015-06-08 14:28:42,002 INFO
> [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor:
> Deleted []
> 2015-06-08 14:28:42,002 INFO
> [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor:
> Added 0
> 2015-06-08 14:28:42,014 INFO
> [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor:
> Deleted [{ENCODED => 48652273628a291653d8c43aaa02179a, NAME =>
> 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY
> => '', ENDKEY => ''}]
> 2015-06-08 14:28:42,022 DEBUG
> [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager:
> Verify snapshot=table_gwbh9rxyz3-ru-20150608
> against=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3
> 2015-06-08 14:28:42,022 DEBUG
> [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager:
> Sentinel is not yet finished with restoring snapshot={
> ss=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 type=FLUSH }
> 2015-06-08 14:28:42,038 INFO
> [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor:
> Added 2
> 2015-06-08 14:28:42,038 INFO
> [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor:
> Overwritten [{ENCODED => 48652273628a291653d8c43aaa02179a, NAME =>
> 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY
> => '', ENDKEY => ''}]
> {code}
> Here was log snippet from region server - corresponding to table being
> enabled after snapshot restore:
> {code}
> 2015-06-08 14:28:41,914 DEBUG [RS_OPEN_REGION-ip-172-31-46-239:51852-2]
> zookeeper.ZKAssign: regionserver:51852-0x24dd2833c34000b,
> quorum=ip-172-31-46-239.ec2.internal:2181,ip-172-31-46-241.ec2.internal:2181,ip-172-31-46-242.ec2.internal:2181,
> baseZNode=/services/slider/users/hbase/hbasesliderapp Attempting to
> retransition opening state of node 48652273628a291653d8c43aaa02179a
> 2015-06-08 14:28:41,916 INFO
> [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a]
> regionserver.HRegionServer: Post open deploy tasks for
> table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.
> 2015-06-08 14:28:41,920 INFO
> [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a]
> hbase.MetaTableAccessor: Updated row
> table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. with
> server=ip-172-31-46-239.ec2.internal,51852,1433758173941
> 2015-06-08 14:28:41,920 DEBUG
> [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a]
> regionserver.HRegionServer: Finished post open deploy task for
> table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a
> {code}
> What happened was that due to clock skew, server location
> (ip-172-31-46-239.ec2.internal) for the region was eclipsed by the delete
> marker put in by MetaTableAccessor#overwriteRegions()
> Thanks for [~devaraj] for pair debugging.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)