[ 
https://issues.apache.org/jira/browse/HBASE-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549622#comment-14549622
 ] 

Enis Soztutar commented on HBASE-13709:
---------------------------------------

Here are the details from a test failure: 
We create a table with 8 replicas, and end of test, we verify that all the 
replicas are assigned. For that we do a meta scan, which does not see replica 
id 5 (notice that the server value is null): 
{code}
2015-05-15 
09:03:06,982|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server, timestamp=1431680376535, 
value=ip-172-31-41-12.ec2.internal:16020
2015-05-15 
09:03:06,984|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0001, timestamp=1431680377792, 
value=ip-172-31-41-10.ec2.internal:16020
2015-05-15 
09:03:06,985|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0002, timestamp=1431680376517, 
value=ip-172-31-41-12.ec2.internal:16020
2015-05-15 
09:03:06,987|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0003, timestamp=1431680377792, 
value=ip-172-31-41-10.ec2.internal:16020
2015-05-15 
09:03:06,989|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0004, timestamp=1431680377259, 
value=ip-172-31-41-6.ec2.internal:16020
2015-05-15 
09:03:06,990|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0005, timestamp=1431680375996, value=
2015-05-15 
09:03:06,992|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0006, timestamp=1431680376986, 
value=ip-172-31-41-11.ec2.internal:16020
2015-05-15 
09:03:06,994|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0007, timestamp=1431680377359, 
value=ip-172-31-41-7.ec2.internal:16020
{code}

Looking at the master and region server logs, it seems that the region replica 
with name {{tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c}} is 
indeed assigned, and it actually did write to meta: 
{code}
2015-05-15 08:59:35,473 INFO  [PriorityRpcServer.handler=10,queue=0,port=16020] 
regionserver.RSRpcServices: Open 
tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c.
2015-05-15 08:59:35,494 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=459272, 
freeSize=420648856, maxSize=421108128, heapSi
2015-05-15 08:59:35,495 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
compactions.CompactionConfiguration: size [134217728, 9223372036854775807); 
files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,499 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=459272, 
freeSize=420648856, maxSize=421108128, heapSi
2015-05-15 08:59:35,500 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
compactions.CompactionConfiguration: size [134217728, 9223372036854775807); 
files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,504 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=459272, 
freeSize=420648856, maxSize=421108128, heapSi
2015-05-15 08:59:35,504 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
compactions.CompactionConfiguration: size [134217728, 9223372036854775807); 
files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,507 INFO  [RS_OPEN_REGION-ip-172-31-41-8:16020-2] 
regionserver.HRegion: Onlined 68af88e9274633ba67b926b142b7ff4c; next 
sequenceid=2
2015-05-15 08:59:35,513 INFO  
[PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c] 
regionserver.HRegionServer: Post open deploy tasks for 
tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c.
2015-05-15 08:59:35,522 INFO  
[PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c] hbase.MetaTableAccessor: 
Updated row tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c. with 
server=ip-17
{code}

As you can see from the meta scan results, the cell: 
{code}
column=info:server_0005, timestamp=1431680375996, value=
{code}
has timestamp of 1431680375996, which is {{Fri, 15 May 2015 08:59:35 GMT}} and 
the ms is 996, however the meta update from the RS uses it's local timestamp 
because of HBASE-11536. As you can see from the log:
{code}
2015-05-15 08:59:35,522 INFO  
[PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c] hbase.MetaTableAccessor: 
Updated row tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c. with 
server=ip-17
{code}
has a timestamp of {{2015-05-15 08:59:35,522}} which is less than the master's 
TS when it created the meta row with empty location. 


> Updates to meta table server columns may be eclipsed
> ----------------------------------------------------
>
>                 Key: HBASE-13709
>                 URL: https://issues.apache.org/jira/browse/HBASE-13709
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1
>
>
> HBASE-11536 fixes a case where on a very rare occasion, the meta updates may 
> be processed out of order. The fix is to use the RS's timestamp for the 
> server column in meta update, but that actually opens up a vulnerability for 
> clock skew (see the discussions in the jira). 
> For the region replicas case, we can reproduce a problem where the server 
> name field is eclipsed by the masters earlier update because the RS is 
> lagging behind. However, this is not specific to replicas, but occurs more 
> frequently with it. 
> One option that was discussed was to send the master's ts with open region 
> RPC and use it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to