[ 
https://issues.apache.org/jira/browse/HBASE-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549622#comment-14549622
 ] 

Enis Soztutar edited comment on HBASE-13709 at 5/19/15 1:32 AM:
----------------------------------------------------------------

Here are the details from a test failure: 
We create a table with 8 replicas, and end of test, we verify that all the 
replicas are assigned. For that we do a meta scan, which does not see replica 
id 5 (notice that the server value is null): 
{code}
2015-05-15 
09:03:06,982|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server, timestamp=1431680376535, 
value=ip-172-31-41-12.ec2.internal:16020
2015-05-15 
09:03:06,984|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0001, timestamp=1431680377792, 
value=ip-172-31-41-10.ec2.internal:16020
2015-05-15 
09:03:06,985|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0002, timestamp=1431680376517, 
value=ip-172-31-41-12.ec2.internal:16020
2015-05-15 
09:03:06,987|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0003, timestamp=1431680377792, 
value=ip-172-31-41-10.ec2.internal:16020
2015-05-15 
09:03:06,989|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0004, timestamp=1431680377259, 
value=ip-172-31-41-6.ec2.internal:16020
2015-05-15 
09:03:06,990|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0005, timestamp=1431680375996, value=
2015-05-15 
09:03:06,992|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0006, timestamp=1431680376986, 
value=ip-172-31-41-11.ec2.internal:16020
2015-05-15 
09:03:06,994|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0007, timestamp=1431680377359, 
value=ip-172-31-41-7.ec2.internal:16020
{code}

Looking at the master and region server logs, it seems that the region replica 
with name {{tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c}} is 
indeed assigned, and it actually did write to meta: 
{code}
2015-05-15 08:59:35,473 INFO  [PriorityRpcServer.handler=10,queue=0,port=16020] 
regionserver.RSRpcServices: Open 
tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c.
2015-05-15 08:59:35,494 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=459272, 
freeSize=420648856, maxSize=421108128, heapSi
2015-05-15 08:59:35,495 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
compactions.CompactionConfiguration: size [134217728, 9223372036854775807); 
files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,499 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=459272, 
freeSize=420648856, maxSize=421108128, heapSi
2015-05-15 08:59:35,500 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
compactions.CompactionConfiguration: size [134217728, 9223372036854775807); 
files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,504 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=459272, 
freeSize=420648856, maxSize=421108128, heapSi
2015-05-15 08:59:35,504 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
compactions.CompactionConfiguration: size [134217728, 9223372036854775807); 
files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,507 INFO  [RS_OPEN_REGION-ip-172-31-41-8:16020-2] 
regionserver.HRegion: Onlined 68af88e9274633ba67b926b142b7ff4c; next 
sequenceid=2
2015-05-15 08:59:35,513 INFO  
[PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c] 
regionserver.HRegionServer: Post open deploy tasks for 
tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c.
2015-05-15 08:59:35,522 INFO  
[PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c] hbase.MetaTableAccessor: 
Updated row tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c. with 
server=ip-17
{code}

As you can see from the meta scan results, the cell: 
{code}
column=info:server_0005, timestamp=1431680375996, value=
{code}
has timestamp of 1431680375996, which is {{Fri, 15 May 2015 08:59:35 GMT}} and 
the ms is 996, however the meta update from the RS uses it's local timestamp 
because of HBASE-11536. As you can see from the log:
{code}
2015-05-15 08:59:35,522 INFO  
[PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c] hbase.MetaTableAccessor: 
Updated row tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c. with 
server=ip-17
{code}
has a timestamp of {{2015-05-15 08:59:35,522}} which is less than the master's 
TS when it created the meta row with empty location. 

For region replicas, master writes null as server columns to let the meta 
clients know how many region replicas there are at the time of table creation. 
So, the later update from RS for opening the region replica maybe eclipsed by 
this previous update if the RS time clock is behind the master. 



was (Author: enis):
Here are the details from a test failure: 
We create a table with 8 replicas, and end of test, we verify that all the 
replicas are assigned. For that we do a meta scan, which does not see replica 
id 5 (notice that the server value is null): 
{code}
2015-05-15 
09:03:06,982|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server, timestamp=1431680376535, 
value=ip-172-31-41-12.ec2.internal:16020
2015-05-15 
09:03:06,984|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0001, timestamp=1431680377792, 
value=ip-172-31-41-10.ec2.internal:16020
2015-05-15 
09:03:06,985|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0002, timestamp=1431680376517, 
value=ip-172-31-41-12.ec2.internal:16020
2015-05-15 
09:03:06,987|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0003, timestamp=1431680377792, 
value=ip-172-31-41-10.ec2.internal:16020
2015-05-15 
09:03:06,989|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0004, timestamp=1431680377259, 
value=ip-172-31-41-6.ec2.internal:16020
2015-05-15 
09:03:06,990|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0005, timestamp=1431680375996, value=
2015-05-15 
09:03:06,992|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0006, timestamp=1431680376986, 
value=ip-172-31-41-11.ec2.internal:16020
2015-05-15 
09:03:06,994|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
 column=info:server_0007, timestamp=1431680377359, 
value=ip-172-31-41-7.ec2.internal:16020
{code}

Looking at the master and region server logs, it seems that the region replica 
with name {{tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c}} is 
indeed assigned, and it actually did write to meta: 
{code}
2015-05-15 08:59:35,473 INFO  [PriorityRpcServer.handler=10,queue=0,port=16020] 
regionserver.RSRpcServices: Open 
tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c.
2015-05-15 08:59:35,494 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=459272, 
freeSize=420648856, maxSize=421108128, heapSi
2015-05-15 08:59:35,495 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
compactions.CompactionConfiguration: size [134217728, 9223372036854775807); 
files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,499 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=459272, 
freeSize=420648856, maxSize=421108128, heapSi
2015-05-15 08:59:35,500 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
compactions.CompactionConfiguration: size [134217728, 9223372036854775807); 
files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,504 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=459272, 
freeSize=420648856, maxSize=421108128, heapSi
2015-05-15 08:59:35,504 INFO  [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1] 
compactions.CompactionConfiguration: size [134217728, 9223372036854775807); 
files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,507 INFO  [RS_OPEN_REGION-ip-172-31-41-8:16020-2] 
regionserver.HRegion: Onlined 68af88e9274633ba67b926b142b7ff4c; next 
sequenceid=2
2015-05-15 08:59:35,513 INFO  
[PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c] 
regionserver.HRegionServer: Post open deploy tasks for 
tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c.
2015-05-15 08:59:35,522 INFO  
[PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c] hbase.MetaTableAccessor: 
Updated row tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c. with 
server=ip-17
{code}

As you can see from the meta scan results, the cell: 
{code}
column=info:server_0005, timestamp=1431680375996, value=
{code}
has timestamp of 1431680375996, which is {{Fri, 15 May 2015 08:59:35 GMT}} and 
the ms is 996, however the meta update from the RS uses it's local timestamp 
because of HBASE-11536. As you can see from the log:
{code}
2015-05-15 08:59:35,522 INFO  
[PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c] hbase.MetaTableAccessor: 
Updated row tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c. with 
server=ip-17
{code}
has a timestamp of {{2015-05-15 08:59:35,522}} which is less than the master's 
TS when it created the meta row with empty location. 


> Updates to meta table server columns may be eclipsed
> ----------------------------------------------------
>
>                 Key: HBASE-13709
>                 URL: https://issues.apache.org/jira/browse/HBASE-13709
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1
>
>
> HBASE-11536 fixes a case where on a very rare occasion, the meta updates may 
> be processed out of order. The fix is to use the RS's timestamp for the 
> server column in meta update, but that actually opens up a vulnerability for 
> clock skew (see the discussions in the jira). 
> For the region replicas case, we can reproduce a problem where the server 
> name field is eclipsed by the masters earlier update because the RS is 
> lagging behind. However, this is not specific to replicas, but occurs more 
> frequently with it. 
> One option that was discussed was to send the master's ts with open region 
> RPC and use it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to