[
https://issues.apache.org/jira/browse/HBASE-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549622#comment-14549622
]
Enis Soztutar commented on HBASE-13709:
---------------------------------------
Here are the details from a test failure:
We create a table with 8 replicas, and end of test, we verify that all the
replicas are assigned. For that we do a meta scan, which does not see replica
id 5 (notice that the server value is null):
{code}
2015-05-15
09:03:06,982|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server, timestamp=1431680376535,
value=ip-172-31-41-12.ec2.internal:16020
2015-05-15
09:03:06,984|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server_0001, timestamp=1431680377792,
value=ip-172-31-41-10.ec2.internal:16020
2015-05-15
09:03:06,985|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server_0002, timestamp=1431680376517,
value=ip-172-31-41-12.ec2.internal:16020
2015-05-15
09:03:06,987|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server_0003, timestamp=1431680377792,
value=ip-172-31-41-10.ec2.internal:16020
2015-05-15
09:03:06,989|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server_0004, timestamp=1431680377259,
value=ip-172-31-41-6.ec2.internal:16020
2015-05-15
09:03:06,990|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server_0005, timestamp=1431680375996, value=
2015-05-15
09:03:06,992|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server_0006, timestamp=1431680376986,
value=ip-172-31-41-11.ec2.internal:16020
2015-05-15
09:03:06,994|beaver.machine|INFO|30851|140131703838464|MainThread|tableone,,1431680375568.d3059f4bf8e7a0a07660d7cf01217189.
column=info:server_0007, timestamp=1431680377359,
value=ip-172-31-41-7.ec2.internal:16020
{code}
Looking at the master and region server logs, it seems that the region replica
with name {{tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c}} is
indeed assigned, and it actually did write to meta:
{code}
2015-05-15 08:59:35,473 INFO [PriorityRpcServer.handler=10,queue=0,port=16020]
regionserver.RSRpcServices: Open
tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c.
2015-05-15 08:59:35,494 INFO [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1]
hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=459272,
freeSize=420648856, maxSize=421108128, heapSi
2015-05-15 08:59:35,495 INFO [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1]
compactions.CompactionConfiguration: size [134217728, 9223372036854775807);
files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,499 INFO [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1]
hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=459272,
freeSize=420648856, maxSize=421108128, heapSi
2015-05-15 08:59:35,500 INFO [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1]
compactions.CompactionConfiguration: size [134217728, 9223372036854775807);
files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,504 INFO [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1]
hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=459272,
freeSize=420648856, maxSize=421108128, heapSi
2015-05-15 08:59:35,504 INFO [StoreOpener-68af88e9274633ba67b926b142b7ff4c-1]
compactions.CompactionConfiguration: size [134217728, 9223372036854775807);
files [3, 10); ratio 1.200000; off-peak ratio 5
2015-05-15 08:59:35,507 INFO [RS_OPEN_REGION-ip-172-31-41-8:16020-2]
regionserver.HRegion: Onlined 68af88e9274633ba67b926b142b7ff4c; next
sequenceid=2
2015-05-15 08:59:35,513 INFO
[PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c]
regionserver.HRegionServer: Post open deploy tasks for
tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c.
2015-05-15 08:59:35,522 INFO
[PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c] hbase.MetaTableAccessor:
Updated row tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c. with
server=ip-17
{code}
As you can see from the meta scan results, the cell:
{code}
column=info:server_0005, timestamp=1431680375996, value=
{code}
has timestamp of 1431680375996, which is {{Fri, 15 May 2015 08:59:35 GMT}} and
the ms is 996, however the meta update from the RS uses it's local timestamp
because of HBASE-11536. As you can see from the log:
{code}
2015-05-15 08:59:35,522 INFO
[PostOpenDeployTasks:68af88e9274633ba67b926b142b7ff4c] hbase.MetaTableAccessor:
Updated row tableone,,1431680375568_0005.68af88e9274633ba67b926b142b7ff4c. with
server=ip-17
{code}
has a timestamp of {{2015-05-15 08:59:35,522}} which is less than the master's
TS when it created the meta row with empty location.
> Updates to meta table server columns may be eclipsed
> ----------------------------------------------------
>
> Key: HBASE-13709
> URL: https://issues.apache.org/jira/browse/HBASE-13709
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1
>
>
> HBASE-11536 fixes a case where on a very rare occasion, the meta updates may
> be processed out of order. The fix is to use the RS's timestamp for the
> server column in meta update, but that actually opens up a vulnerability for
> clock skew (see the discussions in the jira).
> For the region replicas case, we can reproduce a problem where the server
> name field is eclipsed by the masters earlier update because the RS is
> lagging behind. However, this is not specific to replicas, but occurs more
> frequently with it.
> One option that was discussed was to send the master's ts with open region
> RPC and use it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)