[ https://issues.apache.org/jira/browse/HBASE-29502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Connell updated HBASE-29502:
------------------------------------
    Description: 
When region replicas are enabled in "asynchronous WAL replication" mode, each RegionServer uses a {{RegionReplicaReplicationEndpoint}} object to tail its own WAL. Each mutation in that WAL may belong to a region whose primary replica is on this RegionServer and whose secondary replicas are on other servers. For each mutation in the WAL, {{RegionReplicaReplicationEndpoint}} therefore decides whether any other servers are hosting replicas of the relevant region, and if so, sends an RPC to those servers containing the mutations they should apply to their memstores.
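
As a rough illustration of that per-mutation decision, here is a minimal, self-contained sketch; the types and method names below are illustrative stand-ins, not the actual HBase classes:
{code:java}
// Minimal sketch of the forwarding decision described above. Location, WalEntry,
// and the "replay RPC" are hypothetical stand-ins for the real HBase internals.
import java.util.List;

public class ReplicaForwardingSketch {
  record Location(String server, int replicaId) {}
  record WalEntry(String table, String row) {}

  static final int PRIMARY_REPLICA_ID = 0;

  // Forward a WAL entry to every non-primary replica location of its region.
  static void forward(WalEntry entry, List<Location> locations) {
    for (Location loc : locations) {
      if (loc.replicaId() != PRIMARY_REPLICA_ID) {
        // In HBase this would be a replay RPC to the hosting RegionServer;
        // here we just print the decision.
        System.out.printf("send %s/%s to %s (replica %d)%n",
            entry.table(), entry.row(), loc.server(), loc.replicaId());
      }
    }
  }

  public static void main(String[] args) {
    forward(new WalEntry("t1", "row1"),
        List.of(new Location("rs1", 0), new Location("rs2", 1), new Location("rs3", 2)));
  }
}
{code}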

When region replicas are enabled, a {{RegionReplicaReplicationEndpoint}} instance is created with its own {{ConnectionImplementation}} and therefore its own {{MetaCache}}. This {{RegionReplicaReplicationEndpoint}} immediately starts attempting to send mutations to secondary replica regions, even though those regions will not be open for another few seconds or minutes. During this window, the {{MetaCache}} gets populated with entries saying that most regions are hosted on only one server. Those cached lookups then remain in use indefinitely, even though they are incorrect for most of their lifetime. Without knowing where the secondary replica regions are hosted, or whether they exist at all, the {{RegionReplicaReplicationEndpoint}} cannot forward mutations to them. As a result, the secondary replica regions' memstores never receive updates, their data becomes even more stale than it should be, and users reading from them get unnecessarily stale results.
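
The failure mode amounts to a cache seeded too early. Here is a minimal sketch, again with hypothetical types rather than the real {{MetaCache}}, of how a primary-only entry cached at startup keeps being returned after the secondaries open:
{code:java}
// Sketch of the stale-cache problem: the first lookup runs before the secondary
// replicas open, caches a primary-only answer, and every later lookup reuses it.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StaleLocationCacheSketch {
  record Location(String server, int replicaId) {}

  private final Map<String, List<Location>> cache = new ConcurrentHashMap<>();

  // Simulates a meta lookup; the secondaries are only visible once they are open.
  private List<Location> scanMeta(String region, boolean secondariesOpen) {
    return secondariesOpen
        ? List.of(new Location("rs1", 0), new Location("rs2", 1))
        : List.of(new Location("rs1", 0));
  }

  List<Location> locate(String region, boolean secondariesOpen) {
    // computeIfAbsent never re-runs once a value is cached, so the primary-only
    // answer recorded at startup is returned indefinitely.
    return cache.computeIfAbsent(region, r -> scanMeta(r, secondariesOpen));
  }

  public static void main(String[] args) {
    StaleLocationCacheSketch c = new StaleLocationCacheSketch();
    System.out.println(c.locate("region-a", false)); // startup: primary only, now cached
    System.out.println(c.locate("region-a", true));  // still primary only: stale entry
  }
}
{code}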

{{RegionReplicaReplicationEndpoint}} actually contains cache-busting logic 
seemingly designed to fix this exact problem:
{code:java}
// Replicas can take a while to come online. The cache may have only the primary. If we
// keep going to the cache, we will not learn of the replicas and their locations after
// they come online.
if (useCache && locations.size() == 1 && TableName.isMetaTableName(tableName)) {
  if (tableDescriptors.get(tableName).getRegionReplication() > 1) {
    // Make an obnoxious log here. See how bad this issue is. Add a timer if happening
    // too much.
    LOG.info("Skipping location cache; only one location found for {}", tableName);
    useCache = false;
    continue;
  }
}
{code}

However, because of the {{TableName.isMetaTableName(tableName)}} clause, the cache-busting only takes effect when the mutation being forwarded belongs to the meta table. I don't see why that restriction makes sense.

In this ticket I plan to simply remove the "is meta table" clause to fix this bug.
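
For clarity, with that clause removed the guard would read roughly as follows (a sketch of the intended change, not the final patch):
{code:java}
// Same guard as above, minus the meta-table restriction, so the cache-busting
// applies to any table whose descriptor declares more than one region replica.
if (useCache && locations.size() == 1) {
  if (tableDescriptors.get(tableName).getRegionReplication() > 1) {
    LOG.info("Skipping location cache; only one location found for {}", tableName);
    useCache = false;
    continue;
  }
}
{code}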


> RegionReplicaReplicationEndpoint fails to forward mutations when meta cache 
> does not contain secondary replica locations
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-29502
>                 URL: https://issues.apache.org/jira/browse/HBASE-29502
>             Project: HBase
>          Issue Type: Bug
>          Components: read replicas
>            Reporter: Charles Connell
>            Assignee: Charles Connell
>            Priority: Major
>


