[ https://issues.apache.org/jira/browse/HBASE-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated HBASE-10701:
----------------------------------
Attachment: hbase-10701_v2.patch
Attaching a second patch, which fixes three interrelated issues.
Fortunately, with this patch, the integration test from HBASE-10572 is able to
run on an 8-node cluster for 100 min with CM (ChaosMonkey).
The changes include:
# Individual RPCs for replicas can receive exceptions (RegionMovedException,
etc.) as well as connection exceptions. Cache invalidation now clears only the
cache entry for that replica's location instead of the whole cached meta row.
# When a server is killed, its locations are removed from the cache. But after
some time only the primary region info is left in the cache, and unless we look
up meta again, we won't know about the region replicas. So no secondary RPCs
will be done unless the primary RPC times out. I fixed it so that individual
locations in RegionLocations are no longer set to null; instead, the serverName
of each individual HRL is set to null. This lets the RPC layer know about the
replicas, while the missing server locations will still trigger a meta lookup.
There are still some failures in the AP code path that I am investigating.
# RpcRetryingCallerWithReadReplicas used to schedule the RPCs to the primary and
secondaries, and wait for the first result, regardless of whether it was an
exception or a success. In case of a closed connection, one of the RPCs will
immediately return with a DoNotRetryEx and fail the whole get() operation, even
though we should be able to read from the other replicas perfectly fine. I
changed the code path so that it waits for the first successful operation, a
cancellation or interrupt, or for all operations to fail with DoNotRetryEx or
RetriesExhaustedEx.
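A minimal sketch of the waiting behavior described in item 3, written against plain java.util.concurrent rather than the actual RpcRetryingCallerWithReadReplicas code. The class and method names (FirstSuccessfulCall, callFirstSuccessful) are hypothetical. For simplicity it submits all replica calls at once, whereas the text above implies secondaries are only issued after the primary times out. A failed call (standing in for a DoNotRetryEx) only records the failure; the wait ends on the first success, on an interrupt or cancellation, or once every call has failed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper; not the HBase RpcRetryingCallerWithReadReplicas implementation.
public class FirstSuccessfulCall {

  /**
   * Runs one callable per replica and returns the first successful result.
   * A single failure does not end the wait; only when every call has failed is
   * the last failure rethrown. InterruptedException (from take()) and
   * CancellationException (from get()) propagate and also end the wait.
   */
  public static <T> T callFirstSuccessful(ExecutorService pool, List<Callable<T>> replicaCalls)
      throws InterruptedException, ExecutionException {
    if (replicaCalls.isEmpty()) {
      throw new IllegalArgumentException("no replica calls given");
    }
    CompletionService<T> cs = new ExecutorCompletionService<>(pool);
    List<Future<T>> futures = new ArrayList<>();
    for (Callable<T> call : replicaCalls) {
      futures.add(cs.submit(call));
    }
    ExecutionException lastFailure = null;
    try {
      for (int i = 0; i < futures.size(); i++) {
        Future<T> done = cs.take(); // blocks until the next call finishes
        try {
          return done.get(); // the first success wins
        } catch (ExecutionException e) {
          lastFailure = e; // one replica failed: keep waiting for the others
        }
      }
    } finally {
      // Cancel calls that are still running; cancelling a finished call is a no-op
      for (Future<T> f : futures) {
        f.cancel(true);
      }
    }
    throw lastFailure; // every call failed
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(3);
    try {
      List<Callable<String>> calls = new ArrayList<>();
      // Replica 0 fails immediately, as it might on a closed connection
      calls.add(() -> { throw new java.io.IOException("connection closed"); });
      // Replicas 1 and 2 succeed after a short delay
      calls.add(() -> { Thread.sleep(50); return "row from replica 1"; });
      calls.add(() -> { Thread.sleep(200); return "row from replica 2"; });
      System.out.println(callFirstSuccessful(pool, calls));
    } finally {
      pool.shutdownNow();
    }
  }
}
```

ExecutorService.invokeAny offers similar first-success semantics out of the box; an explicit completion loop like this one makes it easier to later treat some failures (e.g. retriable ones) differently from others.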
[~nkeywal] could you please take a close look?
> Cache invalidation improvements from client side
> ------------------------------------------------
>
> Key: HBASE-10701
> URL: https://issues.apache.org/jira/browse/HBASE-10701
> Project: HBase
> Issue Type: Sub-task
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: hbase-10070
>
> Attachments: hbase-10701_v1.patch, hbase-10701_v2.patch
>
>
> From running the integration tests in HBASE-10572 and HBASE-10355, it seems
> that we need some changes to the cache invalidation of meta entries on the
> client side for backup RPCs.
> Mainly, the RPCs made for replicas should not invalidate the cache for all
> the replicas (for example, on RegionMovedException, connection errors, etc.).
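To illustrate the per-replica invalidation in items 1 and 2, here is a hypothetical, simplified model of one cached meta row. The names (ReplicaLocationCache, ReplicaLocation, invalidateReplica, removeServer, needsMetaLookup) are invented and are not HBase's RegionLocations / HRegionLocation API. The sketch assumes a cleared replica is represented by a null server name (the approach from item 2), so the client keeps knowing how many replicas exist while still needing a meta lookup for the cleared ones.

```java
// Hypothetical, simplified model of a cached meta row; not the HBase client classes.
public class ReplicaLocationCache {

  /** Location of one replica; serverName == null means "unknown, look up meta". */
  static final class ReplicaLocation {
    final int replicaId;
    final String serverName; // may be null

    ReplicaLocation(int replicaId, String serverName) {
      this.replicaId = replicaId;
      this.serverName = serverName;
    }
  }

  private final ReplicaLocation[] locations;

  public ReplicaLocationCache(String... serversByReplicaId) {
    locations = new ReplicaLocation[serversByReplicaId.length];
    for (int i = 0; i < serversByReplicaId.length; i++) {
      locations[i] = new ReplicaLocation(i, serversByReplicaId[i]);
    }
  }

  /** A failed replica RPC clears only that replica's server, not the whole row. */
  public synchronized void invalidateReplica(int replicaId) {
    locations[replicaId] = new ReplicaLocation(replicaId, null);
  }

  /**
   * When a server dies, null out the server name of every replica hosted there,
   * but keep the entries so callers still know the replicas exist.
   */
  public synchronized void removeServer(String deadServer) {
    for (int i = 0; i < locations.length; i++) {
      if (deadServer.equals(locations[i].serverName)) {
        locations[i] = new ReplicaLocation(i, null);
      }
    }
  }

  /** Number of replicas known to the client, including ones with unknown servers. */
  public synchronized int replicaCount() {
    return locations.length;
  }

  /** True if this replica's server is unknown and a meta lookup is needed. */
  public synchronized boolean needsMetaLookup(int replicaId) {
    return locations[replicaId].serverName == null;
  }

  public synchronized String serverFor(int replicaId) {
    return locations[replicaId].serverName;
  }

  public static void main(String[] args) {
    ReplicaLocationCache cache = new ReplicaLocationCache("rs1", "rs2", "rs3");
    cache.invalidateReplica(1); // e.g. RegionMovedException from replica 1
    cache.removeServer("rs3");  // rs3 was killed
    System.out.println(cache.replicaCount());     // 3: all replicas still known
    System.out.println(cache.serverFor(0));       // rs1: primary untouched
    System.out.println(cache.needsMetaLookup(1)); // true
    System.out.println(cache.needsMetaLookup(2)); // true
  }
}
```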