[
https://issues.apache.org/jira/browse/HBASE-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated HBASE-10701:
----------------------------------
Attachment: hbase-10701_v3.patch
Thanks Nicolas for the careful review.
I've changed the patch to drop the approach of using HRLs with null
ServerNames. Instead, we still set the HRL item to null inside RegionLocations.
RegionLocations can now contain null elements at the tail of the array as
well. This lets the cache know how many replicas there are even when some of
their locations are still unknown.
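To illustrate the idea (this is not the code in the patch), here is a minimal
sketch of a location list where the array index is the replica id and a null
slot means "replica exists, location unknown". The class name
ReplicaLocationsSketch and the use of plain strings for servers are assumptions
made for the example; the real RegionLocations holds HRL objects.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical stand-in for a per-region replica location list; not the HBase class. */
final class ReplicaLocationsSketch {
    // Index = replica id; null = the replica exists but its server is not known yet.
    private final String[] servers;

    ReplicaLocationsSketch(String... servers) {
        this.servers = servers.clone();
    }

    /** Number of replicas, counting those whose location is unknown. */
    int size() {
        return servers.length;
    }

    /** Server for the given replica, or null if unknown or out of range. */
    String get(int replicaId) {
        return replicaId >= 0 && replicaId < servers.length ? servers[replicaId] : null;
    }

    /** Number of replicas whose location is currently known. */
    long knownCount() {
        return Arrays.stream(servers).filter(Objects::nonNull).count();
    }

    /** Returns a copy with only the given replica's location cleared. */
    ReplicaLocationsSketch remove(int replicaId) {
        String[] copy = servers.clone();
        if (replicaId >= 0 && replicaId < copy.length) {
            copy[replicaId] = null;
        }
        return new ReplicaLocationsSketch(copy);
    }
}
```

With this shape, a failed RPC to one replica clears only that slot; size() still
reports the replica count, and the other replicas' cached locations stay intact.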
I've been testing this with
{code}
hbase
org.apache.hadoop.hbase.test.IntegrationTestTimeBoundedRequestsWithRegionReplicas
-Dhbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime=600000
-DIntegrationTestTimeBoundedRequestsWithRegionReplicas.num_write_threads=30
-DIntegrationTestTimeBoundedRequestsWithRegionReplicas.region_replication=3
-DIntegrationTestTimeBoundedRequestsWithRegionReplicas.num_read_threads=30
-Dhbase.ipc.client.allowsInterrupt=true
{code}
and it seems the issues are fixed. However, I notice that most of the time the
test dies with an OOM (cannot create native thread), because the number of
threads grows unbounded (north of 4K).
I tried setting -Dhbase.hconnection.threads.max=512, but it has had no effect
so far.
One other issue (probably related) was that the RPCs would not start for a
long time (10-20 secs), causing the gets to time out, because the thread pool
executor does not schedule the tasks submitted to the CompletionService from
RpcRetryingCallerWithReadReplicas. Do you have any opinion on this? Should
we create a secondary pool for the backup requests? If we fix the thread
growth problem, this will probably be fixed as well.
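To make the secondary-pool question concrete, here is a rough sketch using only
java.util.concurrent. It is not how RpcRetryingCallerWithReadReplicas is
written; BackupRequestSketch, boundedPool and firstResult are hypothetical
names. The assumption is that primary and backup calls run on two separate
bounded pools and report into one shared completion queue, so a backlog of
backup requests cannot delay primary requests.

```java
import java.util.concurrent.*;

/** Hypothetical sketch: primary and backup calls on separate bounded pools. */
final class BackupRequestSketch {

    /** Fixed-size pool: extra tasks wait in the queue instead of spawning threads. */
    static ExecutorService boundedPool(int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true); // let idle threads exit
        return pool;
    }

    /**
     * Runs the primary call; if it has not completed within backupDelayMs, also
     * runs the backup call and returns whichever completes first.
     * Simplification: a primary that fails quickly is not retried on the backup.
     */
    static <T> T firstResult(ExecutorService primaryPool, ExecutorService backupPool,
                             Callable<T> primary, Callable<T> backup,
                             long backupDelayMs, long timeoutMs) throws Exception {
        // Both completion services publish finished futures to the same queue.
        BlockingQueue<Future<T>> done = new LinkedBlockingQueue<>();
        CompletionService<T> primaryCs = new ExecutorCompletionService<>(primaryPool, done);
        CompletionService<T> backupCs = new ExecutorCompletionService<>(backupPool, done);

        Future<T> p = primaryCs.submit(primary);
        Future<T> first = done.poll(backupDelayMs, TimeUnit.MILLISECONDS);
        if (first != null) {
            return first.get();
        }
        Future<T> b = backupCs.submit(backup);
        try {
            first = done.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (first == null) {
                throw new TimeoutException("no replica answered in time");
            }
            return first.get();
        } finally {
            // Cancel whichever call is still outstanding.
            p.cancel(true);
            b.cancel(true);
        }
    }
}
```

Whether a second pool is worth its extra threads depends on where the
scheduling delay actually comes from; the sketch only shows that the two kinds
of requests can be isolated while both pools stay bounded.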
The v3 patch also addresses your comments, except for the DoNotRetryEx. I think
we'll have to get this running consistently before addressing that.
> Cache invalidation improvements from client side
> ------------------------------------------------
>
> Key: HBASE-10701
> URL: https://issues.apache.org/jira/browse/HBASE-10701
> Project: HBase
> Issue Type: Sub-task
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: hbase-10070
>
> Attachments: hbase-10701_v1.patch, hbase-10701_v2.patch,
> hbase-10701_v3.patch
>
>
> Running the integration test in HBASE-10572, and HBASE-10355, it seems that
> we need some changes for cache invalidation of meta entries from the client
> side in backup RPCs.
> Mainly the RPC's made for replicas should not invalidate the cache for all
> the replicas (for example on RegionMovedException, connection error etc).
--
This message was sent by Atlassian JIRA
(v6.2#6252)