[ https://issues.apache.org/jira/browse/HBASE-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-10701:
----------------------------------

    Attachment: hbase-10701_v3.patch

Thanks Nicolas for the careful review. 

I've changed the patch and dropped the approach of using HRLs with null 
ServerNames. Instead, we still set the HRL item to null inside RegionLocations. 
RegionLocations can now contain null elements at the tail of the array as 
well. This lets the cache know how many replicas there are, even when some of 
the locations are still unknown. 
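
To illustrate the idea, here is a minimal sketch. It is not the code in the patch: ReplicaLocationsSketch and its methods are made-up names, and plain strings stand in for HRLs. The array index is the replica id, and a null entry (including at the tail) means the replica exists but its location is unknown.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the idea; NOT the actual HBase RegionLocations class. */
public class ReplicaLocationsSketch {
    // Index = replica id; null = replica exists but its location is unknown.
    private final String[] locations;

    public ReplicaLocationsSketch(String... locations) {
        this.locations = Arrays.copyOf(locations, locations.length);
    }

    /** Number of replicas the cache knows about, including unknown locations. */
    public int size() {
        return locations.length;
    }

    /** Number of replicas whose location is currently known. */
    public int numKnown() {
        int known = 0;
        for (String location : locations) {
            if (location != null) {
                known++;
            }
        }
        return known;
    }

    /** Location of one replica, or null if it is unknown. */
    public String get(int replicaId) {
        return locations[replicaId];
    }

    /** Returns a copy with one replica's location cleared; the others are untouched. */
    public ReplicaLocationsSketch remove(int replicaId) {
        String[] copy = Arrays.copyOf(locations, locations.length);
        copy[replicaId] = null;
        return new ReplicaLocationsSketch(copy);
    }

    public static void main(String[] args) {
        // Three replicas; only the primary's location is known so far.
        ReplicaLocationsSketch locs = new ReplicaLocationsSketch("rs1:60020", null, null);
        System.out.println(locs.size() + " replicas, " + locs.numKnown() + " known");
    }
}
```

Clearing a single entry keeps the array length, so the replica count survives even when a location is invalidated.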

I've been testing this with
{code}
hbase org.apache.hadoop.hbase.test.IntegrationTestTimeBoundedRequestsWithRegionReplicas \
  -Dhbase.IntegrationTestTimeBoundedRequestsWithRegionReplicas.runtime=600000 \
  -DIntegrationTestTimeBoundedRequestsWithRegionReplicas.num_write_threads=30 \
  -DIntegrationTestTimeBoundedRequestsWithRegionReplicas.region_replication=3 \
  -DIntegrationTestTimeBoundedRequestsWithRegionReplicas.num_read_threads=30 \
  -Dhbase.ipc.client.allowsInterrupt=true
{code}
and it seems the issues are fixed. However, I notice that most of the time the 
test dies with an OOM (cannot create native thread), because the number of 
threads grows without bound (north of 4K). 
I tried setting -Dhbase.hconnection.threads.max=512, with no results so far. 

One other issue (probably related) was that the RPCs would not start for a 
long time, and the gets would time out (10-20 secs), because the thread pool 
executor does not schedule the tasks submitted to the CompletionService from 
RpcRetryingCallerWithReadReplicas. Do you have any opinion on this? Should 
we create a secondary pool for the backup requests? If we address the thread 
growth problem, this will probably be fixed as well. 
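
A rough sketch of what a secondary pool could look like, assuming plain java.util.concurrent executors. This is not how RpcRetryingCallerWithReadReplicas is written: BackupPoolSketch, firstReply, the pool sizes and the delay are all made up for illustration. The primary call and the backup call go through separate pools but share one completion queue, so a backup request does not wait behind primaries in a saturated pool.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; names and sizes are assumptions. */
public class BackupPoolSketch {

    /** Returns the first completed reply from the primary or the backup call. */
    public static String firstReply(ExecutorService primaryPool, ExecutorService backupPool,
            Callable<String> primary, Callable<String> backup, long backupDelayMs)
            throws Exception {
        // Both completion services publish finished futures into the same queue.
        BlockingQueue<Future<String>> done = new LinkedBlockingQueue<>();
        CompletionService<String> primaryCs = new ExecutorCompletionService<>(primaryPool, done);
        CompletionService<String> backupCs = new ExecutorCompletionService<>(backupPool, done);

        Future<String> p = primaryCs.submit(primary);
        Future<String> b = null;
        try {
            // Give the primary a head start before sending the backup request.
            Future<String> first = done.poll(backupDelayMs, TimeUnit.MILLISECONDS);
            if (first == null) {
                // The backup runs in its own pool, so it is scheduled even if
                // every thread of the primary pool is busy.
                b = backupCs.submit(backup);
                first = done.take();
            }
            return first.get();
        } finally {
            // Cancel whichever call is still outstanding.
            p.cancel(true);
            if (b != null) {
                b.cancel(true);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService primaryPool = Executors.newFixedThreadPool(1);
        ExecutorService backupPool = Executors.newFixedThreadPool(1);
        try {
            String reply = firstReply(primaryPool, backupPool,
                () -> { Thread.sleep(2000); return "primary"; }, // slow primary
                () -> "backup",                                   // fast replica
                10);
            System.out.println(reply);
        } finally {
            primaryPool.shutdownNow();
            backupPool.shutdownNow();
        }
    }
}
```

With the values in main, the slow primary occupies the only thread of its pool, and the backup reply still comes back right away from the second pool. Whether a second pool is the right fix depends on why the tasks are not being scheduled in the first place.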

The v3 patch also addresses your comments, except for the DoNotRetryEx. I think 
we'll have to get this running consistently before addressing that. 

> Cache invalidation improvements from client side
> ------------------------------------------------
>
>                 Key: HBASE-10701
>                 URL: https://issues.apache.org/jira/browse/HBASE-10701
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: hbase-10070
>
>         Attachments: hbase-10701_v1.patch, hbase-10701_v2.patch, 
> hbase-10701_v3.patch
>
>
> Running the integration test in HBASE-10572 and HBASE-10355, it seems that 
> we need some changes for cache invalidation of meta entries from the client 
> side in backup RPCs. 
> Mainly, the RPCs made for replicas should not invalidate the cache for all 
> the replicas (for example, on RegionMovedException, connection error, etc.). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
