[ 
https://issues.apache.org/jira/browse/HBASE-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-10785:
----------------------------------

    Attachment: hbase-10785_v3.patch

Attaching the v3 patch, which removes the conf param and the cache clear on 
useCache==false. 

bq. Is there code duplication in locateMeta? If so, does there have to be (no 
biggie.. just asking).
There is some between locateMeta and locateRegionInMeta, but adding this logic 
in locateRegionInMeta would make it even more complex. I think it should be 
fine. 
bq. I know it's a copy paste; but I don't think we should do that: often the 
second try is w/o cache to be sure, but trashing the cache for the others is 
bad, as the default for a second try is nearly always to bypass the cache
I think we can go either way. The nice part about removing the entry from the 
cache is that one thread already knows the cached location is no good, so it 
removes it and the other threads wait for that thread to finish looking up the 
new location. In some cases this saves unnecessary trips to the bad location 
(and possible socket timeouts), while in other cases a retry from one thread 
will stall the other lookups. The v3 patch removes this cache invalidation, 
though. 
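
To make the trade-off concrete, here is a minimal sketch of the 
bypass-without-invalidation behavior. It is not the patch itself: 
MetaLocationCache, its Supplier-based registry, and the string locations are 
hypothetical stand-ins for the client's meta lookup and the ZooKeeper registry 
call, and the locking is only one plausible way to serialize lookups.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical illustration only; this is not an HBase class. */
final class MetaLocationCache {
    // Stands in for the expensive registry (ZooKeeper) lookup.
    private final Supplier<String> registry;
    private final AtomicReference<String> cached = new AtomicReference<>();
    private final Object lookupLock = new Object();

    MetaLocationCache(Supplier<String> registry) {
        this.registry = registry;
    }

    String locateMeta(boolean useCache) {
        if (useCache) {
            String loc = cached.get();
            if (loc != null) {
                return loc; // fast path: no registry request
            }
        }
        synchronized (lookupLock) {
            if (useCache) {
                // Another thread may have finished a lookup while we waited.
                String loc = cached.get();
                if (loc != null) {
                    return loc;
                }
            }
            // useCache == false lands here without evicting the old entry,
            // so other threads keep reading it until the fresh value is stored.
            String fresh = registry.get();
            cached.set(fresh);
            return fresh;
        }
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        MetaLocationCache cache =
            new MetaLocationCache(() -> "rs-" + lookups.incrementAndGet());
        cache.locateMeta(true);  // miss: one registry lookup
        cache.locateMeta(true);  // hit: served from the cache
        cache.locateMeta(false); // bypass: second registry lookup
        System.out.println(lookups.get() + " " + cache.locateMeta(true));
    }
}
```

In this sketch a retry with useCache==false refreshes the entry for everyone 
but never leaves the cache empty, so concurrent readers are not forced to wait 
on the retrying thread. The cost is that they may still visit the stale 
location until the refresh completes.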

> Metas own location should be cached
> -----------------------------------
>
>                 Key: HBASE-10785
>                 URL: https://issues.apache.org/jira/browse/HBASE-10785
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: hbase-10070
>
>         Attachments: hbase-10785_v1.patch, hbase-10785_v2.patch, 
> hbase-10785_v3.patch
>
>
> With the ROOT table gone, we no longer cache the location of the meta table 
> (in MetaCache) in 96+. I've checked the 94 code, and there we cache meta, but 
> not root.
> However, not caching meta's own location means that we are doing a 
> zookeeper request every time we want to look up a region's location from meta. 
> This means that there is a significant spike in zk requests whenever a region 
> server goes down. 
> This affects trunk, 0.98 and 0.96 as well as the hbase-10070 branch. I 
> discovered the issue in hbase-10070 because the integration test 
> (HBASE-10572) results in 150K requests to zk in 10min. 
> A thread dump from one of the runs has 100+ threads from the client in this 
> stack trace: 
>       {code}
>       "TimeBoundedMultiThreadedReaderThread_20" prio=10 tid=0x00007f852c2f2000 nid=0x57b6 in Object.wait() [0x00007f85059e7000]
>          java.lang.Thread.State: WAITING (on object monitor)
>               at java.lang.Object.wait(Native Method)
>               at java.lang.Object.wait(Object.java:503)
>               at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>               - locked <0x00000000ea71aa78> (a org.apache.zookeeper.ClientCnxn$Packet)
>               at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1149)
>               at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337)
>               at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:684)
>               at org.apache.hadoop.hbase.zookeeper.ZKUtil.blockUntilAvailable(ZKUtil.java:1853)
>               at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.blockUntilAvailable(MetaRegionTracker.java:186)
>               at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:60)
>               at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1126)
>               at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1112)
>               at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1220)
>               at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1129)
>               at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:321)
>               at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.call(RpcRetryingCallerWithReadReplicas.java:257)
>               - locked <0x00000000e9bcf238> (a org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas)
>               at org.apache.hadoop.hbase.client.HTable.get(HTable.java:818)
>               at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.queryKey(MultiThreadedReader.java:288)
>               at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.readKey(MultiThreadedReader.java:249)
>               at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.runReader(MultiThreadedReader.java:192)
>               at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.run(MultiThreadedReader.java:150)
>       {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
