[jira] [Commented] (PHOENIX-180) Use stats to guide query parallelization

ramkrishna.s.vasudevan (JIRA) Fri, 19 Sep 2014 06:46:40 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140485#comment-14140485 ]


ramkrishna.s.vasudevan commented on PHOENIX-180:
------------------------------------------------

We are sometimes getting the following exception:
{code}
org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in hbase:meta for table: SYSTEM.STATS, row=SYSTEM.STATS,ATABLE,99999999999999
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:146)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1159)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1223)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1111)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1068)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:909)
        at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:72)
        at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:125)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:113)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
        at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:282)
        at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:187)
        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:182)
        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:109)
        at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:738)
        at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.updateStatsInternal(MetaDataEndpointImpl.java:709)
        at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:682)
        at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:436)
        at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:415)
        at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:346)
        at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:7766)
        at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:5639)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.execServiceOnRegion(HRegionServer.java:3321)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.execService(HRegionServer.java:3303)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29501)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
        at java.lang.Thread.run(Thread.java:722)
2014-09-19 18:56:29,969 WARN  [defaultRpcServer.handler=0,queue=0,port=62189] org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation(1165): Encountered problems when prefetch hbase:meta table: 
org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in hbase:meta for table: SYSTEM.STATS, row=SYSTEM.STATS,ATABLE,99999999999999
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:146)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1159)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1223)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1111)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1068)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:909)
        at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:72)
        at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:125)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:113)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
        at org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:449)
        at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:288)
        at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:187)
        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:182)
        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:109)
        at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:738)
        at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.updateStatsInternal(MetaDataEndpointImpl.java:709)
        at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:682)
        at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:436)
        at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:415)
        at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:346)
        at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:7766)
{code}
This is fine and expected. But the logs also contain:
{code}
Deleted SYSTEM.STATS
{code}
After that, SYSTEM.STATS is never created again. Is there any way to avoid 
deleting the SYSTEM.STATS table? Because of this, the IT tests fail. Without 
the deletion they would pass, because individually they pass.

> Use stats to guide query parallelization
> ----------------------------------------
>
>                 Key: PHOENIX-180
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-180
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>              Labels: enhancement
>         Attachments: Phoenix-180_V1.patch, Phoenix-180_V2.patch, 
> Phoenix-180_WIP.patch
>
>
> We're currently not using stats, beyond a table-wide min key/max key cached 
> per client connection, to guide parallelization. If a query targets just a 
> few regions, we don't know how to evenly divide the work among threads, 
> because we don't know the data distribution. This other 
> [issue](https://github.com/forcedotcom/phoenix/issues/64) is targeting 
> gathering and maintaining the stats, while this issue is focused on using 
> the stats.
> The main changes are:
> 1. Create a PTableStats interface that encapsulates the stats information 
> (and implements the Writable interface so that it can be serialized back from 
> the server).
> 2. Add a stats member variable off of PTable to hold this.
> 3. From MetaDataEndPointImpl, look up the stats row for the table in the 
> stats table. If the stats have changed, return a new PTable with the updated 
> stats information. We may want to cache the stats row and have the stats 
> gatherer invalidate the cache row when updated so we don't have to always do 
> a scan for it. Additionally, it would be ideal if we could use the same split 
> policy on the stats table that we use on the system table to guarantee 
> co-location of data (for the sake of caching).
> 4. Modify the client-side parallelization (ParallelIterators.getSplits()) to 
> use this information to guide how to chunk up the scans at query time.
> This should help boost query performance, especially in cases where the data 
> is highly skewed. It's likely the cause for the slowness reported in this 
> issue: https://github.com/forcedotcom/phoenix/issues/47.
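
The last step of the description above (using stats to chunk scans) can be 
illustrated with a minimal sketch. This is not the code from the attached 
patches and not Phoenix's actual ParallelIterators.getSplits() logic. It 
assumes the stats provide sorted "guide post" keys spaced at roughly equal 
data volumes, and it uses long keys instead of HBase byte[] row keys to stay 
self-contained. The class name StatsGuidedSplits and its split() method are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and key type are assumptions.
final class StatsGuidedSplits {

    /**
     * Splits the scan range [start, end) into at most maxChunks sub-ranges.
     * guidePosts must be sorted ascending; consecutive guide posts are assumed
     * to bracket roughly equal amounts of data, so each returned sub-range
     * covers a similar number of guide-post segments.
     */
    static List<long[]> split(long start, long end, long[] guidePosts, int maxChunks) {
        // Keep only the guide posts that fall strictly inside the scan range.
        List<Long> inside = new ArrayList<>();
        for (long g : guidePosts) {
            if (g > start && g < end) {
                inside.add(g);
            }
        }
        // n guide posts cut the range into n + 1 segments of similar data volume.
        int segments = inside.size() + 1;
        int pieces = Math.max(1, Math.min(maxChunks, segments));

        List<long[]> ranges = new ArrayList<>();
        long lower = start;
        for (int i = 1; i < pieces; i++) {
            // Piece i ends after (i * segments / pieces) segments.
            long boundary = inside.get(i * segments / pieces - 1);
            ranges.add(new long[] {lower, boundary});
            lower = boundary;
        }
        ranges.add(new long[] {lower, end});
        return ranges;
    }
}
```

With skewed data, for example guide posts {10, 12, 14, 16, 90} over the range 
[0, 100) and three chunks, the boundaries fall at 12 and 16 rather than at 
evenly spaced key values, so each thread gets a similar share of the rows. 
With no guide posts in range, the whole range is returned as a single chunk.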



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
