[jira] [Updated] (PHOENIX-5313) All mappers grab all RegionLocations from .META

Geoffrey Jacoby (JIRA) Fri, 31 May 2019 16:47:35 -0700


     [ https://issues.apache.org/jira/browse/PHOENIX-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Geoffrey Jacoby updated PHOENIX-5313:
-------------------------------------
    Description: 
Phoenix's MapReduce integration lives in PhoenixInputFormat. It implements 
getSplits by generating a QueryPlan for the provided SELECT query, and each 
split gets a mapper. As part of this QueryPlan generation, we fetch all 
RegionLocations for the table from .META.

In PhoenixInputFormat.getQueryPlan():
{code:java}
 // Initialize the query plan so it sets up the parallel scans
 queryPlan.iterator(MapReduceParallelScanGrouper.getInstance());
{code}

In MapReduceParallelScanGrouper.getRegionBoundaries():
{code:java}
return context.getConnection().getQueryServices().getAllTableRegions(tableName);
{code}

This is fine on the client, which has to compute the splits once per job.

Unfortunately, each mapper Task spawned by the job goes through this _same_ 
exercise. It passes a MapReduceParallelScanGrouper to queryPlan.iterator(), 
which I believe eventually causes getRegionBoundaries to be called when the 
scans are initialized in the result iterator.

Because HBase 1.x and later removed .META prefetching and caching from the 
HBase client, not only does each _Job_ make potentially thousands of calls to 
.META, but each of its potentially thousands of _Tasks_ makes potentially 
thousands of calls as well.
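
To give a rough sense of the amplification, here is a back-of-the-envelope 
sketch. It is not Phoenix code: the class and method names are made up for 
illustration, and it assumes one .META lookup per region and no client-side 
caching. The region and task counts in main are example values, not 
measurements.

```java
public class MetaLookupEstimate {

    // Lookups made once by the client in getSplits.
    // Assumption: one .META lookup per region of the table.
    static long clientLookups(long regions) {
        return regions;
    }

    // Lookups made by the mapper tasks if every task re-reads
    // every region location when it builds its own QueryPlan.
    static long taskLookups(long regions, long tasks) {
        return Math.multiplyExact(regions, tasks);
    }

    // Total lookups for one job: the client pass plus all task passes.
    static long totalLookups(long regions, long tasks) {
        return Math.addExact(clientLookups(regions), taskLookups(regions, tasks));
    }

    public static void main(String[] args) {
        long regions = 2_000; // example value
        long tasks = 2_000;   // example value, roughly one mapper per region

        System.out.println("client only:    " + clientLookups(regions));       // 2000
        System.out.println("client + tasks: " + totalLookups(regions, tasks)); // 4002000
    }
}
```

Under these assumptions the per-task pass dominates: the cost grows with 
regions times tasks rather than with regions alone.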

We should get a QueryPlan and set up the scans without having to read all 
RegionLocations, either by using the mapper's internal knowledge of its split 
key range, or by serializing the query plan on the client and sending it to 
the mapper tasks for use there. 
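
A minimal sketch of the idea behind both options, using only the JDK. 
KeyRange is a hypothetical stand-in for whatever a split would carry; it is 
not an existing Phoenix or HBase class, and a real fix would have to carry 
more of the plan than a start and stop row. The point is only that the client 
can ship the range it already computed, and the task can rebuild its scan 
bounds from those bytes without asking .META for anything.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SplitRangeSketch {

    /** Hypothetical stand-in for the row-key range covered by one input split. */
    public static final class KeyRange {
        public final byte[] startRow; // inclusive
        public final byte[] stopRow;  // exclusive

        public KeyRange(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow.clone();
            this.stopRow = stopRow.clone();
        }

        /** Client side: encode the range as two length-prefixed byte arrays. */
        public byte[] serialize() {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(startRow.length);
                out.write(startRow);
                out.writeInt(stopRow.length);
                out.write(stopRow);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return bytes.toByteArray();
        }

        /** Task side: rebuild the range from the bytes alone, with no region lookup. */
        public static KeyRange deserialize(byte[] data) {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
                byte[] start = new byte[in.readInt()];
                in.readFully(start);
                byte[] stop = new byte[in.readInt()];
                in.readFully(stop);
                return new KeyRange(start, stop);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        // Example keys only; real split boundaries come from the client's plan.
        KeyRange planned = new KeyRange(
                "row-0100".getBytes(StandardCharsets.UTF_8),
                "row-0200".getBytes(StandardCharsets.UTF_8));

        byte[] wire = planned.serialize();            // done once, on the client
        KeyRange onTask = KeyRange.deserialize(wire); // done in each mapper task

        System.out.println(Arrays.equals(planned.startRow, onTask.startRow)
                && Arrays.equals(planned.stopRow, onTask.stopRow)); // prints "true"
    }
}
```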

Note that MapReduce tasks over snapshots are not affected by this, because 
region locations are stored in the snapshot manifest. 

  was:
Phoenix's MapReduce integration lives in PhoenixInputFormat. It implements 
getSplits by calculating a QueryPlan for the provided SELECT query, and each 
split gets a mapper. As part of this QueryPlan generation, we grab all 
RegionLocations from .META

In PhoenixInputFormat:getQueryPlan: 
{code:java}
 // Initialize the query plan so it sets up the parallel scans
 queryPlan.iterator(MapReduceParallelScanGrouper.getInstance());
{code}

In MapReduceParallelScanGrouper.getRegionBoundaries()
{code:java}
return context.getConnection().getQueryServices().getAllTableRegions(tableName);
{code}

This is fine.

Unfortunately, each mapper Task spawned by the job will go through this _same_ 
exercise when trying to create the RecordReader. Since HBase 1.x and up got rid 
of .META prefetching and caching within the HBase client, that means that not 
only will each _Job_ make potentially thousands of calls to .META, potentially 
thousands of _Tasks_ will do the same. 

The createRecordReader should get a QueryPlan without having to read all 
RegionLocations, either by using its internal knowledge of its split key range, 
or by serializing the query plan from the client and sending it to the mapper 
tasks for use there. 

Note that MapReduce tasks over snapshots are not affected by this, because 
region locations are stored in the snapshot manifest. 


> All mappers grab all RegionLocations from .META
> -----------------------------------------------
>
>                 Key: PHOENIX-5313
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5313
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Geoffrey Jacoby
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
