[ 
https://issues.apache.org/jira/browse/PHOENIX-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868191#comment-16868191
 ] 

Hadoop QA commented on PHOENIX-5313:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12972269/PHOENIX-5313-v1.patch
  against master branch at commit 1f2508dbde365aaedac628c89df237e8b6b46df8.
  ATTACHMENT ID: 12972269

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified tests.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:red}-1 release audit{color}.  The applied patch generated 2 release 
audit warnings (more than the master's current 0 warnings).

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/2689//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/2689//artifact/patchprocess/patchReleaseAuditWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/2689//console

This message is automatically generated.

> All mappers grab all RegionLocations from .META
> -----------------------------------------------
>
>                 Key: PHOENIX-5313
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5313
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Geoffrey Jacoby
>            Assignee: Chinmay Kulkarni
>            Priority: Major
>         Attachments: PHOENIX-5313-v1.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Phoenix's MapReduce integration lives in PhoenixInputFormat. It implements 
> getSplits by calculating a QueryPlan for the provided SELECT query, and each 
> split gets a mapper. As part of this QueryPlan generation, we grab all 
> RegionLocations from .META.
> In PhoenixInputFormat.getQueryPlan():
> {code:java}
>  // Initialize the query plan so it sets up the parallel scans
>  queryPlan.iterator(MapReduceParallelScanGrouper.getInstance());
> {code}
> In MapReduceParallelScanGrouper.getRegionBoundaries():
> {code:java}
> return context.getConnection().getQueryServices().getAllTableRegions(tableName);
> {code}
> This is fine when it happens once, on the client, while computing the splits.
> Unfortunately, each mapper Task spawned by the job will go through this 
> _same_ exercise. It will pass a MapReduceParallelScanGrouper to 
> queryPlan.iterator(), which I believe eventually causes 
> getRegionBoundaries to be called when the scans are initialized in the 
> result iterator.
> Since HBase 1.x and up removed .META prefetching and caching from the 
> HBase client, not only will each _Job_ make potentially thousands of calls 
> to .META, but potentially thousands of _Tasks_ will each make potentially 
> thousands of calls to .META as well. 
> We should get a QueryPlan and set up the scans without having to read all 
> RegionLocations, either by using the mapper's internal knowledge of its split 
> key range, or by serializing the query plan from the client and sending it to 
> the mapper tasks for use there. 
> Note that MapReduce tasks over snapshots are not affected by this, because 
> region locations are stored in the snapshot manifest. 
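
The first option in the quoted description (each mapper relying on its own split key range) can be illustrated with a small standalone sketch. Nothing below is Phoenix or HBase code: KeyRange, FakeMeta, and every method name are hypothetical stand-ins, and the sketch only counts how many full region listings each approach would request, assuming one split per region.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; none of these are Phoenix or HBase classes.
class SplitScanSketch {

    /** A half-open row-key range [start, end) standing in for one input split. */
    static final class KeyRange {
        final String start;
        final String end;
        KeyRange(String start, String end) { this.start = start; this.end = end; }
    }

    /** Stand-in for the meta table; counts how many full region listings are requested. */
    static final class FakeMeta {
        private final List<KeyRange> regions;
        int fullListings = 0;
        FakeMeta(List<KeyRange> regions) { this.regions = regions; }
        List<KeyRange> allRegions() {
            fullListings++;
            return new ArrayList<>(regions);
        }
    }

    /** Client side: one full listing to compute the splits (assumed one split per region). */
    static List<KeyRange> computeSplits(FakeMeta meta) {
        return meta.allRegions();
    }

    /** Behaviour described in the issue: each task repeats the full listing. */
    static KeyRange taskScanWithLookup(FakeMeta meta, int taskIndex) {
        return meta.allRegions().get(taskIndex);
    }

    /** Proposed behaviour: the task uses the key range it was handed in its split. */
    static KeyRange taskScanFromSplit(KeyRange split) {
        return split;
    }

    /** Total full listings when every task looks up all regions again. */
    static int listingsWithLookup(List<KeyRange> regions) {
        FakeMeta meta = new FakeMeta(regions);
        List<KeyRange> splits = computeSplits(meta);
        for (int i = 0; i < splits.size(); i++) {
            taskScanWithLookup(meta, i);
        }
        return meta.fullListings;
    }

    /** Total full listings when every task derives its scan from its own split. */
    static int listingsFromSplit(List<KeyRange> regions) {
        FakeMeta meta = new FakeMeta(regions);
        for (KeyRange split : computeSplits(meta)) {
            taskScanFromSplit(split);
        }
        return meta.fullListings;
    }

    public static void main(String[] args) {
        List<KeyRange> regions = new ArrayList<>();
        regions.add(new KeyRange("", "g"));
        regions.add(new KeyRange("g", "p"));
        regions.add(new KeyRange("p", ""));
        System.out.println("per-task lookup: " + listingsWithLookup(regions)); // 4
        System.out.println("split-derived:   " + listingsFromSplit(regions));  // 1
    }
}
```

With three regions, the per-task lookup variant requests four full listings (one on the client plus one per task), while the split-derived variant requests only the client's single listing. How a real mapper would rebuild its scans from the split is not covered here and would depend on the actual patch.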



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
