[ https://issues.apache.org/jira/browse/PHOENIX-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667612#comment-16667612 ]

ASF GitHub Bot commented on PHOENIX-4997:
-----------------------------------------

Github user twdsilva commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/397#discussion_r229055918
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/MapReduceParallelScanGrouper.java ---
    @@ -80,18 +83,39 @@ public boolean shouldStartNewScan(QueryPlan plan, List<Scan> scans,
                }
        }
     
    +   /**
    +    * Get list of region locations from SnapshotManifest
    +    * BaseResultIterators assume that regions are sorted using RegionInfo.COMPARATOR
    +    */
        private List<HRegionLocation> getRegionLocationsFromManifest(SnapshotManifest manifest) {
                List<SnapshotRegionManifest> regionManifests = manifest.getRegionManifests();
                Preconditions.checkNotNull(regionManifests);
     
    -           List<HRegionLocation> regionLocations = Lists.newArrayListWithCapacity(regionManifests.size());
    +           List<RegionInfo> regionInfos = Lists.newArrayListWithCapacity(regionManifests.size());
    +           List<HRegionLocation> hRegionLocations = Lists.newArrayListWithCapacity(regionManifests.size());
     
                for (SnapshotRegionManifest regionManifest : regionManifests) {
    -                   regionLocations.add(new HRegionLocation(
    -                                   ProtobufUtil.toRegionInfo(regionManifest.getRegionInfo()), null));
    +                   RegionInfo regionInfo = ProtobufUtil.toRegionInfo(regionManifest.getRegionInfo());
    +                   if (isValidRegion(regionInfo)) {
    +                           regionInfos.add(regionInfo);
    +                   }
    +           }
    +
    +           regionInfos.sort(RegionInfo.COMPARATOR);
    +
    +           for (RegionInfo regionInfo : regionInfos) {
    +                           hRegionLocations.add(new HRegionLocation(regionInfo, null));
                }
     
    -           return regionLocations;
    +           return hRegionLocations;
    +   }
    +
    +   // Exclude offline split parent regions
    +   private boolean isValidRegion(RegionInfo hri) {
    --- End diff --
    
    Maybe extract this to a util, since it's used in two classes.
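
    A rough sketch of what such a shared helper might look like. The diff
    cuts off before the body of isValidRegion, so the predicate below is an
    assumption based only on the "Exclude offline split parent regions"
    comment. SnapshotRegions, Region, and validSorted are hypothetical names,
    and Region is a stand-in for HBase's RegionInfo so the example runs
    without HBase on the classpath. The start-key comparator is likewise a
    simplified stand-in for RegionInfo.COMPARATOR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical shared helper; not part of Phoenix or HBase. */
final class SnapshotRegions {

    /** Minimal stand-in for the RegionInfo properties this sketch needs. */
    static final class Region {
        final String name;
        final String startKey;
        final boolean offline;
        final boolean splitParent;

        Region(String name, String startKey, boolean offline, boolean splitParent) {
            this.name = name;
            this.startKey = startKey;
            this.offline = offline;
            this.splitParent = splitParent;
        }
    }

    /** Simplified ordering by start key (assumption: stands in for RegionInfo.COMPARATOR). */
    static final Comparator<Region> BY_START_KEY = new Comparator<Region>() {
        @Override
        public int compare(Region a, Region b) {
            return a.startKey.compareTo(b.startKey);
        }
    };

    private SnapshotRegions() {}

    /** Assumed predicate: skip a region that is an offline split parent. */
    static boolean isValidRegion(Region region) {
        return !(region.offline && region.splitParent);
    }

    /** Filters out invalid regions, then sorts the rest, mirroring the flow in the diff. */
    static List<Region> validSorted(List<Region> regions) {
        List<Region> result = new ArrayList<Region>(regions.size());
        for (Region region : regions) {
            if (isValidRegion(region)) {
                result.add(region);
            }
        }
        result.sort(BY_START_KEY);
        return result;
    }

    public static void main(String[] args) {
        List<Region> manifest = Arrays.asList(
                new Region("parent", "", true, true),        // offline split parent
                new Region("daughterB", "m", false, false),
                new Region("daughterA", "", false, false));
        for (Region region : validSorted(manifest)) {
            System.out.println(region.name);
        }
    }
}
```

    With the sample input in main, the split parent is dropped and the two
    daughters come back in start-key order. Both callers could then share the
    one predicate instead of keeping private copies.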


> Phoenix MR on snapshots can produce duplicate rows
> --------------------------------------------------
>
>                 Key: PHOENIX-4997
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4997
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>            Priority: Major
>         Attachments: PHOENIX-4997.master.001.patch
>
>
> Phoenix MR over snapshots uses the TableSnapshotResultIterator and 
> SnapshotScanner classes to iterate and scan over snapshots. They were 
> copied from the HBase classes TableSnapshotScanner and 
> ClientSideRegionScanner and modified to meet Phoenix 
> requirements. This approach was taken because some of the fields of those 
> classes are private, so they could not be reused directly. HBASE-8369 is the 
> main Jira.
> The framework had a bug that was fixed as part of HBASE-16011. However, the 
> fix was not ported to Phoenix, so Phoenix MR over snapshots still 
> has the bug. This Jira is to fix that issue.
> FYI [~akshita.malhotra] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
