Jingyun Tian created HBASE-20769:
------------------------------------

             Summary: getSplits() has an out-of-bounds problem in TableSnapshotInputFormatImpl
                 Key: HBASE-20769
                 URL: https://issues.apache.org/jira/browse/HBASE-20769
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.0.0, 1.4.0, 1.3.0
            Reporter: Jingyun Tian
            Assignee: Jingyun Tian
             Fix For: 2.0.0


When numSplits > 1, getSplits() may create a split whose start row is smaller 
than the user-specified scan's start row, or whose stop row is larger than the 
user-specified scan's stop row.

{code}

byte[][] sp = sa.split(hri.getStartKey(), hri.getEndKey(), numSplits, true);
for (int i = 0; i < sp.length - 1; i++) {
  if (PrivateCellUtil.overlappingKeys(scan.getStartRow(), scan.getStopRow(),
      sp[i], sp[i + 1])) {
    List<String> hosts =
        calculateLocationsForInputSplit(conf, htd, hri, tableDir, localityEnabled);

    Scan boundedScan = new Scan(scan);
    boundedScan.setStartRow(sp[i]);
    boundedScan.setStopRow(sp[i + 1]);

    splits.add(new InputSplit(htd, hri, hosts, boundedScan, restoreDir));
  }
}

{code}

Since the split keys are computed from the region's key range rather than the 
scan's, whenever sp[i] < scan.getStartRow() or sp[i + 1] > scan.getStopRow() 
the bounded scan covers rows outside the user-defined scan range.

The fix should be simple: clamp each split's boundaries to the scan's own range.

{code}

boundedScan.setStartRow(
    Bytes.compareTo(scan.getStartRow(), sp[i]) > 0 ? scan.getStartRow() : sp[i]);
boundedScan.setStopRow(
    Bytes.compareTo(scan.getStopRow(), sp[i + 1]) < 0 ? scan.getStopRow() : sp[i + 1]);

{code}
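
To make the clamping concrete, here is a minimal standalone sketch that does not 
depend on HBase. The class SplitClamp and its methods are hypothetical names used 
only for illustration, and compare() is a stand-in for the unsigned lexicographic 
ordering of row keys. The sketch also assumes the usual HBase convention that an 
empty stop row means "to the end of the table", so it treats an empty stop key as 
unbounded; the snippet above does not special-case that and may need a similar 
check.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of HBase.
final class SplitClamp {

  // Unsigned lexicographic comparison of two row keys (stand-in for the
  // ordering used on HBase row keys).
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  // Start of the bounded scan: the larger of the two start keys.
  // An empty key sorts first, so "from the beginning" needs no special case.
  static byte[] clampStart(byte[] scanStart, byte[] splitStart) {
    return compare(scanStart, splitStart) > 0 ? scanStart : splitStart;
  }

  // Stop of the bounded scan: the smaller of the two stop keys.
  // Assumption: an empty stop key means "unbounded", so the other key wins.
  static byte[] clampStop(byte[] scanStop, byte[] splitStop) {
    if (scanStop.length == 0) {
      return splitStop;
    }
    if (splitStop.length == 0) {
      return scanStop;
    }
    return compare(scanStop, splitStop) < 0 ? scanStop : splitStop;
  }

  public static void main(String[] args) {
    byte[] scanStart = "d".getBytes(StandardCharsets.UTF_8);
    byte[] scanStop = "f".getBytes(StandardCharsets.UTF_8);
    byte[] splitStart = "a".getBytes(StandardCharsets.UTF_8);
    byte[] splitStop = "h".getBytes(StandardCharsets.UTF_8);

    // Split [a, h) intersected with user scan [d, f).
    String start = new String(clampStart(scanStart, splitStart), StandardCharsets.UTF_8);
    String stop = new String(clampStop(scanStop, splitStop), StandardCharsets.UTF_8);
    System.out.println("[" + start + ", " + stop + ")"); // prints "[d, f)"
  }
}
```

With a split of [a, h) and a user scan of [d, f), the clamped range is [d, f), so 
the split never reads rows outside what the user asked for.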

I will also try to add unit tests to help catch this problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
