keith-turner commented on code in PR #5833:
URL: https://github.com/apache/accumulo/pull/5833#discussion_r2307815668


##########
server/manager/src/main/java/org/apache/accumulo/manager/tableOps/split/UpdateTablets.java:
##########
@@ -125,56 +128,69 @@ public Repo<Manager> call(FateId fateId, Manager manager) 
throws Exception {
    */
   static Map<KeyExtent,Map<StoredTabletFile,DataFileValue>> 
getNewTabletFiles(FateId fateId,
       SortedMap<KeyExtent,TabletMergeability> newTablets, TabletMetadata 
tabletMetadata,
-      Function<StoredTabletFile,Splitter.FileInfo> fileInfoProvider) {
+      Function<StoredTabletFile,FileSKVIterator.FileRange> fileInfoProvider) {
 
     Map<KeyExtent,Map<StoredTabletFile,DataFileValue>> tabletsFiles = new 
TreeMap<>();
 
     newTablets.keySet().forEach(extent -> tabletsFiles.put(extent, new 
HashMap<>()));
 
     // determine which files overlap which tablets and their estimated sizes
     tabletMetadata.getFilesMap().forEach((file, dataFileValue) -> {
-      Splitter.FileInfo fileInfo = fileInfoProvider.apply(file);
+      FileSKVIterator.FileRange fileRange = fileInfoProvider.apply(file);
+
+      // This predicate is used to determine if a given tablet range overlaps 
the data in this file.
+      Predicate<Range> overlapPredicate;
+      if (fileRange == null) {
+        // The range is not known, so assume all tablets overlap the file
+        overlapPredicate = r -> true;
+      } else if (fileRange.empty) {
+        // The file is empty so not tablets can overlap it
+        overlapPredicate = r -> false;
+      } else {
+        // check if the tablet range overlaps the file range
+        overlapPredicate = range -> range.clip(fileRange.rowRange, true) != 
null;
 
-      Range fileRange;
-      if (fileInfo != null) {
-        fileRange = new Range(fileInfo.getFirstRow(), fileInfo.getLastRow());
         if (!file.getRange().isInfiniteStartKey() || 
!file.getRange().isInfiniteStopKey()) {
           // Its expected that if a file has a range that the first row and 
last row will be clipped
           // to be within that range. For that reason this code does not check 
file.getRange() when
           // making decisions about whether a file should go to a tablet, 
because its assumed that
           // fileRange will cover that case. Since file.getRange() is not 
being checked directly
           // this code validates the assumption that fileRange is within 
file.getRange()
+
           Preconditions.checkState(
-              file.getRange().clip(new Range(fileInfo.getFirstRow()), false) 
!= null,
-              "First row %s computed for file %s did not fall in its range", 
fileInfo.getFirstRow(),
-              file);
+              file.getRange().clip(new 
Range(fileRange.rowRange.getStartKey().getRow()), false)
+                  != null,
+              "First row %s computed for file %s did not fall in its range",
+              fileRange.rowRange.getStartKey().getRow(), file);
+
+          var lastRow = 
RowRangeUtil.stripZeroTail(fileRange.rowRange.getEndKey().getRowData());
           Preconditions.checkState(
-              file.getRange().clip(new Range(fileInfo.getLastRow()), false) != 
null,
-              "Last row %s computed for file %s did not fall in its range", 
fileInfo.getLastRow(),
-              file);
+              file.getRange().clip(new Range(new Text(lastRow.toArray())), 
false) != null,
+              "Last row %s computed for file %s did not fall in its range", 
lastRow, file);
         }
-      } else {
-        fileRange = new Range();
       }
 
       // count how many of the new tablets the file will overlap
-      double numOverlapping = 
newTablets.keySet().stream().map(KeyExtent::toDataRange)
-          .filter(range -> range.clip(fileRange, true) != null).count();
-
-      Preconditions.checkState(numOverlapping > 0);

Review Comment:
   This was one of the two bugs fixed. This code would throw an exception when 
a file overlapped zero child tablets during tablet split.  However there are 
legitimate reasons this can happen.   The new code logs a debug that the file 
does not overlap any tablets and drops the file.
   
   Looked back at what 2.1 code did for this case.   2.1 code only splits a 
tablets into two tablets.  In main it splits into 2 or more tablets in a single 
operation.  The 2.1 code always assigned a file to one tablet or both when 
splitting, it never dropped a  file.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to