keith-turner commented on code in PR #5833: URL: https://github.com/apache/accumulo/pull/5833#discussion_r2307815668
########## server/manager/src/main/java/org/apache/accumulo/manager/tableOps/split/UpdateTablets.java: ########## @@ -125,56 +128,69 @@ public Repo<Manager> call(FateId fateId, Manager manager) throws Exception { */ static Map<KeyExtent,Map<StoredTabletFile,DataFileValue>> getNewTabletFiles(FateId fateId, SortedMap<KeyExtent,TabletMergeability> newTablets, TabletMetadata tabletMetadata, - Function<StoredTabletFile,Splitter.FileInfo> fileInfoProvider) { + Function<StoredTabletFile,FileSKVIterator.FileRange> fileInfoProvider) { Map<KeyExtent,Map<StoredTabletFile,DataFileValue>> tabletsFiles = new TreeMap<>(); newTablets.keySet().forEach(extent -> tabletsFiles.put(extent, new HashMap<>())); // determine which files overlap which tablets and their estimated sizes tabletMetadata.getFilesMap().forEach((file, dataFileValue) -> { - Splitter.FileInfo fileInfo = fileInfoProvider.apply(file); + FileSKVIterator.FileRange fileRange = fileInfoProvider.apply(file); + + // This predicate is used to determine if a given tablet range overlaps the data in this file. + Predicate<Range> overlapPredicate; + if (fileRange == null) { + // The range is not known, so assume all tablets overlap the file + overlapPredicate = r -> true; + } else if (fileRange.empty) { + // The file is empty so not tablets can overlap it + overlapPredicate = r -> false; + } else { + // check if the tablet range overlaps the file range + overlapPredicate = range -> range.clip(fileRange.rowRange, true) != null; - Range fileRange; - if (fileInfo != null) { - fileRange = new Range(fileInfo.getFirstRow(), fileInfo.getLastRow()); if (!file.getRange().isInfiniteStartKey() || !file.getRange().isInfiniteStopKey()) { // Its expected that if a file has a range that the first row and last row will be clipped // to be within that range. For that reason this code does not check file.getRange() when // making decisions about whether a file should go to a tablet, because its assumed that // fileRange will cover that case. Since file.getRange() is not being checked directly // this code validates the assumption that fileRange is within file.getRange() + Preconditions.checkState( - file.getRange().clip(new Range(fileInfo.getFirstRow()), false) != null, - "First row %s computed for file %s did not fall in its range", fileInfo.getFirstRow(), - file); + file.getRange().clip(new Range(fileRange.rowRange.getStartKey().getRow()), false) + != null, + "First row %s computed for file %s did not fall in its range", + fileRange.rowRange.getStartKey().getRow(), file); + + var lastRow = RowRangeUtil.stripZeroTail(fileRange.rowRange.getEndKey().getRowData()); Preconditions.checkState( - file.getRange().clip(new Range(fileInfo.getLastRow()), false) != null, - "Last row %s computed for file %s did not fall in its range", fileInfo.getLastRow(), - file); + file.getRange().clip(new Range(new Text(lastRow.toArray())), false) != null, + "Last row %s computed for file %s did not fall in its range", lastRow, file); } - } else { - fileRange = new Range(); } // count how many of the new tablets the file will overlap - double numOverlapping = newTablets.keySet().stream().map(KeyExtent::toDataRange) - .filter(range -> range.clip(fileRange, true) != null).count(); - - Preconditions.checkState(numOverlapping > 0); Review Comment: This was one of the two bugs fixed. This code would throw an exception when a file overlapped zero child tablets during tablet split. However there are legitimate reasons this can happen. The new code logs a debug that the file does not overlap any tablets and drops the file. Looked back at what 2.1 code did for this case. 2.1 code only splits a tablets into two tablets. In main it splits into 2 or more tablets in a single operation. The 2.1 code always assigned a file to one tablet or both when splitting, it never dropped a file. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org