keith-turner commented on code in PR #3640:
URL: https://github.com/apache/accumulo/pull/3640#discussion_r1276942072


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -789,8 +795,66 @@ private void mergeMetadataRecords(MergeInfo info) throws AccumuloException {
       for (Entry<Key,Value> entry : scanner) {
         Key key = entry.getKey();
         Value value = entry.getValue();
+
+        final KeyExtent keyExtent = KeyExtent.fromMetaRow(key.getRow());
+
+        // Keep track of the last KeyExtent seen so we can use it to fence
+        // off RFiles when merging the metadata
+        if (lastRow != null && !keyExtent.equals(lastRow)) {
+          previousKeyExtent = lastRow;
+        }
+
+        // Special case for now to handle the highest/stop tablet, which is where files are
+        // merged to, so we need to handle the deletes on update here as it won't be handled later
+        // TODO: Can this be re-written to not have a special edge case and make it simpler?
+        if (keyExtent.equals(stopExtent)) {
+          if (previousKeyExtent != null
+              && key.getColumnFamily().equals(DataFileColumnFamily.NAME)) {
+
+            // Delete existing metadata as we are now adding a range
+            m.putDelete(key.getColumnFamily(), key.getColumnQualifier());
+
+            // Fence off existing files by the end row of the previous tablet and current tablet.
+            final StoredTabletFile existing = StoredTabletFile.of(key.getColumnQualifier());
+            // The end row should be inclusive for the current tablet and the previous end row
+            // should be exclusive for the start row.
+            Range fenced = new Range(previousKeyExtent.endRow(), false, keyExtent.endRow(), true);
+
+            // TODO: If there is an existing range we likely need to clip the range with something
+            // like the following, which is commented out for now. We also may need to handle
+            // disjoint ranges.
+            // fenced = existing.hasRange() ? fenced.clip(existing.getRange()) : fenced;

Review Comment:
   Wondering if the following would work here. If the existing range is infinite, then fenced should pass through. Depending on how the split code works, it's possible a file could have a range that extends outside of a tablet; I think the following code handles that case along with the case where the range was completely inside the tablet.
   
   ```suggestion
          // If the ranges are disjoint this will throw an exception. It's not expected that a
          // tablet would contain a file with a range that was completely outside the tablet.
          // Neither split nor merge should create this situation. Split should only assign
          // files to child tablets that overlap the file's range.
          fenced = existing.getRange().clip(fenced);
   ```
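
   The semantics the suggestion relies on can be sketched with a minimal, self-contained model. This is NOT Accumulo's `Range` class, just an illustration under the tablet convention of an exclusive start row (`prevEndRow`) and inclusive end row (`endRow`), with `null` meaning infinite: an infinite existing range passes the fenced range through, an overlapping range is clipped, and disjoint ranges throw.

   ```java
// Hypothetical model of row-range clipping; rows are Strings, null = infinite.
// Ranges follow the (prevEndRow, endRow] convention: start exclusive, end inclusive.
public class RowRangeClip {

  // Returns {start, end} of the intersection of (aStart, aEnd] and (bStart, bEnd].
  // Throws IllegalArgumentException if the two ranges are disjoint.
  static String[] clip(String aStart, String aEnd, String bStart, String bEnd) {
    // Larger of the two start rows (null = -infinity)
    String start = (aStart == null) ? bStart
        : (bStart == null) ? aStart : (aStart.compareTo(bStart) >= 0 ? aStart : bStart);
    // Smaller of the two end rows (null = +infinity)
    String end = (aEnd == null) ? bEnd
        : (bEnd == null) ? aEnd : (aEnd.compareTo(bEnd) <= 0 ? aEnd : bEnd);
    if (start != null && end != null && start.compareTo(end) >= 0) {
      throw new IllegalArgumentException("disjoint ranges");
    }
    return new String[] {start, end};
  }

  public static void main(String[] args) {
    // Infinite existing range: the fenced range passes through unchanged
    String[] r1 = clip(null, null, "b", "d");
    System.out.println("(" + r1[0] + ", " + r1[1] + "]"); // (b, d]

    // Existing range extends outside the tablet: clipped to the overlap
    String[] r2 = clip("a", "c", "b", "d");
    System.out.println("(" + r2[0] + ", " + r2[1] + "]"); // (b, c]

    // Disjoint ranges throw, matching the behavior the review comment describes
    try {
      clip("a", "b", "c", "d");
    } catch (IllegalArgumentException e) {
      System.out.println("disjoint: " + e.getMessage());
    }
  }
}
   ```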



##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -789,8 +795,66 @@ private void mergeMetadataRecords(MergeInfo info) throws AccumuloException {
       for (Entry<Key,Value> entry : scanner) {
         Key key = entry.getKey();
         Value value = entry.getValue();
+
+        final KeyExtent keyExtent = KeyExtent.fromMetaRow(key.getRow());
+
+        // Keep track of the last KeyExtent seen so we can use it to fence
+        // off RFiles when merging the metadata
+        if (lastRow != null && !keyExtent.equals(lastRow)) {
+          previousKeyExtent = lastRow;
+        }
+
+        // Special case for now to handle the highest/stop tablet, which is where files are
+        // merged to, so we need to handle the deletes on update here as it won't be handled later
+        // TODO: Can this be re-written to not have a special edge case and make it simpler?
+        if (keyExtent.equals(stopExtent)) {
+          if (previousKeyExtent != null
+              && key.getColumnFamily().equals(DataFileColumnFamily.NAME)) {
+
+            // Delete existing metadata as we are now adding a range
+            m.putDelete(key.getColumnFamily(), key.getColumnQualifier());

Review Comment:
   May not want to always do this delete. If the existing file's range falls completely within this new tablet's range, then its range will not change. If its range does not change, then will its new and old metadata entries be identical? If so, maybe we should only bother with the delete and add if the computed fenced range is not equal to the file's existing range.
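
   That check could be sketched roughly as follows. This is a hypothetical, self-contained model (none of these names come from Accumulo): ranges are two-element `{start, end}` String arrays with `null` meaning infinite, start exclusive and end inclusive, and a rewrite is needed only when clipping actually changes the file's range.

   ```java
import java.util.Arrays;

// Sketch of the conditional delete the comment proposes: only delete and
// re-add a file's metadata entry when clipping changes its range.
public class ConditionalRewrite {

  // True when the clipped range differs from the file's existing range,
  // i.e. a delete + re-add of the metadata entry is actually needed.
  static boolean needsRewrite(String[] existing, String[] fenced) {
    String[] clipped = clip(existing, fenced);
    return !Arrays.equals(clipped, existing);
  }

  // Intersection of (start, end] ranges; null bounds are infinite.
  static String[] clip(String[] a, String[] b) {
    String start = a[0] == null ? b[0] : b[0] == null ? a[0]
        : a[0].compareTo(b[0]) >= 0 ? a[0] : b[0];
    String end = a[1] == null ? b[1] : b[1] == null ? a[1]
        : a[1].compareTo(b[1]) <= 0 ? a[1] : b[1];
    return new String[] {start, end};
  }

  public static void main(String[] args) {
    String[] fenced = {"b", "d"};
    // File range already inside the new tablet's range: metadata entry is
    // unchanged, so the delete/re-add can be skipped
    System.out.println(needsRewrite(new String[] {"b", "c"}, fenced)); // false
    // File range sticks out past the tablet: it must be clipped and rewritten
    System.out.println(needsRewrite(new String[] {"a", "e"}, fenced)); // true
  }
}
   ```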
   
   
   
   



##########
test/src/main/java/org/apache/accumulo/test/functional/MergeIT.java:
##########
@@ -102,6 +119,146 @@ public void mergeSize() throws Exception {
     }
   }
 
+  @Test
+  public void noChopMergeTest() throws Exception {
+    noChopMergeTest(false);
+  }
+
+  @Test
+  public void noChopMergeTestCompactTrue() throws Exception {
+    noChopMergeTest(true);
+  }
+
+  private void noChopMergeTest(boolean compact) throws Exception {
+    try (AccumuloClient c = Accumulo.newClient().from(getClientProps()).build()) {
+      String tableName = getUniqueNames(1)[0];
+      c.tableOperations().create(tableName);
+      final TableId tableId = TableId.of(c.tableOperations().tableIdMap().get(tableName));
+
+      // First write 1000 rows to a file in the default tablet
+      ingest(c, 1000, 1, tableName);
+      c.tableOperations().flush(tableName, null, null, true);
+
+      System.out.println("\nMetadata after Ingest");
+      printAndVerifyFileMetadata(c, tableId, 1);
+
+      // Add splits so we end up with 4 tablets
+      final SortedSet<Text> splits = new TreeSet<>();
+      for (int i = 250; i <= 750; i += 250) {
+        splits.add(new Text("row_" + String.format("%010d", i)));
+      }
+      c.tableOperations().addSplits(tableName, splits);
+
+      System.out.println("Metadata after Split");
+      verify(c, 1000, 1, tableName);
+      printAndVerifyFileMetadata(c, tableId, 4);
+
+      // Go through and delete two blocks of rows, 101 - 200
+      // and also 301 - 400 so we can test that the data doesn't come
+      // back on merge
+      try (BatchWriter bw = c.createBatchWriter(tableName)) {
+        byte[] COL_PREFIX = "col_".getBytes(UTF_8);
+        Text colq = new Text(FastFormat.toZeroPaddedString(0, 7, 10, COL_PREFIX));
+
+        for (int i = 101; i <= 200; i++) {
+          Mutation m = new Mutation(new Text("row_" + String.format("%010d", i)));
+          m.putDelete(new Text("colf"), colq);
+          bw.addMutation(m);
+        }
+        for (int i = 301; i <= 400; i++) {
+          Mutation m = new Mutation(new Text("row_" + String.format("%010d", i)));
+          m.putDelete(new Text("colf"), colq);
+          bw.addMutation(m);
+        }
+      }
+
+      System.out.println("Metadata after deleting rows 101 - 200 and 301 - 400");
+      c.tableOperations().compact(tableName,
+          new CompactionConfig().setEndRow(splits.first()).setWait(true));
+      c.tableOperations().flush(tableName, null, null, true);
+      printAndVerifyFileMetadata(c, tableId, 5);
+
+      // Optionally compact before merge
+      if (compact) {
+        c.tableOperations().compact(tableName, new CompactionConfig().setWait(true));
+      }

Review Comment:
   What is the reason behind this optional compaction?



##########
server/base/src/main/java/org/apache/accumulo/server/manager/state/MergeInfo.java:
##########
@@ -82,16 +82,20 @@ public boolean isDelete() {
   }
 
   public boolean needsToBeChopped(KeyExtent otherExtent) {
+    // TODO Clean this up, for now just commenting out and returning false here
+    // was a simple/quick way to disable chop compactions
+
     // During a delete, the block after the merge will be stretched to cover the deleted area.
     // Therefore, it needs to be chopped
-    if (!otherExtent.tableId().equals(extent.tableId())) {
-      return false;
-    }
-    if (isDelete()) {
-      return otherExtent.prevEndRow() != null && otherExtent.prevEndRow().equals(extent.endRow());
-    } else {
-      return this.extent.overlaps(otherExtent);
-    }
+    // if (!otherExtent.tableId().equals(extent.tableId())) {
+    // return false;
+    // }
+    // if (isDelete()) {
+    // return otherExtent.prevEndRow() != null && otherExtent.prevEndRow().equals(extent.endRow());
+    // } else {
+    // return this.extent.overlaps(otherExtent);
+    // }
+    return false;

Review Comment:
   This is great.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
