ctubbsii commented on a change in pull request #2117:
URL: https://github.com/apache/accumulo/pull/2117#discussion_r657317206



##########
File path: 
server/tserver/src/main/java/org/apache/accumulo/tserver/logger/LogFileKey.java
##########
@@ -189,4 +198,162 @@ public String toString() {
     }
     throw new RuntimeException("Unknown type of entry: " + event);
   }
+
+  /**
+   * Converts LogFileKey to Key. Creates a Key containing all of the 
LogFileKey fields. The fields
+   * are stored so the Key sorts maintaining the legacy sort order. The row of 
the Key is composed
+   * of 3 fields: EventNum + tabletID + seq. The EventNum is the byte returned 
by eventType(). The
+   * column family is always the event. The column qualifier is dependent of 
the type of event and
+   * could be empty.
+   *
+   * <pre>
+   *     Key Schema:
+   *     Row = EventNum + tabletID + seq
+   *     Family = event
+   *     Qualifier = tserverSession OR filename OR KeyExtent
+   * </pre>
+   */
+  public Key toKey() throws IOException {
+    byte[] formattedRow;
+    String family = event.name();
+    var kb = Key.builder();
+    switch (event) {
+      case OPEN:
+        formattedRow = formatRow(0, 0);
+        return 
kb.row(formattedRow).family(family).qualifier(tserverSession).build();
+      case COMPACTION_START:
+        formattedRow = formatRow(tabletId, seq);
+        return kb.row(formattedRow).family(family).qualifier(filename).build();
+      case MUTATION:
+      case MANY_MUTATIONS:
+      case COMPACTION_FINISH:
+        return kb.row(formatRow(tabletId, seq)).family(family).build();
+      case DEFINE_TABLET:
+        formattedRow = formatRow(tabletId, seq);
+        DataOutputBuffer buffer = new DataOutputBuffer();
+        tablet.writeTo(buffer);
+        var q = copyOf(buffer.getData(), buffer.getLength());
+        buffer.close();
+        return kb.row(formattedRow).family(family).qualifier(q).build();
+      default:
+        throw new AssertionError("Invalid event type in LogFileKey: " + event);
+    }
+  }
+
+  /**
+   * Get the first byte for the event. The only possible values are 0-4. This 
is used as the highest
+   * byte in the row.
+   */
+  private byte getEventByte() {
+    int evenTypeInteger = eventType(event);
+    return (byte) (evenTypeInteger & 0xff);
+  }
+
+  /**
+   * Get the byte encoded row for this LogFileKey as a Text object.
+   */
+  private Text formatRow() {
+    return new Text(formatRow(tabletId, seq));
+  }
+
+  /**
+   * Format the row using 13 bytes encoded to allow proper sorting of the 
RFile Key. The highest
+   * byte is for the event number, 4 bytes for the tabletId and 8 bytes for 
the sequence long.
+   */
+  private byte[] formatRow(int tabletId, long seq) {
+    byte eventNum = getEventByte();
+    // These will not sort properly when encoded if negative. Negative is not 
expected currently,
+    // defending against future changes and/or bugs.
+    Preconditions.checkArgument(eventNum >= 0 && seq >= 0);
+    byte[] row = new byte[13];
+    // encode the signed integer so negatives will sort properly for tabletId
+    int encodedTabletId = tabletId ^ 0x80000000;
+
+    row[0] = eventNum;
+    row[1] = (byte) ((encodedTabletId >>> 24) & 0xff);
+    row[2] = (byte) ((encodedTabletId >>> 16) & 0xff);
+    row[3] = (byte) ((encodedTabletId >>> 8) & 0xff);
+    row[4] = (byte) (encodedTabletId & 0xff);
+    row[5] = (byte) (seq >>> 56);
+    row[6] = (byte) (seq >>> 48);
+    row[7] = (byte) (seq >>> 40);
+    row[8] = (byte) (seq >>> 32);
+    row[9] = (byte) (seq >>> 24);
+    row[10] = (byte) (seq >>> 16);
+    row[11] = (byte) (seq >>> 8);
+    row[12] = (byte) (seq); // >>> 0
+    return row;
+  }
+
+  /**
+   * Extract the tabletId integer from the byte encoded Row.
+   */
+  private static int getTabletId(byte[] row) {
+    int encoded = ((row[1] << 24) + (row[2] << 16) + (row[3] << 8) + row[4]);

Review comment:
       You can get away with `+` in DataInputStream's implementation, because 
it uses `int` the whole time. So, there's no integer type promotion from `byte` 
to `int` going on before the `+`... they are already guaranteed to be unsigned 
`int`s. That doesn't work if you are using `byte` types, which will be 
automatically sign-extended when they are promoted to `int` for the addition 
operation, thereby changing their value. This doesn't matter for the first byte 
that gets shifted to the far left, because shifting all the way removes the 
effects of sign extension. If we properly truncate any sign-extension from the 
promotion to `int`, we can probably use `+`, but I think it's more intuitive to 
use `|`, and I don't think it really matters, except to communicate to other 
developers that we're doing bit-wise stuff, and not arithmetic stuff.
   
   In any case, we've fixed this in #2176 and #2177, but I just wanted to 
comment with the explanation for the archives.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to