ctubbsii commented on a change in pull request #2117:
URL: https://github.com/apache/accumulo/pull/2117#discussion_r657317206
##########
File path:
server/tserver/src/main/java/org/apache/accumulo/tserver/logger/LogFileKey.java
##########
@@ -189,4 +198,162 @@ public String toString() {
}
throw new RuntimeException("Unknown type of entry: " + event);
}
+
+ /**
+ * Converts LogFileKey to Key. Creates a Key containing all of the
LogFileKey fields. The fields
+ * are stored so the Key sorts maintaining the legacy sort order. The row of
the Key is composed
+ * of 3 fields: EventNum + tabletID + seq. The EventNum is the byte returned
by eventType(). The
+ * column family is always the event. The column qualifier is dependent of
the type of event and
+ * could be empty.
+ *
+ * <pre>
+ * Key Schema:
+ * Row = EventNum + tabletID + seq
+ * Family = event
+ * Qualifier = tserverSession OR filename OR KeyExtent
+ * </pre>
+ */
+ public Key toKey() throws IOException {
+ byte[] formattedRow;
+ String family = event.name();
+ var kb = Key.builder();
+ switch (event) {
+ case OPEN:
+ formattedRow = formatRow(0, 0);
+ return
kb.row(formattedRow).family(family).qualifier(tserverSession).build();
+ case COMPACTION_START:
+ formattedRow = formatRow(tabletId, seq);
+ return kb.row(formattedRow).family(family).qualifier(filename).build();
+ case MUTATION:
+ case MANY_MUTATIONS:
+ case COMPACTION_FINISH:
+ return kb.row(formatRow(tabletId, seq)).family(family).build();
+ case DEFINE_TABLET:
+ formattedRow = formatRow(tabletId, seq);
+ DataOutputBuffer buffer = new DataOutputBuffer();
+ tablet.writeTo(buffer);
+ var q = copyOf(buffer.getData(), buffer.getLength());
+ buffer.close();
+ return kb.row(formattedRow).family(family).qualifier(q).build();
+ default:
+ throw new AssertionError("Invalid event type in LogFileKey: " + event);
+ }
+ }
+
+ /**
+ * Get the first byte for the event. The only possible values are 0-4. This
is used as the highest
+ * byte in the row.
+ */
+ private byte getEventByte() {
+ int evenTypeInteger = eventType(event);
+ return (byte) (evenTypeInteger & 0xff);
+ }
+
+ /**
+ * Get the byte encoded row for this LogFileKey as a Text object.
+ */
+ private Text formatRow() {
+ return new Text(formatRow(tabletId, seq));
+ }
+
+ /**
+ * Format the row using 13 bytes encoded to allow proper sorting of the
RFile Key. The highest
+ * byte is for the event number, 4 bytes for the tabletId and 8 bytes for
the sequence long.
+ */
+ private byte[] formatRow(int tabletId, long seq) {
+ byte eventNum = getEventByte();
+ // These will not sort properly when encoded if negative. Negative is not
expected currently,
+ // defending against future changes and/or bugs.
+ Preconditions.checkArgument(eventNum >= 0 && seq >= 0);
+ byte[] row = new byte[13];
+ // encode the signed integer so negatives will sort properly for tabletId
+ int encodedTabletId = tabletId ^ 0x80000000;
+
+ row[0] = eventNum;
+ row[1] = (byte) ((encodedTabletId >>> 24) & 0xff);
+ row[2] = (byte) ((encodedTabletId >>> 16) & 0xff);
+ row[3] = (byte) ((encodedTabletId >>> 8) & 0xff);
+ row[4] = (byte) (encodedTabletId & 0xff);
+ row[5] = (byte) (seq >>> 56);
+ row[6] = (byte) (seq >>> 48);
+ row[7] = (byte) (seq >>> 40);
+ row[8] = (byte) (seq >>> 32);
+ row[9] = (byte) (seq >>> 24);
+ row[10] = (byte) (seq >>> 16);
+ row[11] = (byte) (seq >>> 8);
+ row[12] = (byte) (seq); // >>> 0
+ return row;
+ }
+
+ /**
+ * Extract the tabletId integer from the byte encoded Row.
+ */
+ private static int getTabletId(byte[] row) {
+ int encoded = ((row[1] << 24) + (row[2] << 16) + (row[3] << 8) + row[4]);
Review comment:
You can get away with `+` in DataInputStream's implementation, because
it uses `int` the whole time. So, there's no integer type promotion from `byte`
to `int` going on before the `+`... they are already guaranteed to be unsigned
`int`s. That doesn't work if you are using `byte` types, which will be
automatically sign-extended when they are promoted to `int` for the addition
operation, thereby changing their value. This doesn't matter for the first byte
that gets shifted to the far left, because shifting all the way removes the
effects of sign extension. If we properly truncate any sign-extension from the
promotion to `int`, we can probably use `+`, but I think it's more intuitive to
use `|`, and I don't think it really matters, except to communicate to other
developers that we're doing bit-wise stuff, and not arithmetic stuff.
In any case, we've fixed this in #2176 and #2177, but I just wanted to
comment with the explanation for the archives.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]