Jackie-Jiang commented on code in PR #13303:
URL: https://github.com/apache/pinot/pull/13303#discussion_r1722533948
##########
pinot-common/src/main/java/org/apache/pinot/common/datablock/BaseDataBlock.java:
##########
@@ -426,168 +401,202 @@ public Map<Integer, String> getExceptions() {
return _errCodeToExceptionMap;
}
- /**
- * Serialize this data block to a byte array.
- * <p>
- * In order to deserialize it, {@link
DataBlockUtils#getDataBlock(ByteBuffer)} should be used.
- */
@Override
- public byte[] toBytes()
+ public List<ByteBuffer> serialize()
throws IOException {
- ThreadResourceUsageProvider threadResourceUsageProvider = new
ThreadResourceUsageProvider();
-
- UnsynchronizedByteArrayOutputStream byteArrayOutputStream = new
UnsynchronizedByteArrayOutputStream(8192);
- DataOutputStream dataOutputStream = new
DataOutputStream(byteArrayOutputStream);
- writeLeadingSections(dataOutputStream);
-
- // Write metadata: length followed by actual metadata bytes.
- // NOTE: We ignore metadata serialization time in
"responseSerializationCpuTimeNs" as it's negligible while
- // considering it will bring a lot code complexity.
- serializeMetadata(dataOutputStream);
-
- return byteArrayOutputStream.toByteArray();
- }
-
- private void writeLeadingSections(DataOutputStream dataOutputStream)
- throws IOException {
- dataOutputStream.writeInt(getDataBlockVersionType());
- dataOutputStream.writeInt(_numRows);
- dataOutputStream.writeInt(_numColumns);
- int dataOffset = HEADER_SIZE;
-
- // Write exceptions section offset(START|SIZE).
- dataOutputStream.writeInt(dataOffset);
- byte[] exceptionsBytes;
- exceptionsBytes = serializeExceptions();
- dataOutputStream.writeInt(exceptionsBytes.length);
- dataOffset += exceptionsBytes.length;
-
- // Write dictionary map section offset(START|SIZE).
- dataOutputStream.writeInt(dataOffset);
- byte[] dictionaryBytes = null;
- if (_stringDictionary != null) {
- dictionaryBytes = serializeStringDictionary();
- dataOutputStream.writeInt(dictionaryBytes.length);
- dataOffset += dictionaryBytes.length;
- } else {
- dataOutputStream.writeInt(0);
- }
-
- // Write data schema section offset(START|SIZE).
- dataOutputStream.writeInt(dataOffset);
- byte[] dataSchemaBytes = null;
- if (_dataSchema != null) {
- dataSchemaBytes = _dataSchema.toBytes();
- dataOutputStream.writeInt(dataSchemaBytes.length);
- dataOffset += dataSchemaBytes.length;
- } else {
- dataOutputStream.writeInt(0);
+ if (_serialized == null) {
+ _serialized = DataBlockUtils.serialize(this);
Review Comment:
Thanks for the explanation. Now I understand why the version is moved into
the ser-de class: basically we want to shift the wire format control into the
`DataBlockSerDe`, and `DataBlock` only defines the in-memory format right?
Since you are removing the version from `DataBlock`, and currently we only
allow a single version of `DataBlock`, should we consider also cleaning up the
backward compatible `DataBlock`s in the test since this no longer apply? In the
`DataBlockSerDe`, we can make a clean start from version 0 or 1 to reduce
confusion for future developers.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]