Jackie-Jiang commented on code in PR #13303:
URL: https://github.com/apache/pinot/pull/13303#discussion_r1674702636
##########
pinot-common/src/main/java/org/apache/pinot/common/datablock/BaseDataBlock.java:
##########
@@ -426,168 +401,202 @@ public Map<Integer, String> getExceptions() {
return _errCodeToExceptionMap;
}
- /**
- * Serialize this data block to a byte array.
- * <p>
- * In order to deserialize it, {@link
DataBlockUtils#getDataBlock(ByteBuffer)} should be used.
- */
@Override
- public byte[] toBytes()
+ public List<ByteBuffer> serialize()
throws IOException {
- ThreadResourceUsageProvider threadResourceUsageProvider = new
ThreadResourceUsageProvider();
-
- UnsynchronizedByteArrayOutputStream byteArrayOutputStream = new
UnsynchronizedByteArrayOutputStream(8192);
- DataOutputStream dataOutputStream = new
DataOutputStream(byteArrayOutputStream);
- writeLeadingSections(dataOutputStream);
-
- // Write metadata: length followed by actual metadata bytes.
- // NOTE: We ignore metadata serialization time in
"responseSerializationCpuTimeNs" as it's negligible while
- // considering it will bring a lot code complexity.
- serializeMetadata(dataOutputStream);
-
- return byteArrayOutputStream.toByteArray();
- }
-
- private void writeLeadingSections(DataOutputStream dataOutputStream)
- throws IOException {
- dataOutputStream.writeInt(getDataBlockVersionType());
- dataOutputStream.writeInt(_numRows);
- dataOutputStream.writeInt(_numColumns);
- int dataOffset = HEADER_SIZE;
-
- // Write exceptions section offset(START|SIZE).
- dataOutputStream.writeInt(dataOffset);
- byte[] exceptionsBytes;
- exceptionsBytes = serializeExceptions();
- dataOutputStream.writeInt(exceptionsBytes.length);
- dataOffset += exceptionsBytes.length;
-
- // Write dictionary map section offset(START|SIZE).
- dataOutputStream.writeInt(dataOffset);
- byte[] dictionaryBytes = null;
- if (_stringDictionary != null) {
- dictionaryBytes = serializeStringDictionary();
- dataOutputStream.writeInt(dictionaryBytes.length);
- dataOffset += dictionaryBytes.length;
- } else {
- dataOutputStream.writeInt(0);
- }
-
- // Write data schema section offset(START|SIZE).
- dataOutputStream.writeInt(dataOffset);
- byte[] dataSchemaBytes = null;
- if (_dataSchema != null) {
- dataSchemaBytes = _dataSchema.toBytes();
- dataOutputStream.writeInt(dataSchemaBytes.length);
- dataOffset += dataSchemaBytes.length;
- } else {
- dataOutputStream.writeInt(0);
+ if (_serialized == null) {
+ _serialized = DataBlockUtils.serialize(this);
Review Comment:
By version control, I meant the version of `DataBlock`.
Seems like the idea in this PR is to decouple the `DataBlock` version and
`DataBlockSerDe` version, which is counter intuitive to me. Even if we want to
maintain multiple ser-de versions so that we can try out new implementations,
they should be under the same `DataBlock` version, where the serialized bytes
remains the same. Say if we introduce a new `DataBlock` version with different
bytes representation on the wire, I don't see how we can reuse the same
`DataBlockSerDe` for it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]