walterddr commented on code in PR #9422:
URL: https://github.com/apache/pinot/pull/9422#discussion_r981874998


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/blocks/TransferableBlock.java:
##########
@@ -47,6 +48,49 @@ public class TransferableBlock implements Block {
   private BaseDataBlock _dataBlock;
   private List<Object[]> _container;
 
+  /**
+   *  Splits a block into multiple blocks so that each block's size stays within maxBlockSize.
+   *  Currently, we only support splitting for row-type data blocks.
+   *  For columnar and metadata data blocks, we return the original data block.
+   *  A metadata data block has to be initialized from bytes instead of from a data block.
+   *
+   *  When the row size is greater than maxBlockSize, we pack each row as a separate block.
+   */
+  private List<TransferableBlock> splitBlock(int maxBlockSize) {

Review Comment:
   Let's return `List<BaseDataBlock>` here
   



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/blocks/TransferableBlock.java:
##########
@@ -47,6 +48,49 @@ public class TransferableBlock implements Block {
   private BaseDataBlock _dataBlock;
   private List<Object[]> _container;
 
+  /**
+   *  Splits a block into multiple blocks so that each block's size stays within maxBlockSize.
+   *  Currently, we only support splitting for row-type data blocks.
+   *  For columnar and metadata data blocks, we return the original data block.
+   *  A metadata data block has to be initialized from bytes instead of from a data block.
+   *
+   *  When the row size is greater than maxBlockSize, we pack each row as a separate block.
+   */
+  private List<TransferableBlock> splitBlock(int maxBlockSize) {
+    List<TransferableBlock> blockChunks = new ArrayList<>();
+    // TODO: Add support for all data block types.
+    if (getType() != BaseDataBlock.Type.ROW) {
+      blockChunks.add(this);
+      return blockChunks;
+    }
+    // TODO: Store row size in bytes inside data block.
+    DataSchema dataSchema = getDataBlock().getDataSchema();
+    int[] columnOffsets = new int[dataSchema.size()];
+    int rowSizeInBytes = DataBlockUtils.computeColumnOffsets(dataSchema, columnOffsets);
+    List<Object[]> chunk = new ArrayList<>();
+    long chunkSize = 0;
+    for (Object[] unit : getContainer()) {
+      if (rowSizeInBytes >= maxBlockSize) {

Review Comment:
   If you use `rowSizeInBytes` to split chunks, you know in advance how many rows fit in each chunk because the row size is fixed, so there is no need to check every row (unit).
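
   The reviewer's point can be illustrated with a small standalone sketch. This is not Pinot code: `splitRows` and its chunking loop are hypothetical, assuming only what the diff states (a fixed `rowSizeInBytes` per row and a `maxBlockSize` budget):

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class ChunkSplitSketch {

     // Hypothetical helper: because every row serializes to the same number of
     // bytes, the rows-per-chunk count can be computed once up front instead of
     // checking the accumulated size on every row.
     static List<List<Object[]>> splitRows(List<Object[]> rows, int rowSizeInBytes, int maxBlockSize) {
       // When a single row exceeds maxBlockSize, pack one row per chunk,
       // mirroring the behavior described in the javadoc.
       int rowsPerChunk = Math.max(1, maxBlockSize / rowSizeInBytes);
       List<List<Object[]>> chunks = new ArrayList<>();
       for (int i = 0; i < rows.size(); i += rowsPerChunk) {
         chunks.add(new ArrayList<>(rows.subList(i, Math.min(i + rowsPerChunk, rows.size()))));
       }
       return chunks;
     }
   }
   ```

   With 10 rows of 8 bytes each and a 32-byte budget, this yields chunks of 4, 4, and 2 rows in a single pass over the container, with no per-row size check.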



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/blocks/TransferableBlockUtils.java:
##########
@@ -27,7 +27,6 @@ public final class TransferableBlockUtils {
   private TransferableBlockUtils() {
     // do not instantiate.
   }
-

Review Comment:
   (nit) revert



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/blocks/TransferableBlock.java:
##########
@@ -154,9 +199,34 @@ public boolean isErrorBlock() {
     return _isErrorBlock;
   }
 
-  public byte[] toBytes()
+  // If the data is passed in as a container, only row and columnar data block types are supported.
+  public byte[] getDataBlockBytes()
       throws IOException {
-    return _dataBlock.toBytes();
+    return getDataBlock().toBytes();
+  }
+
+  /**
+   * Returns a list of byte arrays split from the data block.
+   * Each byte array's size is smaller than maxBlockSize.
+   *
+   * Currently, we only support splitting for row-type data blocks.
+   * For columnar and metadata data blocks, we return the original data block.
+   * A metadata data block has to be initialized from bytes instead of from a data block.
+   *
+   * When the row size is greater than maxBlockSize, we pack each row as a separate block.
+   *
+   * @param maxBlockSize the maximum size in bytes of each returned chunk
+   * @return the serialized chunks
+   * @throws IOException if serialization fails
+   */
+  public List<byte[]> getChunkBytes(int maxBlockSize)

Review Comment:
   ```suggestion
     public List<BaseDataBlock> getDataBlockChunks(int maxBlockSize)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

