[GitHub] [pinot] richardstartin commented on a change in pull request #7595: Add MV raw forward index and MV `BYTES` data type

GitBox Sun, 24 Oct 2021 02:23:22 -0700


richardstartin commented on a change in pull request #7595:
URL: https://github.com/apache/pinot/pull/7595#discussion_r735089959




##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java
##########
@@ -96,25 +99,66 @@ public void putBytes(byte[] value) {
     _chunkBuffer.put(value);
     _chunkDataOffSet += value.length;
 
-    // If buffer filled, then compress and write to file.
-    if (_chunkHeaderOffset == _chunkHeaderSize) {
-      writeChunk();
+    writeChunkIfNecessary();
+  }
+
+  // Note: some duplication is tolerated between these overloads for the sake of memory efficiency
+
+  public void putStrings(String[] values) {
+    // the entire String[] will be encoded as a single string, write the header here
+    _chunkBuffer.putInt(_chunkHeaderOffset, _chunkDataOffSet);
+    _chunkHeaderOffset += CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+    // write all the strings into the data buffer as if it's a single string,
+    // but with its own embedded header so offsets to strings within the body
+    // can be located
+    int headerPosition = _chunkDataOffSet;
+    int headerSize = Integer.BYTES + Integer.BYTES * values.length;
+    int bodyPosition = headerPosition + headerSize;
+    _chunkBuffer.position(bodyPosition);
+    int bodySize = 0;
+    for (int i = 0, h = headerPosition + Integer.BYTES; i < values.length; i++, h += Integer.BYTES) {
+      byte[] utf8 = values[i].getBytes(UTF_8);
+      _chunkBuffer.putInt(h, utf8.length);
+      _chunkBuffer.put(utf8);
+      bodySize += utf8.length;
     }
+    _chunkDataOffSet += headerSize + bodySize;
+    // go back to write the number of strings embedded in the big string
+    _chunkBuffer.putInt(headerPosition, values.length);
+
+    writeChunkIfNecessary();
   }
 
-  @Override
-  public void close()
-      throws IOException {
+  public void putByteArrays(byte[][] values) {
+    // the entire byte[][] will be encoded as a single string, write the header here
+    _chunkBuffer.putInt(_chunkHeaderOffset, _chunkDataOffSet);
+    _chunkHeaderOffset += CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+    // write all the byte[]s into the data buffer as if it's a single byte[],
+    // but with its own embedded header so offsets to byte[]s within the body
+    // can be located
+    int headerPosition = _chunkDataOffSet;
+    int headerSize = Integer.BYTES + Integer.BYTES * values.length;
+    int bodyPosition = headerPosition + headerSize;
+    _chunkBuffer.position(bodyPosition);
+    int bodySize = 0;
+    for (int i = 0, h = headerPosition + Integer.BYTES; i < values.length; i++, h += Integer.BYTES) {
+      byte[] utf8 = values[i];

Review comment:
       Why?
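
For context, the per-row layout that the quoted diff writes (an entry count, then one length per entry, then the concatenated bodies) can be sketched in isolation. This is only a minimal illustration under stated assumptions, not the Pinot writer: the class `MvLayoutSketch` and its `encode`/`decode` methods are hypothetical names, and the chunk header, compression, and buffer reuse in the real class are left out. Note that the diff's comments speak of offsets, while the code stores lengths; the sketch follows the code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical standalone sketch of the embedded-header layout, not Pinot code.
// Layout: [count:int][len_0:int]...[len_{n-1}:int][bytes_0]...[bytes_{n-1}]
final class MvLayoutSketch {

  // Packs all values into a single byte[] using the layout above.
  static byte[] encode(byte[][] values) {
    int headerSize = Integer.BYTES + Integer.BYTES * values.length;
    int bodySize = 0;
    for (byte[] value : values) {
      bodySize += value.length;
    }
    ByteBuffer buffer = ByteBuffer.allocate(headerSize + bodySize);
    buffer.putInt(values.length);      // number of entries
    for (byte[] value : values) {
      buffer.putInt(value.length);     // one length per entry
    }
    for (byte[] value : values) {
      buffer.put(value);               // concatenated bodies
    }
    return buffer.array();
  }

  // Reads the count and lengths back and slices the body accordingly.
  static byte[][] decode(byte[] encoded) {
    ByteBuffer buffer = ByteBuffer.wrap(encoded);
    int count = buffer.getInt(0);
    byte[][] values = new byte[count][];
    int bodyOffset = Integer.BYTES + Integer.BYTES * count;
    for (int i = 0; i < count; i++) {
      int length = buffer.getInt(Integer.BYTES + Integer.BYTES * i);
      values[i] = Arrays.copyOfRange(encoded, bodyOffset, bodyOffset + length);
      bodyOffset += length;
    }
    return values;
  }
}
```

Because the entries are opaque bytes at this level, the same layout serves both `putStrings` (after UTF-8 encoding) and `putByteArrays`.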




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


