[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #7629: implement `FixedByteChunkMVForwardIndexReader`

GitBox Mon, 25 Oct 2021 13:57:08 -0700


Jackie-Jiang commented on a change in pull request #7629:
URL: https://github.com/apache/pinot/pull/7629#discussion_r735961175




##########
File path: 
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java
##########
@@ -71,33 +67,22 @@ public MultiValueFixedByteRawIndexCreator(File 
baseIndexDir, ChunkCompressionTyp
    * @param column Name of column to index
    * @param totalDocs Total number of documents to index
    * @param valueType Type of the values
-   * @param maxLengthOfEachEntry length of longest entry (in bytes)
    * @param deriveNumDocsPerChunk true if writer should auto-derive the number 
of rows per chunk
    * @param writerVersion writer format version
    */
   public MultiValueFixedByteRawIndexCreator(File baseIndexDir, 
ChunkCompressionType compressionType,
-      String column, int totalDocs, DataType valueType, final int 
maxLengthOfEachEntry,
-      final int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk,
-      int writerVersion)
+      String column, int totalDocs, DataType valueType, int 
maxNumberOfMultiValueElements,
+      boolean deriveNumDocsPerChunk, int writerVersion)
       throws IOException {
-    File file = new File(baseIndexDir,
-        column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
-    FileUtils.deleteQuietly(file);
-    int totalMaxLength = maxNumberOfMultiValueElements * maxLengthOfEachEntry;
+    File file = new File(baseIndexDir, column + 
Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
+    int totalMaxLength = maxNumberOfMultiValueElements * 
valueType.getStoredType().size();
     int numDocsPerChunk =
-        deriveNumDocsPerChunk ? getNumDocsPerChunk(totalMaxLength) : 
DEFAULT_NUM_DOCS_PER_CHUNK;
+        deriveNumDocsPerChunk ? Math.max(TARGET_MAX_CHUNK_SIZE / 
totalMaxLength, 1) : DEFAULT_NUM_DOCS_PER_CHUNK;

Review comment:
       This needs to be adjust to include the entry header
   ```suggestion
           deriveNumDocsPerChunk ? Math.max(TARGET_MAX_CHUNK_SIZE / 
(totalMaxLength + 
VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE), 1) : 
DEFAULT_NUM_DOCS_PER_CHUNK;
   ```

##########
File path: 
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java
##########
@@ -67,26 +67,23 @@ public MultiValueVarByteRawIndexCreator(File baseIndexDir, 
ChunkCompressionType
    * @param totalDocs Total number of documents to index
    * @param valueType Type of the values
    * @param maxRowLengthInBytes the size in bytes of the largest row, the 
chunk size cannot be smaller than this
+   * @param maxNumberOfElements the maximum number of elements in a row
    * @param writerVersion writer format version
    */
-  public MultiValueVarByteRawIndexCreator(File baseIndexDir, 
ChunkCompressionType compressionType,
-      String column, int totalDocs, DataType valueType, int writerVersion, int 
maxRowLengthInBytes)
+  public MultiValueVarByteRawIndexCreator(File baseIndexDir, 
ChunkCompressionType compressionType, String column,
+      int totalDocs, DataType valueType, int writerVersion, int 
maxRowLengthInBytes, int maxNumberOfElements)
       throws IOException {
     //we will prepend the actual content with numElements and length array 
containing length of each element
-    int totalMaxLength = Integer.BYTES + Math.max(maxRowLengthInBytes, 
TARGET_MAX_CHUNK_SIZE);
+    int maxLengthPrefixes = Integer.BYTES * maxNumberOfElements;
+    int totalMaxLength = Integer.BYTES + maxRowLengthInBytes + 
maxLengthPrefixes;
     File file = new File(baseIndexDir,
         column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
-    int numDocsPerChunk = getNumDocsPerChunk(totalMaxLength);
-    _indexWriter = new VarByteChunkSVForwardIndexWriter(file, compressionType, 
totalDocs,
-        numDocsPerChunk, totalMaxLength,
-        writerVersion);
-    _valueType = valueType;
-  }
-
-  private static int getNumDocsPerChunk(int lengthOfLongestEntry) {
     int overheadPerEntry =
-        lengthOfLongestEntry + 
VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
-    return Math.max(TARGET_MAX_CHUNK_SIZE / overheadPerEntry, 1);
+        maxRowLengthInBytes + maxLengthPrefixes + 
VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE;
+    int numDocsPerChunk = Math.max(TARGET_MAX_CHUNK_SIZE / overheadPerEntry, 
1);

Review comment:
       Similar here (note that currently `overheadPerEntry` is the same as 
`totalMaxLength`)
   ```suggestion
       int numDocsPerChunk = Math.max(TARGET_MAX_CHUNK_SIZE / (totalMaxLength + 
VarByteChunkSVForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE), 1);
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #7629: implement `FixedByteChunkMVForwardIndexReader`

Reply via email to