Jackie-Jiang commented on a change in pull request #5256: Derive num docs per chunk from max column value length for varbyte raw index creator
URL: https://github.com/apache/incubator-pinot/pull/5256#discussion_r409785519
 
 

 ##########
 File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/fwd/SingleValueVarByteRawIndexCreator.java
 ##########
 @@ -27,15 +28,21 @@
 
 
 public class SingleValueVarByteRawIndexCreator extends BaseSingleValueRawIndexCreator {
-  private static final int NUM_DOCS_PER_CHUNK = 1000; // TODO: Auto-derive this based on metadata.
+  private static final int TARGET_MAX_CHUNK_SIZE = 1024*1024;
 
   private final VarByteChunkSingleValueWriter _indexWriter;
 
   public SingleValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressorFactory.CompressionType compressionType,
       String column, int totalDocs, int maxLength)
       throws IOException {
     File file = new File(baseIndexDir, column + V1Constants.Indexes.RAW_SV_FORWARD_INDEX_FILE_EXTENSION);
-    _indexWriter = new VarByteChunkSingleValueWriter(file, compressionType, totalDocs, NUM_DOCS_PER_CHUNK, maxLength);
+    _indexWriter = new VarByteChunkSingleValueWriter(file, compressionType, totalDocs, getNumDocsPerChunk(maxLength), maxLength);
+  }
+
+  @VisibleForTesting
+  public static int getNumDocsPerChunk(int lengthOfLongestEntry) {
 
 Review comment:
   Can this logic be pushed down into VarByteChunkSingleValueWriter?
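For illustration, a minimal sketch of both the derivation and the push-down the review suggests. The body of getNumDocsPerChunk is not shown in the hunk, so the formula here is an assumption: it divides the 1 MB TARGET_MAX_CHUNK_SIZE by the longest entry's length plus a hypothetical 4-byte per-entry row offset, and never returns less than 1. VarByteChunkSizing and SketchVarByteWriter are hypothetical names, not Pinot classes.

```java
// Hypothetical sketch: VarByteChunkSizing and SketchVarByteWriter are illustrative names, not Pinot classes.
final class VarByteChunkSizing {
  // Target upper bound on the uncompressed bytes per chunk (1 MB), mirroring TARGET_MAX_CHUNK_SIZE in the diff.
  static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;

  // Assumption: every entry also costs a 4-byte row offset in the chunk header.
  static final int ROW_OFFSET_SIZE = Integer.BYTES;

  private VarByteChunkSizing() {
  }

  // Docs per chunk if every entry were as long as the longest one; at least 1,
  // so a value larger than the target still gets a chunk of its own.
  static int getNumDocsPerChunk(int lengthOfLongestEntry) {
    if (lengthOfLongestEntry < 0) {
      throw new IllegalArgumentException("lengthOfLongestEntry must be >= 0: " + lengthOfLongestEntry);
    }
    // Widen to long so lengths near Integer.MAX_VALUE cannot overflow.
    long bytesPerEntry = (long) lengthOfLongestEntry + ROW_OFFSET_SIZE;
    return (int) Math.max(TARGET_MAX_CHUNK_SIZE / bytesPerEntry, 1L);
  }
}

// Push-down variant: the writer derives the chunk size itself, so the creator
// would pass only maxLength instead of a precomputed numDocsPerChunk.
final class SketchVarByteWriter {
  private final int _numDocsPerChunk;
  private final int _numChunks;

  SketchVarByteWriter(int totalDocs, int maxLength) {
    _numDocsPerChunk = VarByteChunkSizing.getNumDocsPerChunk(maxLength);
    // Ceiling division: the last chunk may be only partially filled.
    _numChunks = (int) (((long) totalDocs + _numDocsPerChunk - 1) / _numDocsPerChunk);
  }

  int getNumDocsPerChunk() {
    return _numDocsPerChunk;
  }

  int getNumChunks() {
    return _numChunks;
  }
}
```

With a fixed 1000 docs per chunk, a chunk can hold up to 1000 × maxLength value bytes, so its size grows linearly with the longest value. Under the assumptions above, a 100-byte maximum gives 10082 docs per chunk, and any value longer than 1 MB gets a chunk to itself. Keeping the computation inside the writer means callers supply only maxLength, so the sizing policy lives in one place.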

----------------------------------------------------------------
This is an automated message from the Apache Git Service.