siddharthteotia commented on a change in pull request #5470:
URL: https://github.com/apache/incubator-pinot/pull/5470#discussion_r432898453



##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -213,6 +215,14 @@ public void init(SegmentGeneratorConfig 
segmentCreationSpec, SegmentIndexCreatio
     }
   }
 
+  public static boolean shouldDeriveNumDocsPerChunk(String columnName, 
Map<String, Map<String, String>> columnProperties) {
+    if (columnProperties != null) {
+      Map<String, String> properties = columnProperties.get(columnName);
+      return properties != null && 
Boolean.parseBoolean(properties.get(FieldConfig.DERIVE_NUM_DOCS_PER_CHUNK_RAW_INDEX_KEY));

Review comment:
       > Defaults to false, right?
   
   Yes
   
   > Can we derive it automatically? (e.g. if column is text index then we 
derive it from metadata) Or, do you see this being usefiul in other cases as 
well?
   
   We could. Even for columns with text indexes, I don't think we should use it 
by default (since now that we have seen the potential -ve impact related to 
access pattern). Yes, most likely this will be used for columns with text index 
but only if the average column value size is very large (around 1-2MB) since 
that is the case which takes the chunk size and compressed chunk size (2 * raw) 
> 2GB and deriving the numDocsPerChunk becomes useful. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to