siddharthteotia opened a new pull request #5470:
URL: https://github.com/apache/incubator-pinot/pull/5470


   (1) PR https://github.com/apache/incubator-pinot/pull/5256 added support for
   deriving the number of docs per chunk from the column length when creating a
   var-byte raw index. This was done specifically to support large text values.
   High-QPS use cases that don't need this feature see a negative impact: the
   chunk size increases (numDocsPerChunk was previously hardcoded to 1000), and
   depending on the access pattern we may end up decompressing a bigger chunk
   to read the values for a set of docIds. This PR makes the behavior
   configurable. The default is the old behavior (1000 docs per chunk).
   Deriving the value from the column length can be enabled as follows:
   
   `"fieldConfigList":[
      {
        "name":"textCol",
        "encodingType":"RAW",
        "indexType":"TEXT",
        "properties":{
           "derive.num.chunks.raw.index":"true"
         }
       }
    ]
   `
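
To make the trade-off concrete, here is a minimal sketch of how a derived
numDocsPerChunk could differ from the fixed default of 1000. The class name,
the 1 MB target chunk size, and the 4-byte per-row overhead are assumptions
made for illustration; they are not taken from this PR or from the Pinot
code base.

```java
// Illustrative sketch only; names and constants below are hypothetical.
final class ChunkSizing {
    // Fixed default mentioned in the PR description.
    static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
    // Assumed target size of one uncompressed chunk (1 MB).
    static final int TARGET_CHUNK_SIZE_BYTES = 1024 * 1024;
    // Assumed per-row bookkeeping overhead inside a chunk (one int offset).
    static final int PER_ROW_OVERHEAD_BYTES = Integer.BYTES;

    // Returns the number of docs to pack into one chunk.
    // When derivation is disabled, the old fixed value is used.
    static int numDocsPerChunk(boolean deriveFromColumnLength, int maxValueLengthBytes) {
        if (!deriveFromColumnLength) {
            return DEFAULT_NUM_DOCS_PER_CHUNK;
        }
        int bytesPerRow = maxValueLengthBytes + PER_ROW_OVERHEAD_BYTES;
        // Always keep at least one doc per chunk, even for very large values.
        return Math.max(1, TARGET_CHUNK_SIZE_BYTES / bytesPerRow);
    }

    public static void main(String[] args) {
        System.out.println(numDocsPerChunk(false, 100));    // flag off: fixed default
        System.out.println(numDocsPerChunk(true, 100));     // short values: many docs per chunk
        System.out.println(numDocsPerChunk(true, 50_000));  // large text values: few docs per chunk
    }
}
```

Under these assumed constants, a column whose longest value is 100 bytes would
get far more than 1000 docs per chunk, which is the case where a point lookup
has to decompress a bigger chunk than before.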
   
   (2) PR https://github.com/apache/incubator-pinot/pull/4791 added support for
   noDict on STRING/BYTES columns in consuming segments. Before PR 4791, even
   if the user had marked a STRING/BYTES column as no-dictionary in the table
   config, the consuming segment still created a dictionary because raw index
   was not supported. That change affects use cases that set noDict on their
   STRING dimension columns for other performance reasons and also want
   metricsAggregation. These use cases no longer get aggregateMetrics, because
   the new implementation honors the table config setting of noDict on
   STRING/BYTES and creates a raw index. Without metrics aggregation, memory
   pressure increases. So, to continue aggregating metrics for such cases, we
   will create a dictionary for STRING/BYTES columns even if the column is
   part of the noDictionary set in the table config.
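
The decision described in (2) can be sketched roughly as below. This is an
illustration of the rule as stated in this description, not the actual Pinot
implementation: the class, enum, and method names are hypothetical, and
gating the override on aggregateMetrics being enabled is an assumption drawn
from the phrase "for such cases".

```java
import java.util.Set;

// Illustrative sketch only; all names here are hypothetical.
final class DictionaryDecision {
    enum DataType { INT, LONG, FLOAT, DOUBLE, STRING, BYTES }

    // Decides whether a consuming segment should build a dictionary for a column.
    static boolean shouldCreateDictionary(String column,
                                          DataType dataType,
                                          Set<String> noDictionaryColumns,
                                          boolean aggregateMetrics) {
        // Columns not marked noDict get a dictionary as usual.
        if (!noDictionaryColumns.contains(column)) {
            return true;
        }
        // Assumption: override noDict only for STRING/BYTES columns when
        // metrics aggregation is enabled, so aggregation keeps working.
        boolean isStringOrBytes = dataType == DataType.STRING || dataType == DataType.BYTES;
        return aggregateMetrics && isStringOrBytes;
    }
}
```

With this rule, a STRING column listed as noDict still gets a dictionary when
aggregateMetrics is on, while the same column keeps its raw index when
aggregateMetrics is off.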
   




