Jackie-Jiang commented on code in PR #13994:
URL: https://github.com/apache/pinot/pull/13994#discussion_r1777816900
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java:
##########
@@ -236,10 +237,18 @@ public static boolean ignoreDictionaryOverride(boolean
optimizeDictionary,
// Do not create dictionary if index size with dictionary is going to be
larger than index size without dictionary
// This is done to reduce the cost of dictionary for high cardinality
columns
// Off by default and needs optimizeDictionary to be set to true
- if (fieldSpec.isSingleValueField() &&
fieldSpec.getDataType().isFixedWidth()) {
- // if you can safely enable dictionary, you can ignore overrides
- return canSafelyCreateDictionaryWithinThreshold(cardinality,
totalNumberOfEntries,
- noDictionarySizeRatioThreshold, fieldSpec);
+ if (fieldSpec.isSingleValueField()) {
+ if (fieldSpec.getDataType().isFixedWidth()) {
+ // if you can safely enable dictionary, you can ignore overrides
+ return canSafelyCreateDictionaryWithinThreshold(cardinality,
totalNumberOfEntries,
+ noDictionarySizeRatioThreshold, fieldSpec);
+ }
+ // Config not set, default to old behavior of create dictionary for
var width cols
+ if (noDictionaryCardinalityRatioThreshold == null) {
+ return true;
+ }
+ // variable width type, so create based simply on cardinality
threshold size cannot be calculated easily
+ return noDictionaryCardinalityRatioThreshold * totalNumberOfEntries >
cardinality;
Review Comment:
Currently this is not applied to metric when `optimizeDictionaryForMetrics`
is true. I think you might also want it for metric as it is more common to have
high cardinality?
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -293,8 +293,9 @@ private boolean
createDictionaryForColumn(ColumnIndexCreationInfo info, SegmentG
FieldIndexConfigs fieldIndexConfigs =
config.getIndexConfigsByColName().get(column);
if
(DictionaryIndexType.ignoreDictionaryOverride(config.isOptimizeDictionary(),
- config.isOptimizeDictionaryForMetrics(),
config.getNoDictionarySizeRatioThreshold(), spec, fieldIndexConfigs,
- info.getDistinctValueCount(), info.getTotalNumberOfEntries())) {
+ config.isOptimizeDictionaryForMetrics(),
config.getNoDictionarySizeRatioThreshold(),
Review Comment:
Having 4 configs on the same topic passed in separately is a little bit
ugly, but we can address it separately
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java:
##########
@@ -236,10 +237,18 @@ public static boolean ignoreDictionaryOverride(boolean
optimizeDictionary,
// Do not create dictionary if index size with dictionary is going to be
larger than index size without dictionary
// This is done to reduce the cost of dictionary for high cardinality
columns
// Off by default and needs optimizeDictionary to be set to true
- if (fieldSpec.isSingleValueField() &&
fieldSpec.getDataType().isFixedWidth()) {
- // if you can safely enable dictionary, you can ignore overrides
- return canSafelyCreateDictionaryWithinThreshold(cardinality,
totalNumberOfEntries,
- noDictionarySizeRatioThreshold, fieldSpec);
+ if (fieldSpec.isSingleValueField()) {
+ if (fieldSpec.getDataType().isFixedWidth()) {
+ // if you can safely enable dictionary, you can ignore overrides
+ return canSafelyCreateDictionaryWithinThreshold(cardinality,
totalNumberOfEntries,
Review Comment:
For fixed width type `noDictionarySizeRatioThreshold` is always picked even
if it is not configured (default value being used). Do we want to provide a way
so that `noDictionaryCardinalityRatioThreshold` is picked when
`noDictionarySizeRatioThreshold` is not configured or explicitly disabled?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]