KKcorps commented on code in PR #9527:
URL: https://github.com/apache/pinot/pull/9527#discussion_r1008217597
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -319,11 +319,16 @@ private boolean createDictionaryForColumn(ColumnIndexCreationInfo info, SegmentG
return false;
}
+ // Do not create dictionaries for json or text index columns as they are high-cardinality values almost always
+ if (config.getJsonIndexCreationColumns().contains(column)
+ || config.getTextIndexCreationColumns().contains(column)) {
+ return false;
+ }
+
// Do not create dictionary if index size with dictionary is going to be larger than index size without dictionary
// This is done to reduce the cost of dictionary for high cardinality columns
// Off by default and needs optimizeDictionaryEnabled to be set to true
- if (config.isOptimizeDictionaryForMetrics() && spec.getFieldType() == FieldType.METRIC && spec.isSingleValueField()
- && spec.getDataType().isFixedWidth()) {
+ if (config.isOptimizeDictionaryForMetrics() && spec.isSingleValueField() && spec.getDataType().isFixedWidth()) {
Review Comment:
On reflection, reusing the existing flag for non-metric columns would be
pretty confusing for users. I have decided to change the config to
`optimizeDictionary` and keep the old config as well for backward
compatibility. I have also added comments marking the old config as deprecated.
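
For illustration, here is a rough sketch of how a new `optimizeDictionary` flag could sit next to the old metrics-only flag. The class `DictionaryConfigSketch`, the method `mayDropDictionary`, and the way the two flags are combined are all assumptions made for this example. It is not the code in this PR.

```java
// Hypothetical stand-in for the segment generator config; not the real Pinot class.
final class DictionaryConfigSketch {
  // New flag: applies to every single-value, fixed-width column.
  private final boolean optimizeDictionary;
  // Old flag: kept so existing table configs keep working; metrics only.
  private final boolean optimizeDictionaryForMetrics;

  DictionaryConfigSketch(boolean optimizeDictionary, boolean optimizeDictionaryForMetrics) {
    this.optimizeDictionary = optimizeDictionary;
    this.optimizeDictionaryForMetrics = optimizeDictionaryForMetrics;
  }

  boolean isOptimizeDictionary() {
    return optimizeDictionary;
  }

  /** @deprecated assumed here to be superseded by {@link #isOptimizeDictionary()}. */
  @Deprecated
  boolean isOptimizeDictionaryForMetrics() {
    return optimizeDictionaryForMetrics;
  }

  // Returns true when the column is a candidate for skipping the dictionary.
  // The size comparison (index with vs. without dictionary) would follow this
  // check and is left out of the sketch.
  static boolean mayDropDictionary(DictionaryConfigSketch config, boolean isMetric,
      boolean isSingleValue, boolean isFixedWidth) {
    if (!isSingleValue || !isFixedWidth) {
      return false;
    }
    if (config.isOptimizeDictionary()) {
      return true; // new flag covers dimensions and metrics alike
    }
    // Old flag keeps its original, narrower meaning.
    return config.isOptimizeDictionaryForMetrics() && isMetric;
  }
}
```

With this arrangement, a table config that only sets the old flag behaves exactly as before, and only users who opt in to `optimizeDictionary` get the broader behavior.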
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]