kgyrtkirk commented on code in PR #16517:
URL: https://github.com/apache/druid/pull/16517#discussion_r1627535702
##########
processing/src/main/java/org/apache/druid/segment/StringDimensionIndexer.java:
##########
@@ -132,12 +142,14 @@ public EncodedKeyComponent<int[]>
processRowValsToUnsortedEncodedKeyComponent(@N
} else if (dimValues instanceof byte[]) {
encodedDimensionValues =
new
int[]{dimLookup.add(emptyToNullIfNeeded(StringUtils.encodeBase64String((byte[])
dimValues)))};
+ dictionaryChanged = true;
} else {
encodedDimensionValues = new
int[]{dimLookup.add(emptyToNullIfNeeded(dimValues))};
+ dictionaryChanged = true;
}
// If dictionary size has changed, the sorted lookup is no longer valid.
- if (oldDictSize != dimLookup.size()) {
+ if (dictionaryChanged) {
Review Comment:
I believe the main purpose of the `dimLookup` field is to provide a common
place where the different strings get an assigned number - so that if its able
to recall an earlier value that's actually very important ...this change will
destroy the `sortedLookup` cache even in case all calls to `add` were hits.
I think you should either remove the changes in this file - or find a
different way to address this; one idea could be: to connect the `sortedLookup`
more closely when an add happens:
* move the sorted dictionary caching to `StringDimensionDictionary`
* extract a protected `realAdd` from the current `add` in
`DimensionDictionary`
* implement `realAdd` to clear the cached `sorted` stuff in
`StringDimensionDictionary`
again; let me invite @clintropolis to see if He have a different idea :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]