Re: [PR] Optimise indexer resource usage based on CPU profile (druid)

via GitHub Wed, 05 Jun 2024 04:37:48 -0700


kgyrtkirk commented on code in PR #16517:
URL: https://github.com/apache/druid/pull/16517#discussion_r1627535702



##########
processing/src/main/java/org/apache/druid/segment/StringDimensionIndexer.java:
##########
@@ -132,12 +142,14 @@ public EncodedKeyComponent<int[]> 
processRowValsToUnsortedEncodedKeyComponent(@N
     } else if (dimValues instanceof byte[]) {
       encodedDimensionValues =
           new 
int[]{dimLookup.add(emptyToNullIfNeeded(StringUtils.encodeBase64String((byte[]) 
dimValues)))};
+      dictionaryChanged = true;
     } else {
       encodedDimensionValues = new 
int[]{dimLookup.add(emptyToNullIfNeeded(dimValues))};
+      dictionaryChanged = true;
     }
 
     // If dictionary size has changed, the sorted lookup is no longer valid.
-    if (oldDictSize != dimLookup.size()) {
+    if (dictionaryChanged) {

Review Comment:
   I believe the main purpose of the `dimLookup` field is to provide a common 
place where the different strings get an assigned number - so that if its able 
to recall an earlier value that's actually very important ...this change will 
destroy the `sortedLookup` cache even in case all calls to `add` were hits.
   
   I think you should either remove the changes in this file - or find a 
different way to address this; one idea could be: to connect the `sortedLookup` 
more closely when an add happens:
   * move the sorted dictionary caching to `StringDimensionDictionary`
   * extract a protected `realAdd` from the current `add` in 
`DimensionDictionary` 
   * implement `realAdd` to clear the cached `sorted` stuff in 
`StringDimensionDictionary`
   
   again; let me invite @clintropolis to see if He have a different idea :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Optimise indexer resource usage based on CPU profile (druid)

Reply via email to