suneet-s commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r420360898
##########
File path:
processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,40 @@ public FilterTuning getFilterTuning()
@Override
public byte[] getCacheKey()
{
- boolean hasNull = false;
- for (String value : values) {
- if (value == null) {
- hasNull = true;
- break;
+ if (cacheKey == null) {
+ final List<String> sortedValues = new ArrayList<>(values);
+ sortedValues.sort(Comparator.nullsFirst(Ordering.natural()));
+ final Hasher hasher = Hashing.sha256().newHasher();
Review comment:
> I chose sha256 since it is long enough to distinguish different
> `values` sets even though I haven't really tested. I will do some tests.
👍
I think a simpler solution would be to revert to the previous way of
building the cacheKey, but use `.appendStringsIgnoringOrder()` instead of
`.appendStrings()`. This way the list is sorted only once and we add no new
memory overhead compared to the current implementation.
Maybe consider adding a new utility function to the CacheKeyBuilder that
sorts the list and writes whether or not the list contained any nulls. This
way we can build the cacheKey in one pass over the list, but even at 10k
entries in the list, this might be a micro-optimization.
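A rough sketch of the utility idea above, written against the JDK only so it stands alone. `OrderInsensitiveCacheKey`, `sortedKeyWithNullFlag`, and the `0xFF` separator are hypothetical names and choices made for illustration; they are not part of `CacheKeyBuilder`, and the byte layout the real builder produces may well differ.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

public class OrderInsensitiveCacheKey
{
  // Assumed separator: the byte 0xFF never occurs in valid UTF-8, so it cannot
  // collide with the encoded bytes of any value.
  private static final byte SEPARATOR = (byte) 0xFF;

  /**
   * Hypothetical helper (not Druid's API): sorts a copy of the values and
   * emits a has-null flag followed by the non-null values in sorted order.
   */
  public static byte[] sortedKeyWithNullFlag(Collection<String> values)
  {
    final List<String> sorted = new ArrayList<>(values);
    sorted.sort(Comparator.nullsFirst(Comparator.naturalOrder()));

    // With a nulls-first sort, any null ends up at index 0, so the null check
    // does not need a separate scan of the list.
    final boolean hasNull = !sorted.isEmpty() && sorted.get(0) == null;

    final ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(hasNull ? 1 : 0);
    for (String value : sorted) {
      if (value == null) {
        continue; // nulls are already represented by the flag byte
      }
      final byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
      out.write(bytes, 0, bytes.length);
      out.write(SEPARATOR);
    }
    return out.toByteArray();
  }

  public static void main(String[] args)
  {
    final byte[] first = sortedKeyWithNullFlag(Arrays.asList("b", null, "a"));
    final byte[] second = sortedKeyWithNullFlag(Arrays.asList("a", "b", null));
    // The same values in a different order yield the same key bytes.
    System.out.println(Arrays.equals(first, second)); // prints "true"
  }
}
```

The sorted copy is still an allocation, so whether this beats a separate null scan plus `.appendStringsIgnoringOrder()` would need measuring.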