suneet-s commented on pull request #10316:
URL: https://github.com/apache/druid/pull/10316#issuecomment-682019819


   > > ```
   > > diff --git 
a/processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
 
b/processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
   > > index 67b1ee0b76..7da31f3811 100644
   > > --- 
a/processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
   > > +++ 
b/processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java
   > > @@ -435,6 +435,11 @@ public class JoinFilterAnalyzer
   > >            );
   > >          }
   > >  
   > > +        // Wrap filter values in immutable set so that classes 
consuming this set do not compute the hash code
   > > +        // again and again. We are creating multiple InDimFilter 
filters down for the same set of filter values.
   > > +        // Calculating hashCode will be less expensive for these 
InDimFilter as hashCode for filter values is pre-computed
   > > +        newFilterValues = ImmutableSet.copyOf(newFilterValues);
   > > +
   > >          for (String correlatedBaseColumn : 
correlationAnalysis.getBaseColumns()) {
   > >            Filter rewrittenFilter = new InDimFilter(
   > >                correlatedBaseColumn,
   > > ```
   > > 
   > > 
   > > This is what I had in mind.
   > 
   > I initially didn't want to use an ImmutableSet because 
`ImmutableSet.copyOf(...)` would have to traverse through the entire set, and 
there may be cases where an InDimFilter doesn't need to traverse the entire 
set. However, this comment made me think that maybe the best way to do this is 
to have the correlatedValuesMap use an `ImmutableSet.Builder` instead. This way 
the hashCode is memoized as the set is being constructed! I might leave the 
other code paths as is so that this change only really impacts join clauses. 
What do you think @abhishekagarwal87 ?
   
   I changed my mind... decided it's better to spend a little more time 
thinking about this
   
   IndexedTableJoinable needs to know the number of uniques while constructing 
the Set so that we can limit the number of values that can be pushed down. The 
ImmutableSetBuilder doesn't know the number of uniques till the time the Set is 
constructed
   ```
   IntList rowIndex = index.find(searchColumnValue);
           for (int i = 0; i < rowIndex.size(); i++) {
             int rowNum = rowIndex.getInt(i);
             String correlatedDimVal = Objects.toString(reader.read(rowNum), 
null);
             correlatedValuesBuilder.add(correlatedDimVal);
   
             if (correlatedValuesBuilder.size() > maxCorrelationSetSize) {
               return Optional.empty();
             }
           }
           return Optional.of(correlatedValuesBuilder.build());
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to