gianm commented on code in PR #14237:
URL: https://github.com/apache/druid/pull/14237#discussion_r1190478513


##########
processing/src/main/java/org/apache/druid/segment/join/table/IndexedTableJoinable.java:
##########
@@ -103,7 +103,8 @@ public ColumnValuesWithUniqueFlag 
getNonNullColumnValues(String columnName, fina
     }
 
     try (final IndexedTable.Reader reader = 
table.columnReader(columnPosition)) {
-      // Sorted set to encourage "in" filters that result from this method to 
do dictionary lookups in order.
+      // Use a SortedSet so InDimFilter doesn't need to create its own
+      // and to encourage "in" filters that result from this method to do 
dictionary lookups in order.

Review Comment:
   The "encourage 'in' filters that result from this method to do dictionary 
lookups in order" piece isn't accurate any longer, because they no longer need 
encouragement. (They always use sorted sets internally.) So this is more about 
an optimization in creating the "in" filter itself.



##########
processing/src/main/java/org/apache/druid/segment/join/JoinableFactoryWrapper.java:
##########
@@ -144,9 +144,15 @@ static JoinClauseToFilterConversion convertJoinToFilter(
       final Set<String> rightPrefixes
   )
   {
+    // This optimization kicks in when there is exactly 1 equijoin
+    // The reason being that getNonNullColumnValues uses a TreeSet for 
handling duplicates
+    // and does not return values in order.
+    // In case TreeSet if replaced by a LinkedHashSet, duplicates are not 
handled correctly
+    // and resulting ordering is improper.
+    // Considering these joins are converted to filters only when there is 1 
equijoin
     if (clause.getJoinType() == JoinType.INNER
         && clause.getCondition().getNonEquiConditions().isEmpty()
-        && clause.getCondition().getEquiConditions().size() > 0) {
+        && clause.getCondition().getEquiConditions().size() == 1) {
       final List<Filter> filters = new ArrayList<>();

Review Comment:
   No need for `filters` anymore, as there is always going to be exactly one. 
You can use `CollectionUtils.getOnlyElement` to get the one condition and 
process it directly. It will simplify the later logic.



##########
processing/src/main/java/org/apache/druid/segment/join/JoinableFactoryWrapper.java:
##########
@@ -144,9 +144,15 @@ static JoinClauseToFilterConversion convertJoinToFilter(
       final Set<String> rightPrefixes
   )
   {
+    // This optimization kicks in when there is exactly 1 equijoin
+    // The reason being that getNonNullColumnValues uses a TreeSet for 
handling duplicates
+    // and does not return values in order.
+    // In case TreeSet if replaced by a LinkedHashSet, duplicates are not 
handled correctly

Review Comment:
   This comment doesn't really make sense to someone that hasn't seen previous 
versions of this PR. Consider removing or rewording it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to