gianm commented on code in PR #14237:
URL: https://github.com/apache/druid/pull/14237#discussion_r1190478513
##########
processing/src/main/java/org/apache/druid/segment/join/table/IndexedTableJoinable.java:
##########
@@ -103,7 +103,8 @@ public ColumnValuesWithUniqueFlag
getNonNullColumnValues(String columnName, fina
}
try (final IndexedTable.Reader reader =
table.columnReader(columnPosition)) {
- // Sorted set to encourage "in" filters that result from this method to
do dictionary lookups in order.
+ // Use a SortedSet so InDimFilter doesn't need to create its own
+ // and to encourage "in" filters that result from this method to do
dictionary lookups in order.
Review Comment:
The "encourage 'in' filters that result from this method to do dictionary
lookups in order" piece isn't accurate any longer, because they no longer need
encouragement. (They always use sorted sets internally.) So this is more about
an optimization in creating the "in" filter itself.
##########
processing/src/main/java/org/apache/druid/segment/join/JoinableFactoryWrapper.java:
##########
@@ -144,9 +144,15 @@ static JoinClauseToFilterConversion convertJoinToFilter(
final Set<String> rightPrefixes
)
{
+ // This optimization kicks in when there is exactly 1 equijoin
+ // The reason being that getNonNullColumnValues uses a TreeSet for
handling duplicates
+ // and does not return values in order.
+ // In case TreeSet if replaced by a LinkedHashSet, duplicates are not
handled correctly
+ // and resulting ordering is improper.
+ // Considering these joins are converted to filters only when there is 1
equijoin
if (clause.getJoinType() == JoinType.INNER
&& clause.getCondition().getNonEquiConditions().isEmpty()
- && clause.getCondition().getEquiConditions().size() > 0) {
+ && clause.getCondition().getEquiConditions().size() == 1) {
final List<Filter> filters = new ArrayList<>();
Review Comment:
No need for `filters` anymore, as there is always going to be exactly one.
You can use `CollectionUtils.getOnlyElement` to get the one condition and
process it directly. It will simplify the later logic.
##########
processing/src/main/java/org/apache/druid/segment/join/JoinableFactoryWrapper.java:
##########
@@ -144,9 +144,15 @@ static JoinClauseToFilterConversion convertJoinToFilter(
final Set<String> rightPrefixes
)
{
+ // This optimization kicks in when there is exactly 1 equijoin
+ // The reason being that getNonNullColumnValues uses a TreeSet for
handling duplicates
+ // and does not return values in order.
+ // In case TreeSet if replaced by a LinkedHashSet, duplicates are not
handled correctly
Review Comment:
This comment doesn't really make sense to someone that hasn't seen previous
versions of this PR. Consider removing or rewording it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]