gianm commented on a change in pull request #11068:
URL: https://github.com/apache/druid/pull/11068#discussion_r608080628
##########
File path: processing/src/main/java/org/apache/druid/segment/join/JoinableFactoryWrapper.java
##########
@@ -84,22 +100,48 @@ public JoinableFactoryWrapper(final JoinableFactory joinableFactory)
return Function.identity();
} else {
final JoinableClauses joinableClauses = JoinableClauses.createClauses(clauses, joinableFactory);
+ final JoinFilterRewriteConfig filterRewriteConfig = JoinFilterRewriteConfig.forQuery(query);
+
+ // Pick off any join clauses that can be converted into filters.
+ final Set<String> requiredColumns = query.getRequiredColumns();
+ final Filter baseFilterToUse;
+ final List<JoinableClause> clausesToUse;
+
+ if (requiredColumns != null && filterRewriteConfig.isEnableRewriteJoinToFilter()) {
+ final Pair<List<Filter>, List<JoinableClause>> conversionResult = convertJoinsToFilters(
+ joinableClauses.getJoinableClauses(),
+ requiredColumns,
+ Ints.checkedCast(Math.min(filterRewriteConfig.getFilterRewriteMaxSize(), Integer.MAX_VALUE))
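The last line of the hunk narrows a long-valued max-size setting to an int by capping it at `Integer.MAX_VALUE` and then calling Guava's `Ints.checkedCast`. A minimal JDK-only sketch of the same narrowing, assuming the setting is a `long`; the class and method names are hypothetical and not part of Druid:

```java
// Hypothetical helper (not Druid code): narrows a long-valued limit to int.
// Math.min caps large values at Integer.MAX_VALUE; Math.toIntExact then
// converts to int and throws ArithmeticException if the value is still out
// of int range, which after the cap can only happen below Integer.MIN_VALUE.
// Guava's Ints.checkedCast plays the same role but throws
// IllegalArgumentException instead.
public final class LimitNarrowing {
  static int clampToInt(long configuredMax) {
    return Math.toIntExact(Math.min(configuredMax, Integer.MAX_VALUE));
  }

  public static void main(String[] args) {
    System.out.println(clampToInt(10_000L));        // 10000
    System.out.println(clampToInt(Long.MAX_VALUE)); // 2147483647
  }
}
```

Capping before converting means an oversized setting is silently treated as `Integer.MAX_VALUE` rather than rejected.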
Review comment:
I was thinking we should rely on the user to set this parameter
"correctly", much like the subquery limit. I also think most people won't
change it from the default, which is 10,000 and should be pretty safe. Unless
the values are _gigantic_, it's only going to be a few MB per query.
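The "few MB per query" figure can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes an average value size of 100 bytes and roughly 64 bytes of per-entry overhead; both numbers are illustrative assumptions, not measurements of Druid, and real JVM overhead varies with the JVM and the data structure used:

```java
// Back-of-envelope estimate of memory held by a rewritten join filter.
// ASSUMPTIONS (illustrative only): avgValueBytes and perEntryOverheadBytes
// are guesses, not values taken from Druid.
public final class FilterMemoryEstimate {
  static long estimateBytes(long rows, long avgValueBytes, long perEntryOverheadBytes) {
    return rows * (avgValueBytes + perEntryOverheadBytes);
  }

  public static void main(String[] args) {
    // 10,000 rows is the default limit mentioned in the comment.
    long bytes = estimateBytes(10_000L, 100L, 64L);
    System.out.println(bytes); // 1640000, i.e. roughly 1.6 MB
  }
}
```

Only very wide values (tens of kilobytes each) would push the default limit into the hundreds of MB.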
I thought a bit about measuring these limits in terms of bytes instead of
rows, which has pros/cons:
- Pro of bytes: less likely to be misconfigured & cause OOME, more likely to
use memory efficiently & maximally
- Con of bytes: harder for users to understand the limit. "10,000 rows" is
easy to communicate & understand; "5MB" is harder because people won't be able
to easily figure out if a particular data set fits in 5MB or not.
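The trade-off between the two styles can be sketched as two checks over the candidate filter values. This is a hypothetical illustration (Java 11+ for `List.of` and `String.repeat`), not Druid's implementation; the byte check uses UTF-8 length as a crude size estimate, which ignores JVM object and collection overhead:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical comparison of a row-count limit and a byte-budget limit.
public final class RewriteLimits {
  // Row-based limit: trivial to explain ("at most N values").
  static boolean fitsRowLimit(List<String> values, int maxRows) {
    return values.size() <= maxRows;
  }

  // Byte-based limit: tracks memory more closely, but needs a size estimate.
  // Here each value costs its UTF-8 length, which undercounts real heap usage.
  static boolean fitsByteLimit(List<String> values, long maxBytes) {
    long total = 0;
    for (String v : values) {
      total += v.getBytes(StandardCharsets.UTF_8).length;
      if (total > maxBytes) {
        return false; // stop early once the budget is exceeded
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> small = List.of("a", "b", "c");
    List<String> wide = List.of("x".repeat(4_000), "y".repeat(4_000));
    System.out.println(fitsRowLimit(small, 10_000));     // true
    System.out.println(fitsByteLimit(small, 5_000_000)); // true
    System.out.println(fitsRowLimit(wide, 10_000));      // true
    System.out.println(fitsByteLimit(wide, 5_000));      // false
  }
}
```

The `wide` case shows both sides of the argument: two rows easily pass a 10,000-row limit, yet they exceed a 5,000-byte budget, and a user could not tell that without knowing how values are measured.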