Re: [PR] fix filtering bug in filtering unnest cols and dim cols: Received a non-applicable rewrite (druid)

via GitHub Thu, 03 Aug 2023 13:32:37 -0700


clintropolis commented on code in PR #14587:
URL: https://github.com/apache/druid/pull/14587#discussion_r1283688498



##########
processing/src/main/java/org/apache/druid/segment/filter/Filters.java:
##########
@@ -422,4 +423,19 @@ private static LinkedHashSet<Filter> 
flattenOrChildren(final Collection<Filter>
 
     return retVal;
   }
+
+  public static int countNumberOfFilters(Filter filter)

Review Comment:
   nit: if this method needs to stay (re if other comment suggesting pushing 
counting inside of filter splitter does not work out for some reason) then this 
argument should be marked `@Nullable`



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -310,84 +311,31 @@ to generate filters to be passed to base cursor 
(filtersPushedDownToBaseCursor)
        filtersPushedDownToBaseCursor -> null (as the filter cannot be 
re-written due to presence of virtual columns)
        filtersForPostUnnestCursor -> d12 IN (a,b) or m1 < 10
      */
-    class FilterSplitter
-    {
-      final List<Filter> filtersPushedDownToBaseCursor = new ArrayList<>();
-      final List<Filter> filtersForPostUnnestCursor = new ArrayList<>();
-
-      void addPostFilterWithPreFilterIfRewritePossible(@Nullable final Filter 
filter, boolean skipPreFilters)
-      {
-        if (filter == null) {
-          return;
-        }
-        if (!skipPreFilters) {
-          final Filter newFilter = 
rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, 
inputColumnCapabilites);
-          if (newFilter != null) {
-            // Add the rewritten filter pre-unnest, so we get the benefit of 
any indexes, and so we avoid unnesting
-            // any rows that do not match this filter at all.
-            filtersPushedDownToBaseCursor.add(newFilter);
-          }
-        }
-        // Add original filter post-unnest no matter what: we need to filter 
out any extraneous unnested values.
-        filtersForPostUnnestCursor.add(filter);
-      }
-
-      void addPreFilter(@Nullable final Filter filter)
-      {
-        if (filter == null) {
-          return;
-        }
-
-        final Set<String> requiredColumns = filter.getRequiredColumns();
-
-        // Run filter post-unnest if it refers to any virtual columns. This is 
a conservative judgement call
-        // that perhaps forces the code to use a ValueMatcher where an index 
would've been available,
-        // which can have real performance implications. This is an interim 
choice made to value correctness
-        // over performance. When we need to optimize this performance, we 
should be able to
-        // create a VirtualColumnDatasource that contains all the virtual 
columns, in which case the query
-        // itself would stop carrying them and everything should be able to be 
pushed down.
-        if (queryVirtualColumns.getVirtualColumns().length > 0) {
-          for (String column : requiredColumns) {
-            if (queryVirtualColumns.exists(column)) {
-              filtersForPostUnnestCursor.add(filter);
-              return;
-            }
-          }
-        }
-        filtersPushedDownToBaseCursor.add(filter);
-
-      }
-    }
-
-    final FilterSplitter filterSplitter = new FilterSplitter();
+    final FilterSplitter filterSplitter = new FilterSplitter(inputColumn, 
inputColumnCapabilites, queryVirtualColumns);
 
     if (queryFilter != null) {
-      List<Filter> preFilterList = new ArrayList<>();
-      final int origFilterSize;
       if (queryFilter.getRequiredColumns().contains(outputColumnName)) {
         // outside filter contains unnested column
-        // requires check for OR
-        if (queryFilter instanceof OrFilter) {
-          origFilterSize = ((OrFilter) queryFilter).getFilters().size();
-          for (Filter filter : ((OrFilter) queryFilter).getFilters()) {
-            if (filter.getRequiredColumns().contains(outputColumnName)) {
-              final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(
-                  filter,
-                  inputColumn,
-                  inputColumnCapabilites
-              );
-              if (newFilter != null) {
-                preFilterList.add(newFilter);
-              }
-            } else {
-              preFilterList.add(filter);
+        // requires check for OR and And filters, disqualify rewrite for 
non-unnest filters
+        if (queryFilter instanceof BooleanFilter) {
+          int originalFilterCount = Filters.countNumberOfFilters(queryFilter);

Review Comment:
   it seems like it would be a lot more efficient to do both the before and 
after count as part of the traversal that the filter splitter does, any reason 
I'm missing not to do that? Otherwise it seems like we have to basically 
traverse the filter tree 3 times per segment, before to get a count, then to 
try to rewrite, then after to get the new count. It seems like this could all 
just be baked into the `FilterSplitter` itself. Apologies for not spotting this 
earlier on in review... but if we do this then i think we no longer need 
`Filters.countNumberOfFilters`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix filtering bug in filtering unnest cols and dim cols: Received a non-applicable rewrite (druid)

Reply via email to