gianm commented on code in PR #13922:
URL: https://github.com/apache/druid/pull/13922#discussion_r1132987409


##########
processing/src/main/java/org/apache/druid/segment/UnnestColumnValueSelectorCursor.java:
##########
@@ -310,12 +314,15 @@ private void getNextRow()
   private void initialize()

Review Comment:
   The javadoc on this method seems out of date (it refers to `allowList`).



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -293,13 +293,16 @@ void add(@Nullable final Filter filter)
         }
 
         if (requiredColumns.contains(outputColumnName)) {
-          // Try to move filter pre-correlate if possible.
+          // Rewrite filter post-unnest if possible.
           final Filter newFilter = 
rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, 
inputColumnCapabilites);
           if (newFilter != null) {
+            // Add the rewritten filter pre-unnest, so we get the benefit of 
any indexes, and so we avoid unnesting
+            // any rows that do not match this filter at all.
             preFilters.add(newFilter);
-          } else {
-            postFilters.add(filter);
           }
+          // This is needed as a filter on an MV String Dimension returns the 
entire row matching the filter

Review Comment:
   fwiw, it's not just about MV string columns. When we support doing this for 
arrays, the same thing applies to arrays. The pre-unnest filter is an 
`array_contains` and the post-unnest filter is a regular `=`.



##########
processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java:
##########
@@ -138,13 +135,16 @@ public Sequence<Cursor> makeCursors(
                 retVal.getColumnSelectorFactory(),
                 unnestColumn,
                 outputColumnName,
-                allowSet
+                filterPair.rhs
             );
           }
+          // This is needed at this moment for nested queries
+          // Future developer would want to move the virtual columns
+          // inside the UnnestCursor and wrap the columnSelectorFactory
           return PostJoinCursor.wrap(
               retVal,
               virtualColumns,
-              filterPair.rhs
+              null

Review Comment:
   This isn't right, since post-unnest filters may refer to virtual columns. So 
anything from `filterPair.rhs` that refers to a virtual column needs to be 
placed here, or else it won't properly see them.
   
   What is the rationale for moving the filter into the 
UnnestColumnValueSelectorCursor? It seems like the logic there is doing 
something similar to what PostJoinCursor does: creating a ValueMatcher that 
wraps the column selector factory. I wonder what's wrong with letting 
PostJoinCursor do that, which would keep the code in 
UnnestColumnValueSelectorCursor simpler.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to