Copilot commented on code in PR #45:
URL: https://github.com/apache/asterixdb/pull/45#discussion_r2997985806
##########
asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/pushdown/processor/ColumnFilterPushdownProcessor.java:
##########
@@ -159,10 +159,51 @@ protected boolean pushdownFilter(ScanDefineDescriptor
scanDescriptor) throws Alg
protected void putFilterInformation(ScanDefineDescriptor
scanDefineDescriptor, ILogicalExpression inlinedExpr)
throws AlgebricksException {
if (checkerVisitor.containsMultipleArrayPaths(paths.values())) {
- // Cannot pushdown a filter with multiple unnest
- // TODO allow rewindable column readers for filters
- // TODO this is a bit conservative (maybe too conservative) as we
can push part of expression down
- return;
+ // Conservative fallback: attempt to push a subset of top-level
AND operands
+ // that do not collectively introduce multiple array paths.
+ if (inlinedExpr instanceof AbstractFunctionCallExpression
+ &&
BuiltinFunctions.AND.equals(((AbstractFunctionCallExpression)
inlinedExpr).getFunctionIdentifier())) {
+ List<Mutable<ILogicalExpression>> args =
((AbstractFunctionCallExpression) inlinedExpr)
+ .getArguments();
+ List<Mutable<ILogicalExpression>> selectedArgs = new
ArrayList<>();
+ List<ARecordType> selectedPaths = new ArrayList<>();
+
+ for (Mutable<ILogicalExpression> argRef : args) {
+ ILogicalExpression arg = argRef.getValue();
+ List<ARecordType> argPaths =
collectPathsForExpression(arg);
+ // Test if adding this arg's paths would cause multiple
array paths
+ checkerVisitor.beforeVisit();
+ List<ARecordType> merged = new ArrayList<>(selectedPaths);
+ merged.addAll(argPaths);
+ if (!checkerVisitor.containsMultipleArrayPaths(merged)) {
+ selectedArgs.add(new MutableObject<>(arg));
+ selectedPaths.addAll(argPaths);
+ }
+ }
+
+ if (selectedArgs.isEmpty()) {
+ // nothing pushable
+ return;
+ }
+
+ // Build new inlined expression from selectedArgs
+ AbstractFunctionCallExpression newExpr;
+ if (selectedArgs.size() == 1) {
+ // single argument
+ ILogicalExpression single = selectedArgs.get(0).getValue();
+ putFilterInformation(scanDefineDescriptor, single);
+ return;
Review Comment:
When `selectedArgs.size() == 1`, this calls
`putFilterInformation(scanDefineDescriptor, single)` recursively while leaving
`paths` unchanged. Since `paths.values()` still contains the full set of paths
(including the ones that triggered `containsMultipleArrayPaths`), the recursive
call will typically hit the same `containsMultipleArrayPaths(...)` branch and
then return early (because `single` is not an AND), resulting in *no* predicate
being pushed down. Instead, avoid recursion here: set `inlinedExpr` to the
single selected predicate and also restrict `paths` to only the entries needed
by that selected predicate (and re-check/reset the visitor state if needed).
```suggestion
// Single argument: avoid recursion. Use it directly as
the inlined expression
// and restrict 'paths' to only those needed by this
predicate.
ILogicalExpression single =
selectedArgs.get(0).getValue();
inlinedExpr = single;
// Keep only entries whose ARecordType is among the
selected paths
paths.entrySet().removeIf(e ->
!selectedPaths.contains(e.getValue()));
```
##########
asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/pushdown/processor/ColumnFilterPushdownProcessor.java:
##########
@@ -159,10 +159,51 @@ protected boolean pushdownFilter(ScanDefineDescriptor
scanDescriptor) throws Alg
protected void putFilterInformation(ScanDefineDescriptor
scanDefineDescriptor, ILogicalExpression inlinedExpr)
throws AlgebricksException {
if (checkerVisitor.containsMultipleArrayPaths(paths.values())) {
- // Cannot pushdown a filter with multiple unnest
- // TODO allow rewindable column readers for filters
- // TODO this is a bit conservative (maybe too conservative) as we
can push part of expression down
- return;
+ // Conservative fallback: attempt to push a subset of top-level
AND operands
+ // that do not collectively introduce multiple array paths.
+ if (inlinedExpr instanceof AbstractFunctionCallExpression
+ &&
BuiltinFunctions.AND.equals(((AbstractFunctionCallExpression)
inlinedExpr).getFunctionIdentifier())) {
+ List<Mutable<ILogicalExpression>> args =
((AbstractFunctionCallExpression) inlinedExpr)
+ .getArguments();
+ List<Mutable<ILogicalExpression>> selectedArgs = new
ArrayList<>();
+ List<ARecordType> selectedPaths = new ArrayList<>();
+
+ for (Mutable<ILogicalExpression> argRef : args) {
+ ILogicalExpression arg = argRef.getValue();
+ List<ARecordType> argPaths =
collectPathsForExpression(arg);
+ // Test if adding this arg's paths would cause multiple
array paths
+ checkerVisitor.beforeVisit();
+ List<ARecordType> merged = new ArrayList<>(selectedPaths);
+ merged.addAll(argPaths);
+ if (!checkerVisitor.containsMultipleArrayPaths(merged)) {
+ selectedArgs.add(new MutableObject<>(arg));
+ selectedPaths.addAll(argPaths);
+ }
Review Comment:
The subset-selection logic checks `containsMultipleArrayPaths(merged)` using
only `selectedPaths` + `argPaths`, but it doesn't include any paths that may
already have been pushed down earlier for this scan (i.e.,
`scanDefineDescriptor.getFilterPaths()` from prior `putFilterInformation`
calls). Since `paths` is accumulated across multiple candidate filters during
`pushdownFilter(...)`, this can still select a conjunct that introduces a
second array path when combined with previously accepted predicates, defeating
the constraint you’re trying to preserve. Consider seeding `selectedPaths` (or
the `merged` list) with the already-pushed paths from the descriptor before
evaluating each new conjunct.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]