imbajin commented on code in PR #3034:
URL: https://github.com/apache/hugegraph/pull/3034#discussion_r3281434544


##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/traversal/optimize/TraversalUtil.java:
##########
@@ -166,38 +170,219 @@ public static void trySetGraph(Step<?, ?> step, 
HugeGraph graph) {
 
     public static void extractHasContainer(HugeGraphStep<?, ?> newStep,
                                            Traversal.Admin<?, ?> traversal) {
-        Step<?, ?> step = newStep;
-        do {
-            step = step.getNextStep();
+        Step<?, ?> step = newStep.getNextStep();
+        while (step instanceof HasStep || step instanceof NoOpBarrierStep) {
+            Step<?, ?> nextStep = step.getNextStep();
             if (step instanceof HasStep) {
                 HasContainerHolder holder = (HasContainerHolder) step;
-                for (HasContainer has : holder.getHasContainers()) {
-                    if (!GraphStep.processHasContainerIds(newStep, has)) {
-                        newStep.addHasContainer(has);
-                    }
+                if (extractHasContainers(newStep, holder)) {
+                    TraversalHelper.copyLabels(step, step.getPreviousStep(), 
false);
+                    traversal.removeStep(step);
                 }
-                TraversalHelper.copyLabels(step, step.getPreviousStep(), 
false);
-                traversal.removeStep(step);
             }
-        } while (step instanceof HasStep || step instanceof NoOpBarrierStep);
+            step = nextStep;
+        }
     }
 
     public static void extractHasContainer(HugeVertexStep<?> newStep,
                                            Traversal.Admin<?, ?> traversal) {
         Step<?, ?> step = newStep;
         do {
+            Step<?, ?> nextStep = step.getNextStep();
             if (step instanceof HasStep) {
                 HasContainerHolder holder = (HasContainerHolder) step;
-                for (HasContainer has : holder.getHasContainers()) {
-                    newStep.addHasContainer(has);
+                if (extractHasContainers(newStep, holder)) {
+                    TraversalHelper.copyLabels(step, step.getPreviousStep(), 
false);
+                    traversal.removeStep(step);
                 }
-                TraversalHelper.copyLabels(step, step.getPreviousStep(), 
false);
-                traversal.removeStep(step);
             }
-            step = step.getNextStep();
+            step = nextStep;
         } while (step instanceof HasStep || step instanceof NoOpBarrierStep);
     }
 
+    private static boolean extractHasContainers(HugeGraphStep<?, ?> newStep,
+                                                HasContainerHolder holder) {
+        HugeGraph graph = TraversalUtil.tryGetGraph(newStep);
+        List<HasContainer> hasContainers = new 
ArrayList<>(holder.getHasContainers());
+        HugeType type = newStep.returnsVertex() ? HugeType.VERTEX : 
HugeType.EDGE;
+        Set<Id> labelIds = collectLabelIds(graph, type, 
newStep.getHasContainers(),
+                                           hasContainers);
+        for (HasContainer has : hasContainers) {
+            if (!canExtractHasContainer(graph, type, labelIds, has)) {
+                continue;
+            }
+            if (!GraphStep.processHasContainerIds(newStep, has)) {
+                newStep.addHasContainer(has);
+            }
+            holder.removeHasContainer(has);

Review Comment:
   ‼️ I think we should scope this PR back down before merging. The original 
bug is a focused correctness issue: text range predicates such as `has("vp4", 
P.lt(""))` should not be pushed into the backend query path. The latest 
implementation now goes further and splits a mixed `HasStep`, pushes selected 
containers down, removes them from the original `HasStep`, and adds new 
label/index-availability logic in `TraversalUtil`.
   
   ```text
   Original bug boundary
   
   Text range predicate
           |
           v
   wrongly pushed to backend
           |
           v
   exception / mismatch with match()
   ```
   
   ```text
   Latest implementation boundary
   
   HasStep(vp4 < "", age = 1)
           |
           +-- push age = 1 into HugeGraphStep / HugeVertexStep
           |
           +-- remove age = 1 from the original HasStep
           |
           +-- rely on new label/index checks to prove backend can enforce it
   ```
   
   That is a much broader optimizer change than the original fix, and once a 
container is removed from the original `HasStep`, correctness depends on 
`canExtractHasContainer()` exactly matching what the backend planner can really 
execute. The new tests cover several cases, but this still expands the review 
surface into composite-index matching, label context, paging/sysprop 
interactions, and future index-planner behavior.
   
   For this PR, I would prefer one of these smaller and safer shapes:
   
   ```text
   Option 1: minimal correctness fix
     - keep unsafe Text + range predicates in the traversal layer
     - do not split mixed HasSteps
     - avoid adding backend index-availability logic here
   
   Option 2: low-risk partial prefiltering
     - optionally push safe containers down as redundant backend pre-filters
     - keep the original HasStep unchanged unless all containers are extractable
     - correctness remains guarded by the full traversal-layer filter
   ```
   
   ```text
   Safer partial prefiltering shape
   
   Backend prefilter:
     age = 1
   
   Traversal filter still intact:
     vp4 < "" AND age = 1
   
   Result:
     backend may reduce candidates, but traversal still owns correctness
   ```
   
   The current split-and-remove optimization can still be valuable, but I think 
it should be a separate follow-up PR with dedicated tests for composite-index 
prefix matching, label-less queries, `~page` / paging semantics when some user 
filters remain in traversal, and parity with the backend index planner. Keeping 
this PR minimal would make the #2935 fix easier to reason about and safer to 
merge.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to