[ https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715490#comment-17715490 ]
Oleksandr Porunov edited comment on TINKERPOP-2490 at 4/23/23 4:25 PM: ----------------------------------------------------------------------- This bug still exists in TinkerPop (checked version 3.6.2). An example test which shows when this behavior is not expected is below: {code:java} @Test public void testLimitedFilterNotChecksElementsOverTheLimit(){ TinkerGraph tinkerGraph = TinkerGraph.open(); List<Vertex> vertices = new ArrayList<>(); for(int i=0; i<3; i++){ Vertex v1 = tinkerGraph.addVertex(); Vertex v2 = tinkerGraph.addVertex(); v1.addEdge("connects", v2); vertices.add(v1); } tinkerGraph.traversal().V(vertices.get(0), vertices.get(1), vertices.get(2)) .where(__.out("connects").count().is(P.gte(1))).limit(1).toList(); TraversalMetrics traversalMetrics = tinkerGraph.traversal() .V(vertices.get(0), vertices.get(1), vertices.get(2)) .where(__.out("connects").count().is(P.gte(1))) .limit(1) .profile().next(); Long filterTraversalCount = traversalMetrics.getMetrics().stream().filter(metrics -> metrics.getName().startsWith(TraversalFilterStep.class.getSimpleName())) .findFirst().get().getCount(TraversalMetrics.TRAVERSER_COUNT_ID); Assertions.assertEquals(1, filterTraversalCount); } {code} In the above test the filter first checks the first provided vertex (which runs some potentially expensive check) then when limit is satisfied the check is executed again for the second provided vertex (even so it's unnecessary). The 3rd vertex is not evaluated. Thus, basically, we execute this computation always for 1 more unnecessary vertex. The above assertEquals will throw: {code:java} org.opentest4j.AssertionFailedError: Expected :1 Actual :2 {code} was (Author: porunov): This bug still exists in TinkerPop (checked version 3.6.2). An example test which shows when this behavior is not expected is below: {code:java} @Test public void testLimitedFilterNotChecksElementsOverTheLimit(){ TinkerGraph tinkerGraph = TinkerGraph.open(); List<Vertex> vertices = new ArrayList<>(); for(int i=0; i<3; i++){ Vertex v1 = tinkerGraph.addVertex(); Vertex v2 = tinkerGraph.addVertex(); v1.addEdge("connects", v2); vertices.add(v1); } tinkerGraph.traversal().V(vertices.get(0), vertices.get(1), vertices.get(2)) .where(__.out("connects").count().is(P.gte(1))).limit(1).toList(); TraversalMetrics traversalMetrics = tinkerGraph.traversal() .V(vertices.get(0), vertices.get(1), vertices.get(2)) .where(__.out("connects").count().is(P.gte(1))) .limit(1) .profile().next(); Long filterTraversalCount = traversalMetrics.getMetrics().stream().filter(metrics -> metrics.getName().startsWith(TraversalFilterStep.class.getSimpleName())) .findFirst().get().getCount(TraversalMetrics.TRAVERSER_COUNT_ID); Assertions.assertEquals(1, filterTraversalCount); } {code} In the above test the filter first checks the first provided vertex (which runs some potentially expensive check) then when limit is satisfied the check is executed again for the second provided vertex (even so it's unnecessary). The 3rd vertex is not evaluated. Thus, basically, we execute this computation always for 1 more unnecessary vertex. > RangeGlobalStep touches next traverser when high limit is already hit > --------------------------------------------------------------------- > > Key: TINKERPOP-2490 > URL: https://issues.apache.org/jira/browse/TINKERPOP-2490 > Project: TinkerPop > Issue Type: Bug > Components: process > Affects Versions: 3.4.8 > Reporter: Guo Junshi > Priority: Major > > In FilterStep, the processNextStart() method will first retrieve next > traverser and then apply filtering logic. But for RangleGlobalStep, if high > limit is already hit, there will be no need to get next traverser. > {code:java} > @Override > protected Traverser.Admin<S> processNextStart() { > while (true) { > final Traverser.Admin<S> traverser = this.starts.next(); > if (this.filter(traverser)) > return traverser; > } > } > {code} > An example would be limit step: g.V().limit(1). This query will touch 2 > vertices although only 1 vertex will be returned. > This extra data loading will cause performance defects if DB data loading is > involved. It is not a functionality bug, but for better performance, we'd > better check high range limit first before touching next traversal. -- This message was sent by Atlassian Jira (v8.20.10#820010)