[jira] [Commented] (TINKERPOP-2490) RangeGlobalStep touches next traverser when high limit is already hit

Oleksandr Porunov (Jira) Sun, 23 Apr 2023 16:08:06 -0700


    [ 
https://issues.apache.org/jira/browse/TINKERPOP-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715539#comment-17715539
 ]


Oleksandr Porunov commented on TINKERPOP-2490:
----------------------------------------------

Maybe I'm missing something, but I doubt that it's `CountStrategy`. Even without 
a `count` step at all (let's replace it with a `has` step, for example) the 
behavior is the same: the `has` step is executed for 2 vertices even though the 
`limit` step says that `1` is enough. I may be missing some logic behind how 
these steps are used, but in my understanding the filter should be executed for 
a single vertex if that vertex matches the filter and the limit is reached.

I.e. for `V(v1, v2, v3).where(${some_filter_which_is_always_true}).limit(1)`, in 
my testing `some_filter_which_is_always_true` is executed 2 times, but I would 
expect it to be executed only once because the limit says that we don't need 
more than 1 element.

As for the count query above, notice that I use `gte` and not `gt`. The count 
only has to be greater than or equal to `1`, so a single element already 
satisfies the condition. I don't see the point of checking a second element to 
find out whether there are more when a count equal to 1 is enough. Nevertheless, 
my issue isn't with the count query but with the filter step, which is executed 
more often than needed (from my point of view).
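
The `gte` versus `gt` argument can be checked with a small stand-alone sketch. It is plain Java over an iterator and does not use or mirror `CountStrategy`; the class and method names are hypothetical. It only shows how many elements have to be read before "count >= bound" or "count > bound" is decided.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative helper with hypothetical names; it does not use TinkerPop.
class BoundedCountDemo {

    // Reads elements until "count >= bound" (inclusive) or "count > bound"
    // (exclusive) is known to hold, or the source is exhausted.
    // Returns the number of elements that were read.
    static int elementsRead(Iterator<?> source, int bound, boolean inclusive) {
        int needed = inclusive ? bound : bound + 1;
        int read = 0;
        while (read < needed && source.hasNext()) {
            source.next();
            read++;
        }
        return read;
    }

    public static void main(String[] args) {
        List<String> vertices = Arrays.asList("v1", "v2", "v3");
        System.out.println(elementsRead(vertices.iterator(), 1, true));  // gte(1): prints 1
        System.out.println(elementsRead(vertices.iterator(), 1, false)); // gt(1): prints 2
    }
}
```

Under this model a `gte(1)` check needs one element, and only a `gt(1)` check would justify looking at a second one.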

Here is a simplified version of the test:
{code:java}
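// Imports assume JUnit 5 and the TinkerGraph reference implementation.
import java.util.ArrayList;
import java.util.List;

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.process.traversal.step.filter.TraversalFilterStep;
import org.apache.tinkerpop.gremlin.process.traversal.util.TraversalMetrics;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

@Test
public void testLimitedFilterNotChecksElementsOverTheLimit() {
    // Build three start vertices, each connected to one neighbour.
    TinkerGraph tinkerGraph = TinkerGraph.open();
    List<Vertex> vertices = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
        Vertex v1 = tinkerGraph.addVertex();
        Vertex v2 = tinkerGraph.addVertex();
        v1.addEdge("connects", v2);
        vertices.add(v1);
    }

    // The filter is always true, so limit(1) is satisfied by the first vertex.
    TraversalMetrics traversalMetrics = tinkerGraph.traversal()
        .V(vertices.get(0), vertices.get(1), vertices.get(2))
        .where(__.inject(true))
        .limit(1)
        .profile().next();

    // Read how many traversers the filter step processed.
    Long filterTraversalCount = traversalMetrics.getMetrics().stream()
        .filter(metrics -> metrics.getName().startsWith(TraversalFilterStep.class.getSimpleName()))
        .findFirst().get().getCount(TraversalMetrics.TRAVERSER_COUNT_ID);

    // Expected 1; the behavior described above yields 2.
    Assertions.assertEquals(1L, filterTraversalCount);
}
{code}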
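
For readers without a TinkerPop checkout, the effect the test measures can be reproduced with a toy pull-based pipeline. This is only a sketch under the assumption that the limit stage pulls from upstream before it compares against its bound; all class and method names are hypothetical and none of this is TinkerPop code.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Toy model with hypothetical names; these are not TinkerPop's step classes.
class PullThenCheckDemo {

    // Filter stage: evaluates the predicate on each pulled element and counts evaluations.
    static <T> Iterator<T> filter(Iterator<T> upstream, Predicate<T> predicate, AtomicInteger evaluations) {
        return new Iterator<T>() {
            private T buffered;
            private boolean ready;

            @Override
            public boolean hasNext() {
                while (!ready && upstream.hasNext()) {
                    T candidate = upstream.next();
                    evaluations.incrementAndGet();
                    if (predicate.test(candidate)) {
                        buffered = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                return buffered;
            }
        };
    }

    // Limit stage that pulls from upstream first and only then compares against the bound.
    static <T> Iterator<T> pullThenCheckLimit(Iterator<T> upstream, long high) {
        return new Iterator<T>() {
            private long emitted;
            private T buffered;
            private boolean ready;
            private boolean done;

            @Override
            public boolean hasNext() {
                if (ready) return true;
                if (done || !upstream.hasNext()) return false; // upstream.hasNext() runs the filter
                T candidate = upstream.next();
                if (emitted >= high) {                          // bound noticed only after the pull
                    done = true;
                    return false;
                }
                emitted++;
                buffered = candidate;
                ready = true;
                return true;
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                return buffered;
            }
        };
    }

    // Drains filter(alwaysTrue).limit(high) over three elements and reports filter evaluations.
    static int filterEvaluations(long high) {
        AtomicInteger evaluations = new AtomicInteger();
        Iterator<String> source = Arrays.asList("v1", "v2", "v3").iterator();
        Iterator<String> pipeline = pullThenCheckLimit(filter(source, v -> true, evaluations), high);
        while (pipeline.hasNext()) pipeline.next();
        return evaluations.get();
    }

    public static void main(String[] args) {
        System.out.println(filterEvaluations(1)); // prints 2: one match emitted, one extra evaluation
    }
}
```

In this model the second evaluation happens when the consumer asks whether anything else is available: the limit stage has to pull a candidate before it notices that its bound was already reached.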
 

> RangeGlobalStep touches next traverser when high limit is already hit
> ---------------------------------------------------------------------
>
>                 Key: TINKERPOP-2490
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2490
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: process
>    Affects Versions: 3.4.8
>            Reporter: Guo Junshi
>            Priority: Major
>
> In FilterStep, the processNextStart() method first retrieves the next 
> traverser and then applies the filtering logic. But for RangeGlobalStep, if 
> the high limit has already been hit, there is no need to get the next traverser.
> {code:java}
> @Override
> protected Traverser.Admin<S> processNextStart() {
>     while (true) {
>         final Traverser.Admin<S> traverser = this.starts.next();
>         if (this.filter(traverser))
>             return traverser;
>     }
> }
> {code}
> An example is the limit step: g.V().limit(1). This query touches 2 
> vertices although only 1 vertex is returned.
> This extra data loading degrades performance when loading data from a DB is 
> involved. It is not a functional bug, but for better performance we should 
> check the high range limit before touching the next traverser.
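
The proposal in the description, checking the high limit before pulling, can be sketched outside TinkerPop as follows. This is a hypothetical stand-alone model, not a patch against `RangeGlobalStep`; `CountingSource` and `CheckFirstLimit` are names invented for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-alone model; not a patch against TinkerPop's RangeGlobalStep.
class CheckBeforePullDemo {

    // Wraps an iterator and records how many elements were actually pulled from it.
    static final class CountingSource<T> implements Iterator<T> {
        private final Iterator<T> delegate;
        int pulled;

        CountingSource(Iterator<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public T next() {
            pulled++;
            return delegate.next();
        }
    }

    // Limit stage that consults its own counter before asking upstream for anything.
    static final class CheckFirstLimit<T> implements Iterator<T> {
        private final Iterator<T> upstream;
        private final long high;
        private long emitted;

        CheckFirstLimit(Iterator<T> upstream, long high) {
            this.upstream = upstream;
            this.high = high;
        }

        @Override
        public boolean hasNext() {
            // Short-circuits once the bound is hit, so upstream is not touched again.
            return emitted < high && upstream.hasNext();
        }

        @Override
        public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            emitted++;
            return upstream.next();
        }
    }

    // Drains limit(high) over three elements and reports how many were pulled from the source.
    static int pulledFor(long high) {
        CountingSource<String> source = new CountingSource<>(Arrays.asList("v1", "v2", "v3").iterator());
        Iterator<String> limited = new CheckFirstLimit<>(source, high);
        while (limited.hasNext()) limited.next();
        return source.pulled;
    }

    public static void main(String[] args) {
        System.out.println(pulledFor(1)); // prints 1: the second element is never pulled
    }
}
```

Whether the same ordering can be applied inside the real step without side effects on other strategies is not something this sketch can show.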



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
