[jira] [Updated] (TINKERPOP-2753) Create noop() step to avoid eager optimization

Boxuan Li (Jira) Sun, 12 Jun 2022 10:33:08 -0700


     [ https://issues.apache.org/jira/browse/TINKERPOP-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Boxuan Li updated TINKERPOP-2753:
---------------------------------
    Description: 
I only have experience with JanusGraph, so my opinion might be biased, and this 
proposal might not generalize to other graph providers:

I propose we create a `noop()` step that does nothing. It is a special step 
that only provides a hint to the graph provider. How to interpret it is up to 
the graph provider, but the use case I have in mind is avoiding eager 
optimization. A graph provider can sometimes combine different filter steps 
into a joint condition for better index selection or predicate pushdown. For 
example, in the query below:

```
g.V().has("name", "bob").has("age", 20)
```

JanusGraph will fold the two `has` conditions into a joint condition for better 
index selection. Sometimes, however, users don't want this "eager 
optimization", likely because they know the distribution of data and prefer 
doing in-memory filtering for the second `has` condition. They could do this:

```java
g.V().has("name", "bob").map(x -> x.get()).has("age", 20)
```

This way, JanusGraph defers the evaluation of the second condition until the 
first `has` condition has been evaluated. Here, `map(x -> x.get())` is 
essentially a no-op step: it passes each element through unchanged. What I am 
proposing is an official `noop()` step to replace this workaround. A `noop` 
step sounds like a `barrier` step, but the two do not have the same semantics: 
`barrier()` collects the traversers that reach it before passing them on, 
whereas `noop()` would only act as a barrier against constraint look-ahead 
optimization.
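
To make the folding behavior concrete, here is a minimal Java sketch of a provider-side rewrite that merges each run of adjacent `has` filters into one joint condition and treats a `noop()` marker as the end of a run. It is a toy model under stated assumptions: `HasFoldingSketch`, `Has`, `Noop`, `JointHas`, and `fold` are hypothetical names, not TinkerPop or JanusGraph APIs, and real provider strategies are more involved. In this model any non-`has` step ends a run, which is also why the `map(x -> x.get())` workaround has the same effect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a provider strategy that folds adjacent has() filters
// into one joint condition; not TinkerPop or JanusGraph code.
// Requires Java 16+ (records and instanceof pattern matching).
class HasFoldingSketch {

    interface Step {}

    // has(key, value): a single property equality filter.
    record Has(String key, Object value) implements Step {}

    // The proposed noop(): no behavior of its own, only an optimization barrier.
    record Noop() implements Step {}

    // Several has() conditions that the provider evaluates together,
    // e.g. so that one index lookup can serve all of them.
    record JointHas(List<Has> conditions) implements Step {}

    // Folds each maximal run of adjacent Has steps into a single JointHas.
    // Any other step, including Noop, ends the current run and stays in place,
    // so conditions after it remain separate filters.
    static List<Step> fold(List<Step> steps) {
        List<Step> out = new ArrayList<>();
        List<Has> run = new ArrayList<>();
        for (Step s : steps) {
            if (s instanceof Has h) {
                run.add(h);
            } else {
                flush(run, out);
                out.add(s);
            }
        }
        flush(run, out);
        return out;
    }

    // Emits the pending run: one Has as-is, two or more as a JointHas.
    private static void flush(List<Has> run, List<Step> out) {
        if (run.size() == 1) {
            out.add(run.get(0));
        } else if (run.size() > 1) {
            out.add(new JointHas(List.copyOf(run)));
        }
        run.clear();
    }

    public static void main(String[] args) {
        // g.V().has("name", "bob").has("age", 20): both conditions are folded.
        System.out.println(fold(List.of(new Has("name", "bob"), new Has("age", 20))));
        // g.V().has("name", "bob").noop().has("age", 20): folding stops at noop().
        System.out.println(fold(List.of(new Has("name", "bob"), new Noop(), new Has("age", 20))));
    }
}
```

The marker is kept in the rewritten plan so that later rewrites in this model also see the boundary; a provider could drop it once optimization is finished.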


Another example usage of `noop` is as follows:

```java
g.V(ids).bothE("follows").noop().where(__.otherV().is(v2)).next()
```

In the above case, we can use `noop` to force the graph provider to compute 
`bothE` first and then evaluate the `where` filter. Otherwise, the graph 
provider (for example, JanusGraph) will try to fold the `where` condition into 
the `bothE` step for predicate pushdown. Predicate pushdown usually works 
well, but in some scenarios it is not the preferred plan.
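
The second example can be modeled the same way. The sketch below is hypothetical (not JanusGraph code; `PushdownSketch`, `BothE`, `WhereOtherV`, and `pushDown` are illustrative names): it rewrites `bothE(label)` immediately followed by `where(__.otherV().is(v))` into a single edge fetch with a pushed-down adjacent-vertex constraint, and leaves the plan alone when a `noop()` marker sits between them. Either plan should return the same edges; the hint only changes the evaluation order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of predicate pushdown; not JanusGraph or TinkerPop code.
// Requires Java 16+ (records and instanceof pattern matching).
class PushdownSketch {

    interface Step {}

    // bothE(label); otherVertexId is a pushed-down constraint, or null if none.
    record BothE(String label, Long otherVertexId) implements Step {}

    // where(__.otherV().is(v)), with the target vertex identified by its id.
    record WhereOtherV(long vertexId) implements Step {}

    // The proposed noop(): no behavior of its own, only an optimization barrier.
    record Noop() implements Step {}

    // Folds a WhereOtherV into the unconstrained BothE step directly before it.
    // With a Noop in between, the previous step is not a BothE, so no rewrite happens.
    static List<Step> pushDown(List<Step> steps) {
        List<Step> out = new ArrayList<>();
        for (Step s : steps) {
            Step prev = out.isEmpty() ? null : out.get(out.size() - 1);
            if (s instanceof WhereOtherV w
                    && prev instanceof BothE e
                    && e.otherVertexId() == null) {
                out.set(out.size() - 1, new BothE(e.label(), w.vertexId()));
            } else {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // bothE("follows").where(__.otherV().is(v2)), assuming v2 has id 2.
        System.out.println(pushDown(List.of(new BothE("follows", null), new WhereOtherV(2))));
        // bothE("follows").noop().where(__.otherV().is(v2)): no pushdown.
        System.out.println(pushDown(List.of(new BothE("follows", null), new Noop(), new WhereOtherV(2))));
    }
}
```

Because this rewrite only inspects the immediately preceding step, a single marker is enough to block it here; how far a real provider looks ahead is up to that provider.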


I am happy to provide a patch if the community likes this idea.

> Create noop() step to avoid eager optimization
> ----------------------------------------------
>
>                 Key: TINKERPOP-2753
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2753
>             Project: TinkerPop
>          Issue Type: New Feature
>            Reporter: Boxuan Li
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
