Boxuan Li created TINKERPOP-2753:
------------------------------------
Summary: Create noop() step to avoid eager optimization
Key: TINKERPOP-2753
URL: https://issues.apache.org/jira/browse/TINKERPOP-2753
Project: TinkerPop
Issue Type: New Feature
Reporter: Boxuan Li
I only have experience with JanusGraph, so my view may be biased and this
proposal may not generalize to other graph providers.
I propose we create a `noop()` step that does nothing. It is a special step
that simply provides a hint for the graph provider. How to interpret it depends
on the graph provider, but the usage I have in mind is to avoid eager optimization.
Sometimes a graph provider can combine different filter steps into a joint
condition for better index selection or predicate pushdown. For example, in the
query below:
```java
g.V().has("name", "bob").has("age", 20)
```
JanusGraph will fold the two `has` conditions into a joint condition for better
index selection. Sometimes, however, users don't want this "eager
optimization", typically because they know the data distribution and prefer
to apply the second `has` condition as an in-memory filter. They could do this:
```java
g.V().has("name", "bob").map(x -> x.get()).has("age", 20)
```
This makes JanusGraph defer the evaluation of the second condition until the
first `has` condition has been evaluated. Here, `map(x -> x.get())` is
essentially a noop step. What I am proposing is an official `noop()` step to
replace this workaround. The `noop` step may sound like a `barrier` step, but
the two do not have the same semantics: the `noop` step is a barrier against
constraint look-ahead optimization.
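To make the intended semantics concrete, here is a minimal toy sketch in plain Java. It is not JanusGraph or TinkerPop code: the `NoopFoldingSketch` class, the `Plan` type, and the string encoding of steps are all hypothetical and exist only for illustration. It assumes a provider strategy that folds consecutive `has` steps into the start step and stops looking ahead at the first step that is not a `has`, including a `noop` marker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NoopFoldingSketch {

    /** Hypothetical plan: conditions folded into the index lookup vs. filtered in memory. */
    public static final class Plan {
        public final List<String> pushedDown = new ArrayList<>();
        public final List<String> inMemory = new ArrayList<>();
    }

    /**
     * Toy look-ahead folding over the steps that follow the start step.
     * Consecutive leading "has(...)" steps are folded; a "noop" marker (or any
     * other step) ends the look-ahead, so later steps are evaluated in memory.
     */
    public static Plan fold(List<String> stepsAfterStart) {
        Plan plan = new Plan();
        boolean folding = true;
        for (String step : stepsAfterStart) {
            if (step.equals("noop")) {
                folding = false;            // the marker does nothing except stop folding
            } else if (folding && step.startsWith("has(")) {
                plan.pushedDown.add(step);  // joined into the start step's condition
            } else {
                folding = false;
                plan.inMemory.add(step);    // evaluated after the earlier steps
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Plan eager = fold(Arrays.asList("has(name,bob)", "has(age,20)"));
        Plan hinted = fold(Arrays.asList("has(name,bob)", "noop", "has(age,20)"));
        System.out.println("eager  pushed down: " + eager.pushedDown + ", in memory: " + eager.inMemory);
        System.out.println("hinted pushed down: " + hinted.pushedDown + ", in memory: " + hinted.inMemory);
    }
}
```

In this sketch the `noop` marker never appears in the resulting plan. Its only effect is that the second `has` condition ends up in the in-memory list rather than in the folded condition, which is the behavior the `map(x -> x.get())` workaround achieves today.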
Another example usage of `noop` is as follows:
```java
g.V(ids).bothE("follows").noop().where(__.otherV().is(v2)).next()
```
In the above case, we can use `noop` to force the graph provider to compute
`bothE` first and then evaluate the `where` step. Otherwise, the graph
provider (for example, JanusGraph) will try to fold the `where` condition into
the `bothE` step for predicate pushdown. Predicate pushdown usually helps, but
in some scenarios it is not the preferred plan.
I am happy to provide a patch if the community likes this idea.