Hello,

Many many moons ago, back when Faunus was a thing, Matthias Bröcheler said -- 
"Can you make it so we can infer which edge labels are needed for the OLAP job 
so I can create a Titan-specific push-down predicate?"

Well, that was Gremlin 2.x, and traversal introspection in those days was not 
as easy as it is with Gremlin 3.x. In order to satiate Matthias' (dying) wish, 
I created GraphFilterStrategy (now in master/ and slated for 3.2.1).

https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/strategy/optimization/GraphFilterStrategy.java

In 3.2.0, we introduced the concept of a GraphFilter which allows the user to 
specify a push-down predicate for selecting a subgraph needed for an OLAP job.

https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/GraphFilter.java

In short, you can do things like:

g = graph.traversal().withComputer(Computer.compute().edges(bothE().limit(0)))
g.V().count()
g.V().has("age",gt(30)).count()
…

In the example above, the traversals never touch edges, so the GraphFilter 
drops all edges before the job runs. With GraphFilterStrategy, the user no 
longer needs to specify the GraphFilter: one is generated automatically based 
on traversal introspection.

https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/strategy/optimization/GraphFilterStrategy.java#L110-L111

Now you can just do:

g = graph.traversal().withComputer()
g.V().count()
g.V().has("age",gt(30)).count()
…

…and each traversal will have its own custom-created GraphFilter.
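
To make the introspection idea concrete, here is a toy sketch in plain Python. 
It is not the GraphFilterStrategy code and uses no TinkerPop classes: Step, 
EDGE_STEPS, and infer_edge_labels are hypothetical names invented for this 
illustration, and the sketch assumes that only edge labels matter (it ignores 
edge direction and nested traversals).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple


# Hypothetical, simplified stand-in for a traversal step (not a TinkerPop class).
@dataclass(frozen=True)
class Step:
    name: str                      # e.g. "V", "has", "out", "count"
    labels: Tuple[str, ...] = ()   # edge labels, only used by edge-touching steps


# Assumption for this toy model: these step names are the ones that touch edges.
EDGE_STEPS = {"out", "in", "both", "outE", "inE", "bothE"}


def infer_edge_labels(steps: Sequence[Step]) -> Optional[FrozenSet[str]]:
    """Walk the steps and collect the edge labels the traversal needs.

    Returns None when every edge is needed (an edge step with no labels),
    an empty set when no edges are needed at all, and otherwise the set of
    labels that must be kept.
    """
    needed = set()
    for step in steps:
        if step.name in EDGE_STEPS:
            if not step.labels:
                return None  # unlabeled edge step: cannot filter by label
            needed.update(step.labels)
    return frozenset(needed)


# Roughly g.V().count(): no edge steps, so no edges are needed.
count_only = infer_edge_labels([Step("V"), Step("count")])

# Roughly g.V().out("knows").count(): only "knows" edges are needed.
knows_only = infer_edge_labels([Step("V"), Step("out", ("knows",)), Step("count")])

# Roughly g.V().both().count(): every edge is needed.
everything = infer_edge_labels([Step("V"), Step("both"), Step("count")])
```

An empty result corresponds to the hand-written bothE().limit(0) filter from 
the first example: the traversal needs no edges, so none have to be loaded.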

For providers that have low-level support for GraphFilter, this can be huge: 
only the slices of the graph that a traversal will actually touch are selected 
from the source graph. That reduces the amount of data transferred around the 
cluster, the amount of data held in memory, the amount of data the garbage 
collector has to inspect, etc.
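
As a rough illustration of that payoff, the toy sketch below applies an edge 
label filter while reading edges, so unwanted edges are never materialized. 
It is plain Python, not a provider implementation: read_edges and the tuple 
layout are hypothetical, and the data is the edge list of TinkerPop's "modern" 
toy graph.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (source vertex, edge label, target vertex)

# The six edges of TinkerPop's "modern" toy graph.
MODERN_EDGES = [
    ("marko", "knows", "vadas"),
    ("marko", "knows", "josh"),
    ("marko", "created", "lop"),
    ("josh", "created", "ripple"),
    ("josh", "created", "lop"),
    ("peter", "created", "lop"),
]


def read_edges(source: Iterable[Edge],
               needed_labels: Optional[Set[str]]) -> Iterator[Edge]:
    """Yield only the edges that pass the label filter.

    needed_labels=None means "no filter"; an empty set means "no edges".
    Filtering happens while reading, so rejected edges are never kept.
    """
    for edge in source:
        if needed_labels is None or edge[1] in needed_labels:
            yield edge


unfiltered = list(read_edges(MODERN_EDGES, None))
knows_only = list(read_edges(MODERN_EDGES, {"knows"}))
no_edges = list(read_edges(MODERN_EDGES, set()))

print(len(unfiltered), len(knows_only), len(no_edges))  # prints: 6 2 0
```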

Enjoy,
Marko.

http://markorodriguez.com