Repository: tinkerpop Updated Branches: refs/heads/master f88ace142 -> 0c2fc21fb
TINKERPOP-1967 Added connectedComponent() step Deprecated the recipe for "Connected Components" but left the old content present as I felt it had educational value. Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/0acceb90 Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/0acceb90 Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/0acceb90 Branch: refs/heads/master Commit: 0acceb90efc6516df20586d756765f94efc6cbf2 Parents: f88ace1 Author: Stephen Mallette <[email protected]> Authored: Thu May 17 14:44:01 2018 -0400 Committer: Stephen Mallette <[email protected]> Committed: Thu Aug 9 10:54:40 2018 -0400 ---------------------------------------------------------------------- CHANGELOG.asciidoc | 1 + docs/src/recipes/connected-components.asciidoc | 8 +- docs/src/reference/the-graphcomputer.asciidoc | 6 + docs/src/reference/the-traversal.asciidoc | 35 +++ docs/src/upgrade/release-3.4.x.asciidoc | 32 +++ .../tinkerpop/gremlin/jsr223/CoreImports.java | 4 + .../ConnectedComponentVertexProgram.java | 234 +++++++++++++++++++ .../traversal/step/map/ConnectedComponent.java | 38 +++ .../ConnectedComponentVertexProgramStep.java | 98 ++++++++ .../gremlin/process/remote/RemoteGraph.java | 4 + .../traversal/dsl/graph/GraphTraversal.java | 15 ++ .../traversal/dsl/graph/GraphTraversalTest.java | 2 +- .../Process/Traversal/GraphTraversal.cs | 9 + .../lib/process/graph-traversal.js | 10 + .../gremlin_python/process/graph_traversal.py | 4 + .../gremlin/process/ProcessComputerSuite.java | 2 + .../step/map/ConnectedComponentTest.java | 117 ++++++++++ .../computer/SparkHadoopGraphProvider.java | 2 + 18 files changed, 619 insertions(+), 2 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/CHANGELOG.asciidoc ---------------------------------------------------------------------- diff --git a/CHANGELOG.asciidoc b/CHANGELOG.asciidoc index 303d963..481a7d3 100644 --- a/CHANGELOG.asciidoc +++ b/CHANGELOG.asciidoc @@ -35,6 +35,7 @@ This release also includes changes from <<release-3-3-3, 3.3.3>>. * Replaced `Parameterizing.addPropertyMutations()` with `Configuring.configure()`. * Changed interface hierarchy for `Parameterizing` and `Mutating` interfaces as they are tightly related. * Introduced the `with()` step modulator which can supply configuration options to `Configuring` steps. +* Added `connectedComponent()` step and related `VertexProgram`. * Added `supportsUpsert()` option to `VertexFeatures` and `EdgeFeatures`. * `min()` and `max()` now support all types implementing `Comparable`. * Change the `toString()` of `Path` to be standardized as other graph elements are. http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/docs/src/recipes/connected-components.asciidoc ---------------------------------------------------------------------- diff --git a/docs/src/recipes/connected-components.asciidoc b/docs/src/recipes/connected-components.asciidoc index 46d61eb..70abdbd 100644 --- a/docs/src/recipes/connected-components.asciidoc +++ b/docs/src/recipes/connected-components.asciidoc @@ -18,7 +18,13 @@ limitations under the License. == Connected Components Gremlin can be used to find link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[connected components] -in a graph. Consider the following graph which has three connected components: +in a graph. As of TinkerPop 3.4.0, the process has been simplified to the `connectedComponent()`-step which is +described in more detail in the link: +link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation]. + +The `connectedComponent()`-step replaces the original recipe described below from earlier versions of TinkerPop, +however the algorithm from that old recipe remains interesting for educational purposes and has thus not been removed. +The original recipe considers the following graph which has three connected components: image:connected-components.png[width=600] http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/docs/src/reference/the-graphcomputer.asciidoc ---------------------------------------------------------------------- diff --git a/docs/src/reference/the-graphcomputer.asciidoc b/docs/src/reference/the-graphcomputer.asciidoc index 1d7586c..9cd1c76 100644 --- a/docs/src/reference/the-graphcomputer.asciidoc +++ b/docs/src/reference/the-graphcomputer.asciidoc @@ -403,6 +403,12 @@ g.V().peerPressure().by('cluster').valueMap() g.V().peerPressure().by(outE('knows')).by('cluster').valueMap() ---- +[[connectedcomponentvertexprogram]] +=== ConnectedComponentVertexProgram + +The `ConnectedComponentVertexProgram` identifies link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[Connected Component] +instances in a graph. See <<connectedcomponent-step,`connectedComponent()`>>-step for more information. + [[bulkdumpervertexprogram]] [[clonevertexprogram]] === CloneVertexProgram http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/docs/src/reference/the-traversal.asciidoc ---------------------------------------------------------------------- diff --git a/docs/src/reference/the-traversal.asciidoc b/docs/src/reference/the-traversal.asciidoc index 7d385c8..4d5b48a 100644 --- a/docs/src/reference/the-traversal.asciidoc +++ b/docs/src/reference/the-traversal.asciidoc @@ -625,6 +625,41 @@ g.V().coin(1.0) link:++http://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#coin-double-++[`coin(double)`] +[[connectedcomponent-step]] +=== ConnectedComponent Step + +The `connectedComponent()` step performs a computation to identify link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[Connected Component] +instances in a graph. When this step completes, the vertices will be labelled with a component identifier to denote +the component to which they are associated. + +IMPORTANT: The `connectedComponent()`-step is a `VertexComputing`-step and as such, can only be used against a graph +that supports `GraphComputer` (OLAP). + +[gremlin-groovy,modern] +---- +g = graph.traversal().withComputer() +g.V(). + connectedComponent(). + with(ConnectedComponent.propertyName, 'component'). + project('name','component'). + by('name'). + by('component') +g.V().hasLabel('person'). + connectedComponent(). + with(ConnectedComponent.propertyName, 'component'). + with(ConnectedComponent.edges, outE('knows')). + project('name','component'). + by('name'). + by('component') +---- + +Note the use of the `with()` modulating step which provides configuration options to the algorithm. It takes +configuration keys from the `ConnectedComponent` class and is automatically imported to the Gremlin Console. + +*Additional References* + +link:++http://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#connectedComponent--++[`connectedComponent()`] + [[constant-step]] === Constant Step http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/docs/src/upgrade/release-3.4.x.asciidoc ---------------------------------------------------------------------- diff --git a/docs/src/upgrade/release-3.4.x.asciidoc b/docs/src/upgrade/release-3.4.x.asciidoc index 587e761..00f2635 100644 --- a/docs/src/upgrade/release-3.4.x.asciidoc +++ b/docs/src/upgrade/release-3.4.x.asciidoc @@ -65,6 +65,38 @@ release where breaking changes are allowed. See: link:https://issues.apache.org/jira/browse/TINKERPOP-1975[TINKERPOP-1975], link:http://tinkerpop.apache.org/docs/3.4.0/reference/#with-step[Reference Documentation] +==== connectedComponent() Step + +In prior version of TinkerPop, it was recommended that the identification of +link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[Connected Component] instances in a graph be +computed by way of a reasonably complex bit of Gremlin that looked something like this: + +[source,groovy] +---- +g.V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()). + path().aggregate("p"). + unfold().dedup(). + map(__.as("v").select("p").unfold(). + filter(unfold().where(eq("v"))). + unfold().dedup().order().by(id).fold()). + dedup() +---- + +The above approach had a number of drawbacks that included a large execution cost as well as incompatibilities in OLAP. +To simplify usage of this commonly use graph algorithm, TinkerPop 3.4.0 introduces the `connectedComponent()` step +which reduces the above operation to: + +[source,groovy] +---- +g.withComputer().V().connectedComponent() +---- + +It is important to note that this step does require the use of a `GraphComputer` to work, as it utilizes a +`VertexProgram` behind the scenes. + +See: link:https://issues.apache.org/jira/browse/TINKERPOP-1967[TINKERPOP-1967], +link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation] + ==== io() Step There have been some important changes to IO operations for reading and writing graph data. The use of `Graph.io()` http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java ---------------------------------------------------------------------- diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java index 72ad47a..eb3b831 100644 --- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java +++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java @@ -42,6 +42,7 @@ import org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVerte import org.apache.tinkerpop.gremlin.process.computer.bulkloading.IncrementalBulkLoader; import org.apache.tinkerpop.gremlin.process.computer.bulkloading.OneTimeBulkLoader; import org.apache.tinkerpop.gremlin.process.computer.clone.CloneVertexProgram; +import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram; import org.apache.tinkerpop.gremlin.process.computer.clustering.peerpressure.ClusterCountMapReduce; import org.apache.tinkerpop.gremlin.process.computer.clustering.peerpressure.ClusterPopulationMapReduce; import org.apache.tinkerpop.gremlin.process.computer.clustering.peerpressure.PeerPressureVertexProgram; @@ -49,6 +50,7 @@ import org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankMa import org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankVertexProgram; import org.apache.tinkerpop.gremlin.process.computer.traversal.MemoryTraversalSideEffects; import org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram; +import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ConnectedComponent; import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.PageRank; import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.PeerPressure; import org.apache.tinkerpop.gremlin.process.computer.traversal.strategy.decoration.VertexProgramStrategy; @@ -251,6 +253,8 @@ public final class CoreImports { // graph computer CLASS_IMPORTS.add(Computer.class); CLASS_IMPORTS.add(ComputerResult.class); + CLASS_IMPORTS.add(ConnectedComponent.class); + CLASS_IMPORTS.add(ConnectedComponentVertexProgram.class); CLASS_IMPORTS.add(GraphComputer.class); CLASS_IMPORTS.add(Memory.class); CLASS_IMPORTS.add(VertexProgram.class); http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java ---------------------------------------------------------------------- diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java new file mode 100644 index 0000000..de718f1 --- /dev/null +++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.tinkerpop.gremlin.process.computer.clustering.connected; + +import org.apache.commons.configuration.BaseConfiguration; +import org.apache.commons.configuration.Configuration; +import org.apache.commons.configuration.ConfigurationUtils; +import org.apache.tinkerpop.gremlin.process.computer.GraphComputer; +import org.apache.tinkerpop.gremlin.process.computer.Memory; +import org.apache.tinkerpop.gremlin.process.computer.MemoryComputeKey; +import org.apache.tinkerpop.gremlin.process.computer.MessageScope; +import org.apache.tinkerpop.gremlin.process.computer.Messenger; +import org.apache.tinkerpop.gremlin.process.computer.VertexComputeKey; +import org.apache.tinkerpop.gremlin.process.computer.VertexProgram; +import org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram; +import org.apache.tinkerpop.gremlin.process.computer.util.AbstractVertexProgramBuilder; +import org.apache.tinkerpop.gremlin.process.traversal.Operator; +import org.apache.tinkerpop.gremlin.process.traversal.Traversal; +import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__; +import org.apache.tinkerpop.gremlin.process.traversal.util.PureTraversal; +import org.apache.tinkerpop.gremlin.structure.Direction; +import org.apache.tinkerpop.gremlin.structure.Edge; +import org.apache.tinkerpop.gremlin.structure.Graph; +import org.apache.tinkerpop.gremlin.structure.Vertex; +import org.apache.tinkerpop.gremlin.structure.VertexProperty; + +import java.util.Collections; +import java.util.HashSet; +import java.util.Iterator; +import java.util.Set; + +/** + * Identifies "Connected Component" instances in a graph by assigning a component identifier (the lexicographically + * least string value of the vertex in the component) to each vertex. + * + * @author Stephen Mallette (http://stephen.genoprime.com) + * @author Daniel Kuppitz (http://gremlin.guru) + */ +public class ConnectedComponentVertexProgram implements VertexProgram<String> { + + public static final String COMPONENT = "gremlin.connectedComponentVertexProgram.component"; + private static final String PROPERTY = "gremlin.connectedComponentVertexProgram.property"; + private static final String EDGE_TRAVERSAL = "gremlin.pageRankVertexProgram.edgeTraversal"; + private static final String HAS_HALTED = "gremlin.connectedComponentVertexProgram.hasHalted"; + private static final String VOTE_TO_HALT = "gremlin.connectedComponentVertexProgram.voteToHalt"; + + private static final Set<MemoryComputeKey> MEMORY_COMPUTE_KEYS = Collections.singleton(MemoryComputeKey.of(VOTE_TO_HALT, Operator.and, false, true)); + private MessageScope.Local<?> scope = MessageScope.Local.of(__::bothE); + private Set<MessageScope> scopes; + private String property = COMPONENT; + private boolean hasHalted = false; + private PureTraversal<Vertex, Edge> edgeTraversal = null; + private Configuration configuration; + + private ConnectedComponentVertexProgram() {} + + @Override + public void loadState(final Graph graph, final Configuration config) { + configuration = new BaseConfiguration(); + if (config != null) { + ConfigurationUtils.copy(config, configuration); + } + + if (configuration.containsKey(EDGE_TRAVERSAL)) { + this.edgeTraversal = PureTraversal.loadState(configuration, EDGE_TRAVERSAL, graph); + this.scope = MessageScope.Local.of(() -> this.edgeTraversal.get().clone()); + } + + scopes = new HashSet<>(Collections.singletonList(scope)); + + this.property = configuration.getString(PROPERTY, COMPONENT); + this.hasHalted = configuration.getBoolean(HAS_HALTED, false); + } + + @Override + public void storeState(final Configuration config) { + VertexProgram.super.storeState(config); + if (configuration != null) { + ConfigurationUtils.copy(configuration, config); + } + } + + @Override + public void setup(final Memory memory) { + memory.set(VOTE_TO_HALT, true); + } + + @Override + public void execute(final Vertex vertex, final Messenger<String> messenger, final Memory memory) { + if (memory.isInitialIteration()) { + // on the first pass, just initialize the component to its own id then pass it to all adjacent vertices + // for evaluation + vertex.property(VertexProperty.Cardinality.single, property, vertex.id().toString()); + + // no need to send messages from traversers that were filtered - happens for stuff like + // g.V().hasLabel('software').connectedComponent() (i.e. when there is a TraversalVertexProgram in the mix + // halting traversers. only want to send messages from traversers that are still hanging about after + // the filter. the unfiltered vertices can only react to messages sent to them. of course, this can + // lead to weirdness in results. + if (vertex.edges(Direction.BOTH).hasNext() && !(hasHalted && !vertex.property(TraversalVertexProgram.HALTED_TRAVERSERS).isPresent())) { + // since there was message passing we don't want to halt on the first round. this should only trigger + // a single pass finish if the graph is completely disconnected (technically, it won't even really + // work in cases where halted traversers come into play + messenger.sendMessage(scope, vertex.id().toString()); + memory.add(VOTE_TO_HALT, false); + } + } else { + // by the second iteration all vertices that matter should have a component assigned + String currentComponent = vertex.value(property); + boolean different = false; + + // iterate through messages received and determine if there is a component that has a lesser value than + // the currently assigned one + final Iterator<String> componentIterator = messenger.receiveMessages(); + while(componentIterator.hasNext()) { + final String candidateComponent = componentIterator.next(); + if (candidateComponent.compareTo(currentComponent) < 0) { + currentComponent = candidateComponent; + different = true; + } + } + + // if there is a better component then assign it and notify adjacent vertices. triggering the message + // passing should not halt future executions + if (different) { + vertex.property(VertexProperty.Cardinality.single, property, currentComponent); + messenger.sendMessage(scope, currentComponent); + memory.add(VOTE_TO_HALT, false); + } + } + } + + @Override + public Set<VertexComputeKey> getVertexComputeKeys() { + return new HashSet<>(Collections.singletonList(VertexComputeKey.of(property, false))); + } + + @Override + public Set<MemoryComputeKey> getMemoryComputeKeys() { + return MEMORY_COMPUTE_KEYS; + } + + @Override + public boolean terminate(final Memory memory) { + final boolean voteToHalt = memory.<Boolean>get(VOTE_TO_HALT); + if (voteToHalt) { + return true; + } else { + // it is basically always assumed that the program will want to halt, but if message passing occurs, the + // program will want to continue, thus reset false values to true for future iterations + memory.set(VOTE_TO_HALT, true); + return false; + } + } + + @Override + public Set<MessageScope> getMessageScopes(final Memory memory) { + return scopes; + } + + @Override + public GraphComputer.ResultGraph getPreferredResultGraph() { + return GraphComputer.ResultGraph.NEW; + } + + @Override + public GraphComputer.Persist getPreferredPersist() { + return GraphComputer.Persist.VERTEX_PROPERTIES; + } + + + @Override + @SuppressWarnings("CloneDoesntCallSuperClone,CloneDoesntDeclareCloneNotSupportedException") + public ConnectedComponentVertexProgram clone() { + return this; + } + + @Override + public Features getFeatures() { + return new Features() { + @Override + public boolean requiresLocalMessageScopes() { + return true; + } + + @Override + public boolean requiresVertexPropertyAddition() { + return true; + } + }; + } + + public static ConnectedComponentVertexProgram.Builder build() { + return new ConnectedComponentVertexProgram.Builder(); + } + + public static final class Builder extends AbstractVertexProgramBuilder<ConnectedComponentVertexProgram.Builder> { + + private Builder() { + super(ConnectedComponentVertexProgram.class); + } + + public ConnectedComponentVertexProgram.Builder hasHalted(final boolean hasHalted) { + this.configuration.setProperty(HAS_HALTED, hasHalted); + return this; + } + + public ConnectedComponentVertexProgram.Builder edges(final Traversal.Admin<Vertex, Edge> edgeTraversal) { + PureTraversal.storeState(this.configuration, EDGE_TRAVERSAL, edgeTraversal); + return this; + } + + public ConnectedComponentVertexProgram.Builder property(final String key) { + this.configuration.setProperty(PROPERTY, key); + return this; + } + } +} http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java ---------------------------------------------------------------------- diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java new file mode 100644 index 0000000..85558bc --- /dev/null +++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.tinkerpop.gremlin.process.computer.traversal.step.map; + +import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal; +import org.apache.tinkerpop.gremlin.structure.Graph; + +/** + * Configuration options to be passed to the {@link GraphTraversal#with(String, Object)} step on + * {@link GraphTraversal#connectedComponent()} ()}. + */ +public class ConnectedComponent { + /** + * Configures the edge to traverse when calculating the pagerank. + */ + public static final String edges = Graph.Hidden.hide("tinkerpop.connectedComponent.edges"); + + /** + * Configures the name of the property within which to store the pagerank value. + */ + public static final String propertyName = Graph.Hidden.hide("tinkerpop.connectedComponent.propertyName"); +} http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java ---------------------------------------------------------------------- diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java new file mode 100644 index 0000000..edeb497 --- /dev/null +++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.tinkerpop.gremlin.process.computer.traversal.step.map; + +import org.apache.tinkerpop.gremlin.process.computer.GraphFilter; +import org.apache.tinkerpop.gremlin.process.computer.Memory; +import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram; +import org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram; +import org.apache.tinkerpop.gremlin.process.traversal.Traversal; +import org.apache.tinkerpop.gremlin.process.traversal.TraversalStrategies; +import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__; +import org.apache.tinkerpop.gremlin.process.traversal.step.Configuring; +import org.apache.tinkerpop.gremlin.process.traversal.step.TraversalParent; +import org.apache.tinkerpop.gremlin.process.traversal.step.util.Parameters; +import org.apache.tinkerpop.gremlin.process.traversal.util.PureTraversal; +import org.apache.tinkerpop.gremlin.structure.Edge; +import org.apache.tinkerpop.gremlin.structure.Graph; +import org.apache.tinkerpop.gremlin.structure.Vertex; +import org.apache.tinkerpop.gremlin.structure.util.StringFactory; + +/** + * @author Stephen Mallette (http://stephen.genoprime.com) + */ +public final class ConnectedComponentVertexProgramStep extends VertexProgramStep implements TraversalParent, Configuring { + + private Parameters parameters = new Parameters(); + private PureTraversal<Vertex, Edge> edgeTraversal; + private String clusterProperty = ConnectedComponentVertexProgram.COMPONENT; + + public ConnectedComponentVertexProgramStep(final Traversal.Admin traversal) { + super(traversal); + this.configure(ConnectedComponent.edges, __.<Vertex>bothE()); + } + + @Override + public void configure(final Object... keyValues) { + if (keyValues[0].equals(ConnectedComponent.edges)) { + if (!(keyValues[1] instanceof Traversal)) + throw new IllegalArgumentException("ConnectedComponent.edges requires a Traversal as its argument"); + this.edgeTraversal = new PureTraversal<>(((Traversal<Vertex,Edge>) keyValues[1]).asAdmin()); + this.integrateChild(this.edgeTraversal.get()); + } else if (keyValues[0].equals(ConnectedComponent.propertyName)) { + if (!(keyValues[1] instanceof String)) + throw new IllegalArgumentException("ConnectedComponent.propertyName requires a String as its argument"); + this.clusterProperty = (String) keyValues[1]; + } else { + this.parameters.set(this, keyValues); + } + } + + @Override + public Parameters getParameters() { + return parameters; + } + + @Override + public int hashCode() { + return super.hashCode() ^ this.clusterProperty.hashCode(); + } + + @Override + public String toString() { + return StringFactory.stepString(this, this.clusterProperty, new GraphFilter(this.computer)); + } + + @Override + public ConnectedComponentVertexProgram generateProgram(final Graph graph, final Memory memory) { + final Traversal.Admin<Vertex, Edge> detachedTraversal = this.edgeTraversal.getPure(); + detachedTraversal.setStrategies(TraversalStrategies.GlobalCache.getStrategies(graph.getClass())); + return ConnectedComponentVertexProgram.build(). + hasHalted(memory.exists(TraversalVertexProgram.HALTED_TRAVERSERS)). + edges(detachedTraversal). + property(this.clusterProperty).create(graph); + } + + @Override + public ConnectedComponentVertexProgramStep clone() { + return (ConnectedComponentVertexProgramStep) super.clone(); + } + +} http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java ---------------------------------------------------------------------- diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java index f19d5fa..9bc1771 100644 --- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java +++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java @@ -48,6 +48,10 @@ import java.util.Iterator; @Graph.OptIn(Graph.OptIn.SUITE_PROCESS_STANDARD) @Graph.OptIn(Graph.OptIn.SUITE_PROCESS_COMPUTER) @Graph.OptOut( + test = "org.apache.tinkerpop.gremlin.process.traversal.step.map.ConnectedComponentTest", + method = "*", + reason = "hmmmm") [email protected]( test = "org.apache.tinkerpop.gremlin.process.traversal.CoreTraversalTest", method = "*", reason = "The test suite does not support profiling or lambdas and for groovy tests: 'Could not locate method: GraphTraversalSource.withStrategies([{traversalCategory=interface org.apache.tinkerpop.gremlin.process.traversal.TraversalStrategy$DecorationStrategy}])'") http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java ---------------------------------------------------------------------- diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java index a675ad1..4fd247f 100644 --- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java +++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java @@ -19,6 +19,8 @@ package org.apache.tinkerpop.gremlin.process.traversal.dsl.graph; import org.apache.tinkerpop.gremlin.process.computer.VertexProgram; +import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram; +import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ConnectedComponentVertexProgramStep; import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.PageRankVertexProgramStep; import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.PeerPressureVertexProgramStep; import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ProgramVertexProgramStep; @@ -2452,6 +2454,18 @@ public interface GraphTraversal<S, E> extends Traversal<S, E> { } /** + * Executes a Connected Component algorithm over the graph. + * + * @return the traversal with the appended {@link ConnectedComponentVertexProgram} + * @see <a href="http://tinkerpop.apache.org/docs/${project.version}/reference/#connectedcomponent-step" target="_blank">Reference Documentation - ConnectedComponent Step</a> + * @since 3.4.0 + */ + public default GraphTraversal<S, E> connectedComponent() { + this.asAdmin().getBytecode().addStep(Symbols.connectedComponent); + return this.asAdmin().addStep((Step<E, E>) new ConnectedComponentVertexProgramStep(this.asAdmin())); + } + + /** * Executes an arbitrary {@link VertexProgram} over the graph. * * @return the traversal with the appended {@link ProgramVertexProgramStep} @@ -2882,6 +2896,7 @@ public interface GraphTraversal<S, E> extends Traversal<S, E> { public static final String pageRank = "pageRank"; public static final String peerPressure = "peerPressure"; + public static final String connectedComponent = "connectedComponent"; public static final String program = "program"; public static final String by = "by"; http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java ---------------------------------------------------------------------- diff --git a/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java b/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java index 3d9a549..40ca312 100644 --- a/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java +++ b/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java @@ -43,7 +43,7 @@ import static org.junit.Assert.assertEquals; public class GraphTraversalTest { private static final Logger logger = LoggerFactory.getLogger(GraphTraversalTest.class); - private static Set<String> NO_GRAPH = new HashSet<>(Arrays.asList("asAdmin", "by", "read", "write", "with", "option", "iterate", "to", "from", "profile", "pageRank", "peerPressure", "program", "none")); + private static Set<String> NO_GRAPH = new HashSet<>(Arrays.asList("asAdmin", "by", "read", "write", "with", "option", "iterate", "to", "from", "profile", "pageRank", "connectedComponent", "peerPressure", "program", "none")); private static Set<String> NO_ANONYMOUS = new HashSet<>(Arrays.asList("start", "__")); private static Set<String> IGNORES_BYTECODE = new HashSet<>(Arrays.asList("asAdmin", "read", "write", "iterate", "mapValues", "mapKeys")); http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs ---------------------------------------------------------------------- diff --git a/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs b/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs index 361b246..e2fa3e1 100644 --- a/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs +++ b/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs @@ -401,6 +401,15 @@ namespace Gremlin.Net.Process.Traversal } /// <summary> + /// Adds the connectedComponent step to this <see cref="GraphTraversal{SType, EType}" />. + /// </summary> + public GraphTraversal<S, E> ConnectedComponent () + { + Bytecode.AddStep("connectedComponent"); + return Wrap<S, E>(this); + } + + /// <summary> /// Adds the constant step to this <see cref="GraphTraversal{SType, EType}" />. /// </summary> public GraphTraversal<S, E2> Constant<E2> (E2 e) http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js ---------------------------------------------------------------------- diff --git a/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js b/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js index 4f39fa5..1c8d639 100644 --- a/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js +++ b/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js @@ -353,6 +353,16 @@ class GraphTraversal extends Traversal { } /** + * Graph traversal connectedComponent method. + * @param {...Object} args + * @returns {GraphTraversal} + */ + connectedComponent(...args) { + this.bytecode.addStep('connectedComponent', args); + return this; + } + + /** * Graph traversal constant method. * @param {...Object} args * @returns {GraphTraversal} http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py ---------------------------------------------------------------------- diff --git a/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py b/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py index f8b8285..b073aa8 100644 --- a/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py +++ b/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py @@ -194,6 +194,10 @@ class GraphTraversal(Traversal): self.bytecode.add_step("coin", *args) return self + def connectedComponent(self, *args): + self.bytecode.add_step("connectedComponent", *args) + return self + def constant(self, *args): self.bytecode.add_step("constant", *args) return self http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java ---------------------------------------------------------------------- diff --git a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java index eab562d..6466ae8 100644 --- a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java +++ b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java @@ -50,6 +50,7 @@ import org.apache.tinkerpop.gremlin.process.traversal.step.filter.TailTest; import org.apache.tinkerpop.gremlin.process.traversal.step.filter.WhereTest; import org.apache.tinkerpop.gremlin.process.traversal.step.map.AddEdgeTest; import org.apache.tinkerpop.gremlin.process.traversal.step.map.CoalesceTest; +import org.apache.tinkerpop.gremlin.process.traversal.step.map.ConnectedComponentTest; import org.apache.tinkerpop.gremlin.process.traversal.step.map.ConstantTest; import org.apache.tinkerpop.gremlin.process.traversal.step.map.CountTest; import org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapTest; @@ -143,6 +144,7 @@ public class ProcessComputerSuite extends AbstractGremlinSuite { // map CoalesceTest.Traversals.class, + ConnectedComponentTest.Traversals.class, ConstantTest.Traversals.class, CountTest.Traversals.class, FlatMapTest.Traversals.class, http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java ---------------------------------------------------------------------- diff --git a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java new file mode 100644 index 0000000..25e618a --- /dev/null +++ b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.tinkerpop.gremlin.process.traversal.step.map; + +import org.apache.tinkerpop.gremlin.LoadGraphWith; +import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram; +import org.apache.tinkerpop.gremlin.process.AbstractGremlinProcessTest; +import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ConnectedComponent; +import org.apache.tinkerpop.gremlin.process.traversal.Traversal; +import org.apache.tinkerpop.gremlin.structure.Vertex; +import org.junit.Test; + +import static org.apache.tinkerpop.gremlin.LoadGraphWith.GraphData.MODERN; +import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.bothE; +import static org.junit.Assert.assertEquals; + +/** + * @author Stephen Mallette (http://stephen.genoprime.com) + */ +public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest { + + public abstract Traversal<Vertex, Vertex> get_g_V_connectedComponent(); + + public abstract Traversal<Vertex, Vertex> get_g_V_hasLabelXsoftwareX_connectedComponent(); + + public abstract Traversal<Vertex, Vertex> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX(); + + @Test + @LoadGraphWith(MODERN) + public void g_V_connectedComponent() { + final Traversal<Vertex, Vertex> traversal = get_g_V_connectedComponent(); + printTraversalForm(traversal); + int counter = 0; + while (traversal.hasNext()) { + final Vertex vertex = traversal.next(); + counter++; + assertEquals("1", vertex.value(ConnectedComponentVertexProgram.COMPONENT)); + } + assertEquals(6, counter); + } + + @Test + @LoadGraphWith(MODERN) + public void g_V_hasLabelXsoftwareX_connectedComponent() { + final Traversal<Vertex, Vertex> traversal = get_g_V_hasLabelXsoftwareX_connectedComponent(); + printTraversalForm(traversal); + int counter = 0; + while (traversal.hasNext()) { + final Vertex vertex = traversal.next(); + final String name = vertex.value("name"); + switch (name) { + case "lop": + case "ripple": + assertEquals("3", vertex.value(ConnectedComponentVertexProgram.COMPONENT)); + break; + } + counter++; + } + assertEquals(2, counter); + } + + @Test + @LoadGraphWith(MODERN) + public void g_V_connectedComponent_withXEDGES_bothEXknowsXX_withXPROPERTY_NAME_clusterX() { + final Traversal<Vertex, Vertex> traversal = get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX(); + printTraversalForm(traversal); + int counter = 0; + while (traversal.hasNext()) { + final Vertex vertex = traversal.next(); + final String name = vertex.value("name"); + switch (name) { + case "marko": + case "vadas": + case "josh": + assertEquals("1", vertex.value("cluster")); + break; + case "peter": + assertEquals("6", vertex.value("cluster")); + break; + } + counter++; + } + assertEquals(4, counter); + } + + public static class Traversals extends ConnectedComponentTest { + @Override + public Traversal<Vertex, Vertex> get_g_V_connectedComponent() { + return g.V().connectedComponent(); + } + + @Override + public Traversal<Vertex, Vertex> get_g_V_hasLabelXsoftwareX_connectedComponent() { + return g.V().hasLabel("software").connectedComponent(); + } + @Override + public Traversal<Vertex, Vertex> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX() { + return g.V().hasLabel("person").connectedComponent().with(ConnectedComponent.edges, bothE("knows")).with(ConnectedComponent.propertyName, "cluster"); + } + } +} http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java ---------------------------------------------------------------------- diff --git a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java index c778c6d..2f727c8 100644 --- a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java +++ b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java @@ -39,6 +39,7 @@ import org.apache.tinkerpop.gremlin.process.computer.Computer; import org.apache.tinkerpop.gremlin.process.computer.GraphComputer; import org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph; import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource; +import org.apache.tinkerpop.gremlin.process.traversal.step.map.ConnectedComponentTest; import org.apache.tinkerpop.gremlin.process.traversal.step.map.PageRankTest; import org.apache.tinkerpop.gremlin.process.traversal.step.map.PeerPressureTest; import org.apache.tinkerpop.gremlin.process.traversal.step.map.ProgramTest; @@ -115,6 +116,7 @@ public class SparkHadoopGraphProvider extends AbstractFileGraphProvider { if (null != loadGraphWith && !test.equals(ProgramTest.Traversals.class) && !test.equals(PageRankTest.Traversals.class) && + !test.equals(ConnectedComponentTest.Traversals.class) && !test.equals(PeerPressureTest.Traversals.class) && !test.equals(FileSystemStorageCheck.class) && !testMethodName.equals("shouldSupportJobChaining") && // GraphComputerTest.shouldSupportJobChaining
