TINKERPOP-1996 Added docs for io()

Killed all the old IO documentation that utilized the GraphReader/Writer classes directly as well as the Graph.io() method that is now deprecated.
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/62175c22
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/62175c22
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/62175c22

Branch: refs/heads/TINKERPOP-1990
Commit: 62175c228b77bdbda96c11015f2974828df8f3aa
Parents: 576649f
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Fri Jul 13 17:31:46 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Jul 19 13:40:10 2018 -0400

----------------------------------------------------------------------
 docs/src/reference/the-graph.asciidoc     | 370 -------------------------
 docs/src/reference/the-traversal.asciidoc | 140 ++++++++++
 2 files changed, 140 insertions(+), 370 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/62175c22/docs/src/reference/the-graph.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/reference/the-graph.asciidoc b/docs/src/reference/the-graph.asciidoc
index 1bcc96f..e9305b2 100644
--- a/docs/src/reference/the-graph.asciidoc
+++ b/docs/src/reference/the-graph.asciidoc
@@ -345,376 +345,6 @@ In the above case, the call to `graph.tx().createThreadedTx()` creates a new `Gr
 `ThreadLocal` transaction, thus allowing each thread to operate on it in the same context. In this case, there would be
 three separate vertices persisted to the `Graph`.
 
-== Gremlin I/O
-
-image:gremlin-io.png[width=250,float=right] The task of getting data in and out of `Graph` instances is the job of
-the Gremlin I/O packages. Gremlin I/O provides two interfaces for reading and writing `Graph` instances: `GraphReader`
-and `GraphWriter`. These interfaces expose methods that support:
-
-* Reading and writing an entire `Graph`
-* Reading and writing a `Traversal<Vertex>` as adjacency list format
-* Reading and writing a single `Vertex` (with and without associated `Edge` objects)
-* Reading and writing a single `Edge`
-* Reading and writing a single `VertexProperty`
-* Reading and writing a single `Property`
-* Reading and writing an arbitrary `Object`
-
-In all cases, these methods operate in the currency of `InputStream` and `OutputStream` objects, allowing graphs and
-their related elements to be written to and read from files, byte arrays, etc. The `Graph` interface offers the `io`
-method, which provides access to "reader/writer builder" objects that are pre-configured with serializers provided by
-the `Graph`, as well as helper methods for the various I/O capabilities. Unless there are very advanced requirements
-for the serialization process, it is always best to utilize the methods on the `Io` interface to construct
-`GraphReader` and `GraphWriter` instances, as the implementation may provide some custom settings that would otherwise
-have to be configured manually by the user to do the serialization.
-
-It is up to the implementations of the `GraphReader` and `GraphWriter` interfaces to choose the methods they
-implement and the manner in which they work together. The only characteristic enforced and expected is that the write
-methods should produce output that is compatible with the corresponding read method. For example, the output of
-`writeVertices` should be readable as input to `readVertices` and the output of `writeProperty` should be readable as
-input to `readProperty`.
-
-NOTE: Additional documentation for TinkerPop IO formats can be found in the link:http://tinkerpop.apache.org/docs/x.y.z/dev/io/[IO Reference].
-
-=== GraphML Reader/Writer
-
-image:gremlin-graphml.png[width=350,float=left] The link:http://graphml.graphdrawing.org/[GraphML] file format is a
-common XML-based representation of a graph. It is widely supported by graph-related tools and libraries making it a
-solid interchange format for TinkerPop. In other words, if the intent is to work with graph data in conjunction with
-applications outside of TinkerPop, GraphML may be the best choice to do that. Common use cases might be:
-
-* Generate a graph using link:https://networkx.github.io/[NetworkX], export it with GraphML and import it to TinkerPop.
-* Produce a subgraph and export it to GraphML to be consumed by and visualized in link:https://gephi.org/[Gephi].
-* Migrate the data of an entire graph to a different graph database not supported by TinkerPop.
-
-As GraphML is a specification for the serialization of an entire graph and not the individual elements of a graph,
-methods that support input and output of single vertices, edges, etc. are not supported.
-
-WARNING: GraphML is a "lossy" format in that it only supports primitive values for properties and does not have
-support for `Graph` variables. It will use `toString` to serialize property values outside of those primitives.
-
-WARNING: GraphML as a specification allows for `<edge>` and `<node>` elements to appear in any order. Most software
-that writes GraphML (including as TinkerPop's `GraphMLWriter`) write `<node>` elements before `<edge>` elements. However it
-is important to note that `GraphMLReader` will read this data in order and order can matter. This is because TinkerPop
-does not allow the vertex label to be changed after the vertex has been created. Therefore, if an `<edge>` element
-comes before the `<node>`, the label on the vertex will be ignored. It is thus better to order `<node>` elements in the
-GraphML to appear before all `<edge>` elements if vertex labels are important to the graph.
-
-The following code shows how to write a `Graph` instance to file called `tinkerpop-modern.xml` and then how to read
-that file back into a different instance:
-
-[source,java]
-----
-Graph graph = TinkerFactory.createModern();
-graph.io(IoCore.graphml()).writeGraph("tinkerpop-modern.xml");
-Graph newGraph = TinkerGraph.open();
-newGraph.io(IoCore.graphml()).readGraph("tinkerpop-modern.xml");
-----
-
-If a custom configuration is required, then have the `Graph` generate a `GraphReader` or `GraphWriter` "builder" instance:
-
-[source,java]
-----
-Graph graph = TinkerFactory.createModern();
-try (OutputStream os = new FileOutputStream("tinkerpop-modern.xml")) {
-    graph.io(IoCore.graphml()).writer().normalize(true).create().writeGraph(os, graph);
-}
-
-Graph newGraph = TinkerGraph.open();
-try (InputStream stream = new FileInputStream("tinkerpop-modern.xml")) {
-    newGraph.io(IoCore.graphml()).reader().create().readGraph(stream, newGraph);
-}
-----
-
-NOTE: If using GraphML generated from TinkerPop 2.x, you can read more about its incompatibilities in the
-link:http://tinkerpop.apache.org/docs/x.y.z/upgrade/#graphml-format[Upgrade Documentation].
-
-[[graphson-reader-writer]]
-=== GraphSON Reader/Writer
-
-image:gremlin-graphson.png[width=350,float=left] GraphSON is a link:http://json.org/[JSON]-based format extended
-from earlier versions of TinkerPop. It is important to note that TinkerPop's GraphSON is not backwards compatible
-with prior TinkerPop GraphSON versions. GraphSON has some support from graph-related application outside of TinkerPop,
-but it is generally best used in two cases:
-
-* A text format of the graph or its elements is desired (e.g. debugging, usage in source control, etc.)
-* The graph or its elements need to be consumed by code that is not JVM-based (e.g. JavaScript, Python, .NET, etc.)
-
-GraphSON supports all of the `GraphReader` and `GraphWriter` interface methods and can therefore read or write an
-entire `Graph`, vertices, arbitrary objects, etc. The following code shows how to write a `Graph` instance to file
-called `tinkerpop-modern.json` and then how to read that file back into a different instance:
-
-[source,java]
-----
-Graph graph = TinkerFactory.createModern();
-graph.io(graphson()).writeGraph("tinkerpop-modern.json");
-
-Graph newGraph = TinkerGraph.open();
-newGraph.io(graphson()).readGraph("tinkerpop-modern.json");
-----
-
-NOTE: Using `graphson()`, which is a static helper method of `IoCore`, will default to the most current version of GraphSON which is 3.0.
-
-If a custom configuration is required, then have the `Graph` generate a `GraphReader` or `GraphWriter` "builder" instance:
-
-[source,java]
-----
-Graph graph = TinkerFactory.createModern();
-try (OutputStream os = new FileOutputStream("tinkerpop-modern.json")) {
-    GraphSONMapper mapper = graph.io(IoCore.graphson()).mapper().normalize(true).create()
-    graph.io(graphson()).writer().mapper(mapper).create().writeGraph(os, graph)
-}
-
-Graph newGraph = TinkerGraph.open();
-try (InputStream stream = new FileInputStream("tinkerpop-modern.json")) {
-    newGraph.io(graphson()).reader().create().readGraph(stream, newGraph);
-}
-----
-
-The following example shows how a single `Vertex` is written to GraphSON using the Gremlin Console:
-
-[gremlin-groovy]
-----
-graph = TinkerFactory.createModern()
-g = graph.traversal()
-f = new ByteArrayOutputStream()
-graph.io(graphson()).writer().create().writeVertex(f, g.V(1).next(), BOTH)
-f.close()
-----
-
-The following GraphSON example shows the output of `GraphSONWriter.writeVertex()` with associated edges:
-
-[source,json]
-----
-{
-  "id": {
-    "@type": "g:Int32",
-    "@value": 1
-  },
-  "label": "person",
-  "outE": {
-    "created": [{
-      "id": {
-        "@type": "g:Int32",
-        "@value": 9
-      },
-      "inV": {
-        "@type": "g:Int32",
-        "@value": 3
-      },
-      "properties": {
-        "weight": {
-          "@type": "g:Double",
-          "@value": 0.4
-        }
-      }
-    }],
-    "knows": [{
-      "id": {
-        "@type": "g:Int32",
-        "@value": 7
-      },
-      "inV": {
-        "@type": "g:Int32",
-        "@value": 2
-      },
-      "properties": {
-        "weight": {
-          "@type": "g:Double",
-          "@value": 0.5
-        }
-      }
-    }, {
-      "id": {
-        "@type": "g:Int32",
-        "@value": 8
-      },
-      "inV": {
-        "@type": "g:Int32",
-        "@value": 4
-      },
-      "properties": {
-        "weight": {
-          "@type": "g:Double",
-          "@value": 1.0
-        }
-      }
-    }]
-  },
-  "properties": {
-    "name": [{
-      "id": {
-        "@type": "g:Int64",
-        "@value": 0
-      },
-      "value": "marko"
-    }],
-    "age": [{
-      "id": {
-        "@type": "g:Int64",
-        "@value": 1
-      },
-      "value": {
-        "@type": "g:Int32",
-        "@value": 29
-      }
-    }]
-  }
-}
-----
-
-GraphSON has several versions and each has differences that prevent complete compatibility with one another. While the
-default version provided by `IoCore.graphson()` is recommended, it is possible to make changes to revert to an earlier
-version. The following shows an example of how to use 1.0 (with type embedding):
-
-[gremlin-groovy]
-----
-graph = TinkerFactory.createModern()
-g = graph.traversal()
-f = new ByteArrayOutputStream()
-mapper = graph.io(GraphSONIo.build(GraphSONVersion.V1_0)).mapper().typeInfo(TypeInfo.PARTIAL_TYPES).create()
-graph.io(GraphSONIo.build(GraphSONVersion.V1_0)).writer().mapper(mapper).create().writeVertex(f, g.V(1).next(), BOTH)
-f.close()
-----
-
-NOTE: Additional documentation for GraphSON can be found in the link:http://tinkerpop.apache.org/docs/x.y.z/dev/io/#graphson[IO Reference].
-
-IMPORTANT: When using the extended type system in Gremlin Server, support for these types when used in the context of
-Gremlin Language Variants is dependent on the programming language, the driver and its serializers. These
-implementations are only required to support the core types and not the extended ones.
-
-Here's the same previous example of GraphSON 1.0, but with GraphSON 2.0:
-
-[gremlin-groovy]
-----
-graph = TinkerFactory.createModern()
-g = graph.traversal()
-f = new ByteArrayOutputStream()
-mapper = graph.io(graphson()).mapper().version(GraphSONVersion.V2_0).create()
-graph.io(graphson()).writer().mapper(mapper).create().writeVertex(f, g.V(1).next(), BOTH)
-f.close()
-----
-
-Creating a GraphSON 2.0 mapper is done by calling `.version(GraphSONVersion.V2_0)` on the mapper builder. Here's is the
-example output from the code above:
-
-[source,json]
-----
-{
-  "@type": "g:Vertex",
-  "@value": {
-    "id": {
-      "@type": "g:Int32",
-      "@value": 1
-    },
-    "label": "person",
-    "properties": {
-      "name": [{
-        "@type": "g:VertexProperty",
-        "@value": {
-          "id": {
-            "@type": "g:Int64",
-            "@value": 0
-          },
-          "value": "marko",
-          "label": "name"
-        }
-      }],
-      "uuid": [{
-        "@type": "g:VertexProperty",
-        "@value": {
-          "id": {
-            "@type": "g:Int64",
-            "@value": 12
-          },
-          "value": {
-            "@type": "g:UUID",
-            "@value": "829c7ddb-3831-4687-a872-e25201230cd3"
-          },
-          "label": "uuid"
-        }
-      }],
-      "age": [{
-        "@type": "g:VertexProperty",
-        "@value": {
-          "id": {
-            "@type": "g:Int64",
-            "@value": 1
-          },
-          "value": {
-            "@type": "g:Int32",
-            "@value": 29
-          },
-          "label": "age"
-        }
-      }]
-    }
-  }
-}
-----
-
-Types can be disabled when creating a GraphSON 2.0 `Mapper` with:
-
-[source,groovy]
-----
-graph.io(graphson()).mapper().
-      version(GraphSONVersion.V2_0).
-      typeInfo(GraphSONMapper.TypeInfo.NO_TYPES).create()
-----
-
-By disabling types, the JSON payload produced will lack the extra information that is written for types. Please note,
-disabling types can be unsafe with regards to the written data in that types can be lost.
-
-[[gryo-reader-writer]]
-=== Gryo Reader/Writer
-
-image:gremlin-kryo.png[width=400,float=left] link:https://github.com/EsotericSoftware/kryo[Kryo] is a popular
-serialization package for the JVM. Gremlin-Kryo is a binary `Graph` serialization format for use on the JVM by JVM
-languages. It is designed to be space efficient, non-lossy and is promoted as the standard format to use when working
-with graph data inside of the TinkerPop stack. A list of common use cases is presented below:
-
-* Migration from one Gremlin Structure implementation to another (e.g. `TinkerGraph` to `Neo4jGraph`)
-* Serialization of individual graph elements to be sent over the network to another JVM.
-* Backups of in-memory graphs or subgraphs.
-
-WARNING: When migrating between Gremlin Structure implementations, Kryo may not lose data, but it is important to
-consider the features of each `Graph` and whether or not the data types supported in one will be supported in the
-other. Failure to do so, may result in errors.
-
-Kryo supports all of the `GraphReader` and `GraphWriter` interface methods and can therefore read or write an entire
-`Graph`, vertices, edges, etc. The following code shows how to write a `Graph` instance to file called
-`tinkerpop-modern.kryo` and then how to read that file back into a different instance:
-
-[source,java]
-----
-Graph graph = TinkerFactory.createModern();
-graph.io(gryo()).writeGraph("tinkerpop-modern.kryo");
-
-Graph newGraph = TinkerGraph.open();
-newGraph.io(gryo()).readGraph("tinkerpop-modern.kryo");
-----
-
-NOTE: Using `gryo()`, which is a static helper method of `IoCore`, will default to the most current version of Gryo which is 3.0.
-
-If a custom configuration is required, then have the `Graph` generate a `GraphReader` or `GraphWriter` "builder" instance:
-
-[source,java]
-----
-Graph graph = TinkerFactory.createModern();
-try (OutputStream os = new FileOutputStream("tinkerpop-modern.kryo")) {
-    graph.io(GryoIo.build(GryoVersion.V1_0)).writer().create().writeGraph(os, graph);
-}
-
-Graph newGraph = TinkerGraph.open();
-try (InputStream stream = new FileInputStream("tinkerpop-modern.kryo")) {
-    newGraph.io(GryoIo.build(GryoVersion.V1_0)).reader().create().readGraph(stream, newGraph);
-}
-----
-
-NOTE: The preferred extension for files names produced by Gryo is `.kryo`.
-
-NOTE: Data migrations from TinkerPop 2.x are discussed in the Appendix of the
-link:http://tinkerpop.apache.org/docs/x.y.z/upgrade/#appendix[Upgrade Documentation].
-
 == Namespace Conventions
 
 End users, <<implementations,graph system providers>>, <<graphcomputer,`GraphComputer`>> algorithm designers,

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/62175c22/docs/src/reference/the-traversal.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/reference/the-traversal.asciidoc b/docs/src/reference/the-traversal.asciidoc
index 1fb8abd..34f6b27 100644
--- a/docs/src/reference/the-traversal.asciidoc
+++ b/docs/src/reference/the-traversal.asciidoc
@@ -1049,6 +1049,146 @@ inject(1,2).map {it.get() + 1}.map {g.V(it.get()).next()}.values('name')
 
 link:++http://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#inject-E...-++[`inject(Object)`]
 
+[[_gremlin_i_o]]
+[[io-step]]
+=== IO Step
+
+image:gremlin-io.png[width=250,float=left] The task of importing and exporting the data of `Graph` instances is the
+job of the `io()`-step. By default, TinkerPop supports three formats for importing and exporting graph data in
+<<graphml,GraphML>>, <<graphson,GraphSON>>, and <<gryo,Gryo>>.
+
+NOTE: Additional documentation for TinkerPop IO formats can be found in the link:http://tinkerpop.apache.org/docs/x.y.z/dev/io/[IO Reference].
+
+By itself the `io()` step merely configures the kind of importing and exporting that is going
+to occur and it is the follow-on call to the `read()` or `write()` step that determines which of those actions will
+execute. Therefore, a typical usage of the `io()` step would look like this:
+
+[source,java]
+----
+g.io(someInputFile).read()
+g.io(someOutputFile).write()
+----
+
+By default, the `io()` step will try to detect the right file format using the file name extension. To gain greater
+control of the format use the `with()` step modulator to provide further information to `io()`. For example:
+
+[source,java]
+----
+g.io(someInputFile).
+  with(IO.reader, IO.graphson).
+  read()
+g.io(someOutputFile).
+  with(IO.writer,IO.graphml).
+  write()
+----
+
+The `IO` class is a helper that for the `io()` step that provides expressions that can be used to help configure it
+and in this case it allows direct specification of the "reader" or "writer" to use. The "reader" actually refers to
+a `GraphReader` implementation and the `writer" refers to a `GraphWriter` implementation. The implementations of
+those interfaces provided by default are the standard TinkerPop implementations.
+
+That default is an important point to consider for users. The default TinkerPop implementations are not designed with
+massive, complex, parallel bulk loading in mind. They are designed to do single-threaded, OLTP-style loading of data
+in the most generic way possible so as to accommodate the greatest number of graph databases out there. As such, from
+a reading perspective, they work best for small datasets (or perhaps medium datasets where memory is plentiful and
+time is not critical) that are loading to an empty graph - incremental loading is not supported. The story from the
+writing perspective is not that different in there are no parallel operations in play, however streaming the output
+to disk requires a single pass of the data without high memory requirements for larger datasets.
+
+In general, TinkerPop recommends that users examine the native bulk import/export tools of the graph implementation
+that they choose. Those tools will often outperform the `io()` step and perhaps be easier to use with a greater
+feature set. That said, graph providers do have the option to optimize `io()` to back it with their own
+import/export utilities and therefore the default behavior provided by TinkerPop described above might be overridden
+by the graph.
+
+An excellent example of this lies in <<hadoop-gremlin,HadoopGraph>> with <<sparkgraphcomputer,SparkGraphComputer>>
+which replaces the default single-threaded implementation with a more advanced OLAP style bulk import/export
+functionality internally using <<clonevertexprogram,CloneVertexProgram>>. With this model, graphs of arbitrary size
+can be imported/exported assuming that there is a Hadoop `InputFormat` or `OutputFormat` to support it.
+
+IMPORTANT: Remote Gremlin Console users or Gremlin Language Variant (GLV) users (e.g. gremlin-python) who utilize
+the `io()` step should recall that their `read()` or `write()` operation will occur on the server and not locally
+and therefore the file specified for import/export must be something accessible by the server.
+
+[[_graphml_reader_writer]]
+[[graphml]]
+==== GraphML
+
+image:gremlin-graphml.png[width=350,float=left] The link:http://graphml.graphdrawing.org/[GraphML] file format is a
+common XML-based representation of a graph. It is widely supported by graph-related tools and libraries making it a
+solid interchange format for TinkerPop. In other words, if the intent is to work with graph data in conjunction with
+applications outside of TinkerPop, GraphML may be the best choice to do that. Common use cases might be:
+
+* Generate a graph using link:https://networkx.github.io/[NetworkX], export it with GraphML and import it to TinkerPop.
+* Produce a subgraph and export it to GraphML to be consumed by and visualized in link:https://gephi.org/[Gephi].
+* Migrate the data of an entire graph to a different graph database not supported by TinkerPop.
+
+WARNING: GraphML is a "lossy" format in that it only supports primitive values for properties and does not have
+support for `Graph` variables. It will use `toString` to serialize property values outside of those primitives.
+
+WARNING: GraphML as a specification allows for `<edge>` and `<node>` elements to appear in any order. Most software
+that writes GraphML (including as TinkerPop's `GraphMLWriter`) write `<node>` elements before `<edge>` elements. However it
+is important to note that `GraphMLReader` will read this data in order and order can matter. This is because TinkerPop
+does not allow the vertex label to be changed after the vertex has been created. Therefore, if an `<edge>` element
+comes before the `<node>`, the label on the vertex will be ignored. It is thus better to order `<node>` elements in the
+GraphML to appear before all `<edge>` elements if vertex labels are important to the graph.
+
+[source,java]
+----
+g.io("graph.xml").read()
+g.io("graph.xml").write()
+----
+
+NOTE: If using GraphML generated from TinkerPop 2.x, read more about its incompatibilities in the
+link:http://tinkerpop.apache.org/docs/x.y.z/upgrade/#graphml-format[Upgrade Documentation].
+
+[[graphson-reader-writer]]
+[[graphson]]
+==== GraphSON
+
+image:gremlin-graphson.png[width=350,float=left] GraphSON is a link:http://json.org/[JSON]-based format extended
+from earlier versions of TinkerPop. It is important to note that TinkerPop's GraphSON is not backwards compatible
+with prior TinkerPop GraphSON versions. GraphSON has some support from graph-related application outside of TinkerPop,
+but it is generally best used in two cases:
+
+* A text format of the graph or its elements is desired (e.g. debugging, usage in source control, etc.)
+* The graph or its elements need to be consumed by code that is not JVM-based (e.g. JavaScript, Python, .NET, etc.)
+
+[source,java]
+----
+g.io("graph.json").read()
+g.io("graph.json").write()
+----
+
+NOTE: Additional documentation for GraphSON can be found in the link:http://tinkerpop.apache.org/docs/x.y.z/dev/io/#graphson[IO Reference].
+
+[[gryo-reader-writer]]
+[[gryo]]
+==== Gryo
+
+image:gremlin-kryo.png[width=400,float=left] link:https://github.com/EsotericSoftware/kryo[Kryo] is a popular
+serialization package for the JVM. Gremlin-Kryo is a binary `Graph` serialization format for use on the JVM by JVM
+languages. It is designed to be space efficient, non-lossy and is promoted as the standard format to use when working
+with graph data inside of the TinkerPop stack. A list of common use cases is presented below:
+
+* Migration from one Gremlin Structure implementation to another (e.g. `TinkerGraph` to `Neo4jGraph`)
+* Serialization of individual graph elements to be sent over the network to another JVM.
+* Backups of in-memory graphs or subgraphs.
+
+WARNING: When migrating between Gremlin Structure implementations, Kryo may not lose data, but it is important to
+consider the features of each `Graph` and whether or not the data types supported in one will be supported in the
+other. Failure to do so, may result in errors.
+
+[source,java]
+----
+g.io("graph.kryo").read()
+g.io("graph.kryo").write()
+----
+
+*Additional References*
+
+link:++http://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalSource.html#io-String...-++[`io(String)`]
+
 [[is-step]]
 === Is Step
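
The added IO Step text says that `io()` tries to detect the file format from the file name extension unless a reader or writer is named explicitly through `with()`. The standalone sketch below illustrates that idea only; it is not TinkerPop's implementation. The `IoFormatSketch` class, its `Format` enum, and the `detect` and `resolve` methods are hypothetical names, and the extension mapping (`.xml`, `.json`, `.kryo`) is an assumption inferred from the file names used in the examples in this commit.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical illustration only; none of these names exist in TinkerPop.
public class IoFormatSketch {

    // The three formats named in the IO Step documentation.
    enum Format { GRAPHML, GRAPHSON, GRYO }

    // Guess a format from the file extension. The mapping is an assumption
    // based on the example file names graph.xml, graph.json and graph.kryo.
    static Optional<Format> detect(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension to inspect
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "xml":  return Optional.of(Format.GRAPHML);
            case "json": return Optional.of(Format.GRAPHSON);
            case "kryo": return Optional.of(Format.GRYO);
            default:     return Optional.empty();
        }
    }

    // An explicitly chosen format (standing in for the with() override)
    // takes precedence; otherwise fall back to extension detection.
    static Format resolve(String fileName, Format explicit) {
        if (explicit != null) {
            return explicit;
        }
        return detect(fileName).orElseThrow(() ->
            new IllegalArgumentException("Cannot detect format for: " + fileName));
    }

    public static void main(String[] args) {
        System.out.println(resolve("graph.xml", null));             // detected from extension
        System.out.println(resolve("export.dat", Format.GRAPHSON)); // explicit choice wins
    }
}
```

In this sketch, an unrecognized extension with no explicit format fails fast rather than silently picking a default. Whether the real `io()` step behaves that way is not stated in the commit.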