Re: [DISCUSS] Graph Schema Interfaces for TP

Joshua Shinavier Thu, 02 Apr 2026 20:31:36 -0700

Hi Cole. This looks good to me. With respect to the schema types, would you
please review the property graph model I showed a couple of meetings ago
<https://categoricaldata.net/hydra/hydra-java/javadoc/hydra/pg/model/package-summary.html>
and let me know if you have any feedback? The types of interest start with
GraphSchema
<https://categoricaldata.net/hydra/hydra-java/javadoc/hydra/pg/model/GraphSchema.html>.
Ignore Hydra-specific details like PersistentMap/ConsList/Name, etc. If the
structure is agreeable, I will map the types into a format suitable for use
in your PR, in an org.apache.tinkerpop namespace. There is an associated
JSON format for interchange of GraphSchema and any other type we define in
this way.


Josh



On Thu, Apr 2, 2026 at 7:17 PM Cole Greer via dev <[email protected]>
wrote:

> Hi Everyone,
>
> The topic of Graph Schema has been discussed extensively in recent
> TinkerPop Gatherings, and the following proposal has emerged from these
> gatherings. I believe it is now ready for broad consideration and
> discussion. I’ve done my best to incorporate initial feedback from Josh,
> Pieter, Valentyn, Stephen, Kris and others into this proposal; however, I
> won’t claim that it accurately represents the views of anyone other than
> myself at this time. This is a broad topic and I’m deliberately excluding
> critical topics to focus this thread on standardizing interfaces for
> Gremlin users and providers to interact with schema (see assumptions for
> more details).
>
> ## Overview
>
> This proposal introduces graph schema interfaces for TinkerPop: a way to
> define vertex types, edge types, and property types as a meta-graph that is
> itself traversable with Gremlin. The schema describes the structure of a
> data graph: what kinds of vertices and edges exist, what properties they
> carry, and how they connect.
>
> ## Assumptions
>
> - Type keys are element labels: there is a 1-to-1 mapping between a label
> and a type definition. A vertex labeled "person" corresponds to exactly one
> VertexType, and an edge labeled "knows" corresponds to exactly one EdgeType.
> - Java classes are used as a type system: This proposal uses Java classes
> to define property type constraints. This is intended as a placeholder to
> be replaced by a proper type system to be defined via a later discussion.
> - This proposal gives very little consideration to if/when/where/how
> validation and enforcement of schema takes place. I believe it is important
> for us to ship something which is flexible and useful to providers out of
> the box, while leaving space for providers to plug in existing
> implementations or build their own if they desire. I’ve left this out of
> scope for this proposal to focus first on interfaces which give providers
> the appropriate access to schema.
>
> ## Design Points
>
> ### 1. Schema-as-Graph
>
> `GraphSchema extends Graph`. Providers implement a familiar interface, and
> users traverse the schema with schema.traversal(). This avoids inventing a
> parallel API surface. The schema is just another graph.
>
> A data graph exposes its schema via Graph.schema(), which returns the
> GraphSchema instance. For providers that don't support schema, the default
> implementation throws UnsupportedOperationException.
>
> ### 2. All type definitions are vertices
>
> VertexType, EdgeType, and PropertyType are all vertices in the schema
> meta-graph.
>
> - A VertexType vertex represents a vertex label definition (e.g. "person",
> "software").
> - An EdgeType vertex represents an edge label definition (e.g. "knows",
> "created"). Even though it describes edges in the data graph, it is itself
> a vertex in the schema graph, connected to its endpoint VertexType vertices
> via from/to edges.
> - A PropertyType vertex represents a property on a type, connected to its
> parent type vertex via a "hasProperty" edge.
>
> Property definitions are independent per type; they are not shared across
> types.
>
> Schema graph example for the classic TinkerPop modern graph:
> ```
> (person:vertexType) --hasProperty--> (name:propertyType)
> (person:vertexType) --hasProperty--> (age:propertyType)
> (software:vertexType) --hasProperty--> (name:propertyType)
> (software:vertexType) --hasProperty--> (lang:propertyType)
> (knows:edgeType) --from--> (person:vertexType)
> (knows:edgeType) --to-->   (person:vertexType)
> (knows:edgeType) --hasProperty--> (weight:propertyType)
> (created:edgeType) --from--> (person:vertexType)
> (created:edgeType) --to-->   (software:vertexType)
> (created:edgeType) --hasProperty--> (weight:propertyType)
> ```
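
The listing above can be written out as a plain edge list. This is only a minimal sketch in standalone Java, not the proposed API: SchemaGraphSketch, SchemaEdge, edge(), propertiesOf() and modern() are hypothetical names, and vertices are reduced to "name:kind" strings, so vertex identity is not modeled (in the proposal, person's name and software's name would be distinct PropertyType vertices).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the schema meta-graph; no TinkerPop types are used.
class SchemaGraphSketch {
    // One labeled edge between two schema vertices, each written as "name:kind".
    static final class SchemaEdge {
        final String out;
        final String label;
        final String in;

        SchemaEdge(String out, String label, String in) {
            this.out = out;
            this.label = label;
            this.in = in;
        }
    }

    final List<SchemaEdge> edges = new ArrayList<>();

    SchemaGraphSketch edge(String out, String label, String in) {
        edges.add(new SchemaEdge(out, label, in));
        return this;
    }

    // Names of the PropertyType vertices reached via hasProperty from a type vertex.
    List<String> propertiesOf(String typeVertex) {
        List<String> names = new ArrayList<>();
        for (SchemaEdge e : edges) {
            if (e.out.equals(typeVertex) && e.label.equals("hasProperty")) {
                names.add(e.in.substring(0, e.in.indexOf(':')));
            }
        }
        return names;
    }

    // The "modern" graph schema, edge for edge as in the listing.
    static SchemaGraphSketch modern() {
        return new SchemaGraphSketch()
            .edge("person:vertexType", "hasProperty", "name:propertyType")
            .edge("person:vertexType", "hasProperty", "age:propertyType")
            .edge("software:vertexType", "hasProperty", "name:propertyType")
            .edge("software:vertexType", "hasProperty", "lang:propertyType")
            .edge("knows:edgeType", "from", "person:vertexType")
            .edge("knows:edgeType", "to", "person:vertexType")
            .edge("knows:edgeType", "hasProperty", "weight:propertyType")
            .edge("created:edgeType", "from", "person:vertexType")
            .edge("created:edgeType", "to", "software:vertexType")
            .edge("created:edgeType", "hasProperty", "weight:propertyType");
    }

    public static void main(String[] args) {
        System.out.println(modern().propertiesOf("person:vertexType")); // [name, age]
    }
}
```
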
>
> ### 3. Constraints are properties on type vertices
>
> Rather than a fixed constraint taxonomy, constraints are regular
> properties on type vertices, keyed by string via constraint(key, value).
> This keeps the model extensible such that providers can define their own
> constraints without changes to the core API.
>
> Constraints can be added to VertexType, EdgeType, and PropertyType
> vertices directly. The most common constraints such as property types and
> required properties would apply to PropertyTypes, while edge multiplicity
> constraints (e.g. one-to-many, one-to-one) are naturally expressed as
> constraints on the EdgeType itself rather than on any property.
>
> While constraint keys are arbitrary strings and providers are free to
> implement any constraints they like, TinkerPop should standardize a set of
> core constraint keys representing the most common constraints. Examples
> include "type", "required", "unique", "minValue", "maxValue", etc.
> Providers that support equivalent constraints are encouraged to follow
> these conventional names for interoperability.
>
> Non-core constraints (custom to a provider) are encouraged to follow a
> namespaced key convention to avoid collisions, e.g. "tinkergraph:notNull".
> Core constraint keys are unnamespaced.
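
One way to picture the key convention is a string-keyed map plus a check for the namespace separator. This is a rough sketch under an assumption the proposal does not spell out: that a namespaced key always has the form "<namespace>:<name>", as in "tinkergraph:notNull". ConstraintKeySketch and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ConstraintKeySketch {
    // Assumption: a provider-specific key looks like "<namespace>:<name>".
    static boolean isNamespaced(String key) {
        int i = key.indexOf(':');
        return i > 0 && i < key.length() - 1;
    }

    // Constraints on one type vertex: an open-ended, string-keyed map.
    static Map<String, Object> exampleConstraints() {
        Map<String, Object> constraints = new LinkedHashMap<>();
        constraints.put("type", String.class);          // core key
        constraints.put("required", true);              // core key
        constraints.put("tinkergraph:notNull", true);   // provider-specific key
        return constraints;
    }

    // Counts the provider-specific (namespaced) keys in a constraint map.
    static int customCount(Map<String, Object> constraints) {
        int n = 0;
        for (String key : constraints.keySet()) {
            if (isNamespaced(key)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(customCount(exampleConstraints())); // 1
    }
}
```
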
>
> ### 4. Schema traversal steps in core Gremlin
>
> New steps for schema manipulation live directly in
> GraphTraversal/GraphTraversalSource, not in a separate DSL:
>
> - addVType(label) — creates a VertexType vertex
> - addEType(label) — creates an EdgeType vertex
> - propertyType(name) — creates a PropertyType vertex and connects it via
> hasProperty
> - constraint(key, value) — adds a constraint property to the current type
> vertex
>
> Example: defining a vertex type with properties:
> ```
> schema.traversal().addVType("person")
>     .propertyType("name").constraint("type", String.class)
>         .constraint("required", true).constraint("unique", true)
>     .propertyType("age").constraint("type", Integer.class)
> ```
>
> Example: defining an edge type with endpoint types and a property:
> ```
> schema.traversal().addEType("knows")
>     .from("person").to("person")
>     .propertyType("weight").constraint("type", Double.class)
> ```
>
> This mirrors the addE().from().to() pattern from the data-graph. Here
> from() and to() take vertex type labels (strings) and create from/to edges
> in the schema graph connecting the EdgeType to the referenced VertexType
> vertices.
>
> ### 5. Convenience methods for direct access
>
> The schema-as-graph model is the source of truth, but traversing it for
> simple lookups isn’t always convenient. Direct methods provide compact
> access:
>
> GraphSchema methods:
> - vertexTypes() → Collection<VertexType>
> - vertexType(String label) → Optional<VertexType>
> - edgeTypes() → Collection<EdgeType>
> - edgeType(String label) → Optional<EdgeType>
> - addVertexType(String label) → VertexType
> - addEdgeType(String label) → EdgeType
> - store(OutputStream): serialize the schema to a compact JSON
> representation
> - load(InputStream): deserialize and merge a schema from JSON into this
> schema graph
>
> EdgeType methods:
> - fromVertexTypes() → Collection<VertexType>
> - toVertexTypes() → Collection<VertexType>
>
> Example:
> ```
> GraphSchema schema = graph.schema();
>
> // Look up a vertex type
> VertexType person = schema.vertexType("person").orElseThrow();
>
> // Inspect its properties
> for (PropertyType pd : person.propertyTypes()) {
>     System.out.println(pd.name() + " : " + pd.constraint("type"));
> }
>
> // Look up an edge type and its connectivity
> EdgeType knows = schema.edgeType("knows").orElseThrow();
> Collection<VertexType> fromTypes = knows.fromVertexTypes();
> Collection<VertexType> toTypes = knows.toVertexTypes();
> ```
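
For intuition, the lookup methods could be backed by a label-keyed map, which also enforces the 1-to-1 label-to-type assumption. The sketch below is standalone Java, not the TinkerGraph reference implementation. InMemorySchemaSketch is a hypothetical name, and returning the existing type when addVertexType() is called twice with the same label is an assumption; the proposal does not say whether that case should fail instead.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class InMemorySchemaSketch {
    // Reduced VertexType: only the label, which doubles as the type key.
    static final class VertexType {
        final String label;

        VertexType(String label) {
            this.label = label;
        }
    }

    private final Map<String, VertexType> vertexTypes = new LinkedHashMap<>();

    // Assumption: adding an existing label returns the existing type.
    VertexType addVertexType(String label) {
        return vertexTypes.computeIfAbsent(label, VertexType::new);
    }

    // Optional models "no such type" without returning null.
    Optional<VertexType> vertexType(String label) {
        return Optional.ofNullable(vertexTypes.get(label));
    }

    Collection<VertexType> vertexTypes() {
        return Collections.unmodifiableCollection(vertexTypes.values());
    }

    public static void main(String[] args) {
        InMemorySchemaSketch schema = new InMemorySchemaSketch();
        schema.addVertexType("person");
        System.out.println(schema.vertexType("person").isPresent());   // true
        System.out.println(schema.vertexType("software").isPresent()); // false
    }
}
```
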
>
> ### 6. Cross-graph jumps
>
> Two steps bridge the data graph and schema graph:
>
> - type(): from a data traversal, jump to the element's type definition in
> the schema graph.
> - instances(): from a schema traversal, jump to all matching elements in
> the data graph.
>
> These compose for round-trip traversals:
> ```
> // Get the type definition for "person" vertices
> g.V().hasLabel("person").type()
>
> // Get all instances of a schema type
> schema.traversal().vertexType("person").instances()
>
> // Round-trip: find marko's type, then get all instances of that type
> g.V().has("person", "name", "marko").type().instances()
> ```
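
Because type keys are element labels, both jumps reduce to label lookups. The following is a simplified illustration in standalone Java: CrossGraphJumpSketch, Element, type() and instances() are hypothetical stand-ins for the proposed steps, and the data is just the vertex labels and names of the modern graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CrossGraphJumpSketch {
    // A data-graph vertex reduced to its label and name.
    static final class Element {
        final String label;
        final String name;

        Element(String label, String name) {
            this.label = label;
            this.name = name;
        }
    }

    static final List<Element> DATA = Arrays.asList(
        new Element("person", "marko"),
        new Element("person", "vadas"),
        new Element("software", "lop"),
        new Element("person", "josh"),
        new Element("software", "ripple"),
        new Element("person", "peter"));

    // type(): an element's type is the schema type keyed by its label.
    static String type(Element e) {
        return e.label;
    }

    // instances(): every data element whose label equals the type key.
    static List<String> instances(String typeLabel) {
        List<String> names = new ArrayList<>();
        for (Element e : DATA) {
            if (e.label.equals(typeLabel)) {
                names.add(e.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Round trip: marko -> person type -> all persons.
        System.out.println(instances(type(DATA.get(0)))); // [marko, vadas, josh, peter]
    }
}
```
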
>
> ### 7. Schema restriction strategy
>
> There are some steps we will want to restrict in both the data graph and
> the schema-graph. addVType() wouldn’t make sense in the data-graph, nor
> would addV() be sensible in the schema-graph. A TraversalStrategy can
> restrict schema traversals to a safe subset of Gremlin steps
> (allowlist-based). This prevents accidentally running data element
> insertions, OLAP computations, complex control flow, or side-effect steps
> against the schema graph. The strategy should be auto-registered when
> traversing a GraphSchema instance.
>
> The exact allowlist should be a topic for later discussion.
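
The allowlist check itself can be sketched independently of its contents. In the standalone Java below, SchemaAllowlistSketch is a hypothetical name, the step names in ALLOWED are placeholders chosen for illustration (the real list is deferred, as noted above), and IllegalStateException stands in for whatever exception a real TraversalStrategy would raise.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SchemaAllowlistSketch {
    // Placeholder allowlist; the actual contents are still to be discussed.
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
        "V", "has", "hasLabel", "out", "in", "values",
        "addVType", "addEType", "propertyType", "constraint", "from", "to"));

    // Rejects a schema traversal as soon as one step is not on the allowlist.
    static void verify(List<String> stepNames) {
        for (String step : stepNames) {
            if (!ALLOWED.contains(step)) {
                throw new IllegalStateException(
                    "Step not permitted on a schema graph: " + step);
            }
        }
    }

    // Convenience wrapper that reports the result instead of throwing.
    static boolean isAllowed(List<String> stepNames) {
        try {
            verify(stepNames);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(Arrays.asList("addVType", "propertyType", "constraint"))); // true
        System.out.println(isAllowed(Arrays.asList("addV", "property")));                       // false
    }
}
```

An allowlist fails closed: a step added to Gremlin later is rejected on schema graphs until someone explicitly permits it, which a denylist would not guarantee.
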
>
> ### 8. Instance counts on type vertices
>
> VertexType.instanceCount() and EdgeType.instanceCount() return the count
> of data graph elements matching each type. This is a method rather than a
> property on the type vertex, keeping the schema graph definitional (not
> statistical) and giving providers full implementation flexibility.
>
> Approximate counts are likely acceptable and preferable for performance in
> most cases. However, TinkerPop should not stand in the way of providers
> that prefer exact counts, and should ensure that appropriate hooks are in
> place in reference implementations so that providers can maintain exact
> counts if they so desire.
>
> Transactional implications need additional consideration. Maintaining
> accurate counts across concurrent writes, rollbacks, and transaction
> isolation levels adds significant complexity. This interacts with the
> broader schema transactions question (see transactions below) and should be
> addressed alongside it.
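
As a rough illustration of the hooks mentioned above, a provider could keep per-label counters that are updated on element insertion and removal. This sketch is standalone Java with hypothetical names (InstanceCountSketch, onElementAdded, onElementRemoved) and it deliberately ignores transactions and rollbacks, which remain the open question.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class InstanceCountSketch {
    // One counter per label; LongAdder keeps contention low under concurrent writes.
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Hypothetical hook: called after an element with this label is inserted.
    void onElementAdded(String label) {
        counts.computeIfAbsent(label, k -> new LongAdder()).increment();
    }

    // Hypothetical hook: called after an element with this label is removed.
    void onElementRemoved(String label) {
        LongAdder adder = counts.get(label);
        if (adder != null) {
            adder.decrement();
        }
    }

    // Under concurrent updates sum() is not an atomic snapshot, so the value
    // may be slightly stale; it is exact once writers are quiescent.
    long instanceCount(String label) {
        LongAdder adder = counts.get(label);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        InstanceCountSketch counter = new InstanceCountSketch();
        counter.onElementAdded("person");
        counter.onElementAdded("person");
        counter.onElementRemoved("person");
        System.out.println(counter.instanceCount("person")); // 1
    }
}
```
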
>
> ### 9. GLV Support
>
> Each GLV (Python, JavaScript, .NET, Go) needs:
>
> - Schema data classes: Parallel classes to the 4 core Java interfaces,
> following the same pattern as existing Vertex and Edge classes. These are
> data containers representing schema objects returned from the server:
>   - GraphSchema: holds collections of VertexTypes and EdgeTypes
>   - VertexType: label, full constraints map, and collection of
> PropertyTypes
>   - EdgeType: label, full constraints map, from/to VertexType references
> (same pattern as Edge.outV/Edge.inV), and collection of PropertyTypes
>   - PropertyType: name and full constraints map (including data type as a
> constraint)
> - All new gremlin steps are supported from each GLV
>
> ## Future Questions
>
> ### Schema validation
>
> Providers will need lots of flexibility regarding validation modes. Some
> providers may choose to have write-time validation for all inserts, others
> may choose to validate an entire graph against a schema as a batch job,
> while others may choose to validate on-commit. For our purposes, we need to
> provide a viable reference implementation, as well as ensuring sufficient
> extension points exist for providers to fulfill their needs.
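
To make the write-time mode concrete, here is a minimal sketch of checking one element's properties against the "required" and "type" constraints. It is standalone Java and not part of the proposal: WriteTimeValidationSketch, validate(), props() and personType() are hypothetical names, and property types are reduced to a map from property name to constraint map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WriteTimeValidationSketch {
    // Returns one message per violated constraint; an empty list means valid.
    static List<String> validate(Map<String, Map<String, Object>> propertyTypes,
                                 Map<String, Object> properties) {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> pt : propertyTypes.entrySet()) {
            String name = pt.getKey();
            Object value = properties.get(name);
            Object required = pt.getValue().get("required");
            Object type = pt.getValue().get("type");
            if (value == null) {
                if (Boolean.TRUE.equals(required)) {
                    violations.add("missing required property: " + name);
                }
            } else if (type instanceof Class && !((Class<?>) type).isInstance(value)) {
                violations.add("wrong type for property: " + name);
            }
        }
        return violations;
    }

    // Small helper: builds an ordered map from alternating keys and values.
    static Map<String, Object> props(Object... keyValues) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put((String) keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    // Property types for "person", mirroring the constraints used earlier.
    static Map<String, Map<String, Object>> personType() {
        Map<String, Map<String, Object>> t = new LinkedHashMap<>();
        t.put("name", props("type", String.class, "required", true));
        t.put("age", props("type", Integer.class));
        return t;
    }

    public static void main(String[] args) {
        System.out.println(validate(personType(), props("name", "marko", "age", 29)));
        System.out.println(validate(personType(), props("age", "old")));
    }
}
```

The same validate() routine could be driven per write, per commit, or over a whole graph in a batch job; only the call site changes.
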
>
> ### Dynamic schema updates from data writes
>
> It would be useful to auto-update the schema graph when data writes
> introduce new labels or properties (e.g. addV("newLabel") automatically
> creates a VertexType). Keeping the schema exactly in-sync with such
> operations may introduce too much overhead for many purposes. We should
> provide appropriate hooks for providers to implement such behaviour if
> desired, or to help providers aggregate changes and perform incremental
> batch updates to the schema.
>
> ### Transactions
>
> The schema graph will need to be transactional if the data
>
> ### File IO
>
> It is often useful to persist and load schemas to/from files. This
> capability should be built into the GraphSchema class via simple store()
> and load() methods, using a custom compact JSON representation of the
> schema. The specifics of this format are deferred to later discussion.
>
> GraphSchema exposes file IO directly:
> - store(OutputStream): serialize the schema to a compact JSON
> representation
> - load(InputStream): deserialize a schema from JSON and merge it into the
> current schema graph
>
> Schema file IO should be implemented across all GLVs.
>
> ## Reference Implementation
>
> TinkerGraph serves as the reference implementation:
>
> - TinkerGraphSchema extends TinkerGraph implements GraphSchema
> - TinkerVertexType extends TinkerVertex implements VertexType
> - TinkerPropertyType extends TinkerVertex implements PropertyType
> - TinkerEdgeType extends TinkerVertex implements EdgeType
> - Recursion guard prevents schema-of-schema (TinkerGraphSchema overrides
> initSchema())
>
>
> Please let me know any thoughts you may have on the approach. I intend to
> move this into a proposal PR soon, unless there are any major disagreements
> over the design.
>
> Thanks,
> Cole
>
