spmallette commented on code in PR #3238:
URL: https://github.com/apache/tinkerpop/pull/3238#discussion_r2436797074
##########
docs/src/reference/gremlin-applications.asciidoc:
##########
@@ -3009,3 +3011,128 @@ for `HadoopGraph`:
----
describeGraph(HadoopGraph)
----
+
+[[gremlin-mcp]]
+=== Gremlin MCP
+
+Gremlin MCP integrates Apache TinkerPop with the Model Context Protocol (MCP)
so that MCP‑capable assistants (for
+example, desktop chat clients that support MCP) can discover your graph, run
Gremlin traversals and exchange graph data
+through a small set of well‑defined tools. It lets you “talk to your
graph” while keeping full Gremlin power
+available when you or the assistant need it.
+
+MCP is an open protocol that lets assistants call server‑hosted tools in a
structured way. Each tool has a name, an
+input schema, and a result schema. When connected to a Gremlin MCP server, the
assistant can:
+
+* Inspect the server’s health and connection to a Gremlin data source
+* Discover the graph’s schema (labels, properties, relationships, counts)
+* Execute Gremlin traversals
+* Export graph data in common formats
+
+The Gremlin MCP server sits alongside Gremlin Server (or any
TinkerPop‑compatible endpoint) and forwards tool calls to
+the graph via standard Gremlin traversals.
+
+IMPORTANT: This MCP server is designed for development and trusted
environments.
+
+WARNING: Gremlin MCP can modify the graph to which it is connected. To prevent
such changes, ensure that Gremlin MCP is
+configured to work against a read-only instance of the graph. Gremlin Server
hosted graphs can configure their graph
+using `withStrategies(ReadOnlyStrategy)` for that protection.
+
+WARNING: Gremlin MCP executes global graph traversals to help it understand the
schema and gather statistics. On a large
+graph these queries will be costly. When evaluating Gremlin MCP, start with
a smaller subset of your graph.
+
+MCP defines a simple request/response model for invoking named tools. A tool
declares its input and output schema so an
+assistant can construct valid calls and reason about results. The Gremlin MCP
server implements several tools and, when
+invoked by an MCP client, translates those calls to Gremlin traversals against
a configured Gremlin endpoint. The
+endpoint is typically Gremlin Server, but it can be any graph system
that implements its protocols.
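
As a rough illustration of the request/response model described above, the sketch below builds the JSON-RPC 2.0 message an MCP client sends to invoke a tool. It assumes the `tools/call` method and `name`/`arguments` parameters from the MCP specification. The helper `build_tool_call` is a hypothetical name used only for this example, and an MCP client library would normally build this message for you.

```python
import json


def build_tool_call(request_id, tool_name, arguments=None):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # `arguments` must match the tool's declared input schema;
        # an empty object is used here for a tool called without input.
        "params": {"name": tool_name, "arguments": arguments or {}},
    }


request = build_tool_call(1, "get_graph_status")
print(json.dumps(request, indent=2))
```

The argument names accepted by each Gremlin MCP tool are defined by that tool's input schema, which is not reproduced here, so the example sticks to a call without arguments.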
+
+TIP: Gremlin MCP does not replace Gremlin itself. It complements it by helping
assistants discover data and propose
+traversals. You can always provide an explicit traversal when you know what
you want.
+
+The Gremlin MCP server exposes these tools:
+
+* `get_graph_status` — Returns basic health and connectivity information for
the backing Gremlin data source.
+* `get_graph_schema` — Discovers vertex labels, edge labels, property keys,
and relationship patterns. Low‑cardinality
+ properties may be surfaced as enums to encourage valid values in queries.
+* `run_gremlin_query` — Executes an arbitrary Gremlin traversal and returns
JSON results.
+* `refresh_schema_cache` — Forces schema discovery to run again when the graph
has changed.
+* `export_subgraph` — Exports a selected subgraph to JSON, GraphSON, or CSV.
+
+WARNING: Export operations can involve large portions of the graph. Ensure
proper authorization and confirm the
+assistant’s intent in the client before approving such operations.
+
+==== Schema discovery
+
+Schema discovery is the foundation that lets humans and AI assistants reason
about a graph without prior tribal
+knowledge. By automatically mapping the graph’s structure and commonly
observed patterns, it produces a concise,
+trustworthy description that accelerates onboarding, improves the quality of
suggested traversals, and reduces
+trial‑and‑error against production data. For assistants, a discovered schema
becomes the guidance layer for planning
+valid queries, generating meaningful filters, and explaining results in
natural language. For operators, it offers safer
+and more efficient interactions by avoiding blind exploratory scans, enabling
caching and change detection, and
+providing hooks to steer what should or shouldn’t be surfaced (for example,
excluding sensitive or non‑categorical
+fields). In short, schema discovery turns an opaque dataset into an actionable
contract between your graph and the tools
+that use it.
+
+Schema discovery uses Gremlin traversals and sampling to uncover the following
information about the graph:
+
+* Labels - Vertex and edge labels are collected and de‑duplicated.
+* Properties - For each label, a sample of elements is inspected to list
observed property keys.
+* Counts (optional) - Approximate counts can be included per label.
+* Relationship patterns - Connectivity is derived from the labels of edges and
their incident vertices.
+* Enums - Properties with a small set of distinct values may be surfaced as
enumerations to promote precise filters.
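
To make the enum idea concrete, here is a minimal sketch of how low-cardinality properties could be picked out of a sample of elements. It is not the server's actual implementation: the function `detect_enums`, its parameters, and the sample data are assumptions made for illustration, and it assumes all values of one property are hashable and mutually comparable.

```python
def detect_enums(samples, threshold=10, denylist=frozenset()):
    """Return {property: sorted distinct values} for low-cardinality properties.

    samples: iterable of dicts mapping property key -> value, one per
    sampled element (hypothetical input shape for this sketch).
    """
    distinct = {}
    for element in samples:
        for key, value in element.items():
            if key in denylist:
                continue  # never treat denylisted keys as enums
            distinct.setdefault(key, set()).add(value)
    # Keep only properties whose distinct-value count is within the threshold.
    return {
        key: sorted(values)
        for key, values in distinct.items()
        if len(values) <= threshold
    }


people = [
    {"name": "marko", "status": "active"},
    {"name": "vadas", "status": "inactive"},
    {"name": "josh", "status": "active"},
]
enums = detect_enums(people, threshold=2, denylist={"name"})
print(enums)
```

Here `status` has two distinct values and is reported as an enum, while `name` is skipped because it is denylisted.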
+
+==== Executing traversals
+
+When the assistant needs to answer a question, a common sequence is:
+
+. Optionally, call `get_graph_status`.
+. Retrieve (or reuse) schema via `get_graph_schema`.
+. Formulate a traversal and call `run_gremlin_query`.
+. Present results and, if required, refine the traversal.
+
+For example, the assistant may execute a traversal like the following:
+
+[source,groovy]
+----
+// list the names of the people known by anyone over 30
+g.V().hasLabel('person').has('age', gt(30)).out('knows').values('name')
+----
+
+==== Export
+
+Describe the subgraph or selection criteria (for example, "all person vertices
and their `knows` edges") and choose a
+target format. The server returns the exported data for download by the client.
+
+==== Configuring an MCP Client
+
+The MCP client is responsible for launching the Gremlin MCP server and
providing connection details for the Gremlin
+endpoint the server should use.
+
+Basic connection settings:
+
+* `GREMLIN_ENDPOINT` — `host:port` or `host:port/traversal_source` for the
target Gremlin Server or compatible endpoint (default traversal source: `g`)
+* `GREMLIN_USE_SSL` — set to `true` when TLS is required by the endpoint
(default: `false`)
+* `GREMLIN_USERNAME` / `GREMLIN_PASSWORD` — credentials when authentication is
enabled (optional)
+* `GREMLIN_IDLE_TIMEOUT` — idle connection timeout in seconds (default: `300`)
+* `LOG_LEVEL` — logging verbosity for troubleshooting: `error`, `warn`,
`info`, or `debug` (default: `info`)
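
The two accepted shapes of `GREMLIN_ENDPOINT` can be illustrated with a small parser. This is only a sketch of the documented format, not the server's own parsing code. `parse_endpoint` is a hypothetical helper, `gmodern` is just an example traversal source name, and bracketed IPv6 hosts are not handled.

```python
def parse_endpoint(value, default_source="g"):
    """Split `host:port` or `host:port/traversal_source` into its parts."""
    address, _, source = value.partition("/")
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port[/traversal_source], got {value!r}")
    # Fall back to the documented default traversal source when none is given.
    return host, int(port), source or default_source


print(parse_endpoint("localhost:8182"))
print(parse_endpoint("localhost:8182/gmodern"))
```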
+
+Advanced schema discovery and performance tuning:
+
+* `GREMLIN_ENUM_DISCOVERY_ENABLED` — enable enum property discovery (default:
`true`)
+* `GREMLIN_ENUM_CARDINALITY_THRESHOLD` — max distinct values for a property to
be considered an enum (default: `10`)
+* `GREMLIN_ENUM_PROPERTY_DENYLIST` — comma-separated property names to exclude
from enum detection (default:
`id,pk,name,description,startDate,endDate,timestamp,createdAt,updatedAt`)
+* `GREMLIN_SCHEMA_MAX_ENUM_VALUES` — limit the number of enum values returned
per property in the schema (default: `10`)
+* `GREMLIN_SCHEMA_INCLUDE_SAMPLE_VALUES` — include small example values for
properties in the schema (default: `false`)
+* `GREMLIN_SCHEMA_INCLUDE_COUNTS` — include approximate vertex/edge label
counts in the schema (default: `true`)
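
The defaults listed above can be summarized in code. The following sketch reads the advanced settings from an environment-like mapping and falls back to the documented defaults. The function name, the returned key names, and the rule that only the string `true` enables a flag are assumptions for illustration; the server's real configuration handling may differ.

```python
import os

DEFAULT_DENYLIST = (
    "id,pk,name,description,startDate,endDate,timestamp,createdAt,updatedAt"
)


def load_schema_settings(env=os.environ):
    """Read schema discovery settings, using the documented defaults."""

    def flag(name, default):
        # Assumption: only the string "true" (any case) enables a flag.
        return env.get(name, default).strip().lower() == "true"

    denylist = env.get("GREMLIN_ENUM_PROPERTY_DENYLIST", DEFAULT_DENYLIST)
    return {
        "enum_discovery_enabled": flag("GREMLIN_ENUM_DISCOVERY_ENABLED", "true"),
        "enum_cardinality_threshold": int(
            env.get("GREMLIN_ENUM_CARDINALITY_THRESHOLD", "10")
        ),
        "enum_property_denylist": [
            p.strip() for p in denylist.split(",") if p.strip()
        ],
        "schema_max_enum_values": int(
            env.get("GREMLIN_SCHEMA_MAX_ENUM_VALUES", "10")
        ),
        "schema_include_sample_values": flag(
            "GREMLIN_SCHEMA_INCLUDE_SAMPLE_VALUES", "false"
        ),
        "schema_include_counts": flag("GREMLIN_SCHEMA_INCLUDE_COUNTS", "true"),
    }


# Override one value and leave the rest at their defaults.
settings = load_schema_settings({"GREMLIN_ENUM_CARDINALITY_THRESHOLD": "5"})
print(settings)
```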
Review Comment:
Making this simple change kicked open a wide world of issues I had to
investigate. I fixed the logging and configuration setup as part of making
that change.