This is an automated email from the ASF dual-hosted git repository. spmallette pushed a commit to branch gremlin-mcp in repository https://gitbox.apache.org/repos/asf/tinkerpop.git
commit 9609b0eef63f1d47e28b2312fbd208cb7472e50c
Author: Stephen Mallette <[email protected]>
AuthorDate: Tue Oct 7 08:06:29 2025 -0400

    Cleaned up around denylist and better documented enum
---
 docs/src/reference/gremlin-applications.asciidoc | 66 +++++++++++++++-------
 gremlin-mcp/src/main/javascript/.env.example | 2 +-
 gremlin-mcp/src/main/javascript/README.md | 22 ++------
 gremlin-mcp/src/main/javascript/src/config.ts | 14 ++---
 .../javascript/src/gremlin/models/graph-schema.ts | 2 +-
 .../javascript/src/gremlin/property-analyzer.ts | 8 +--
 .../javascript/src/gremlin/schema-generator.ts | 2 +-
 .../src/main/javascript/src/gremlin/types.ts | 2 +-
 .../src/main/javascript/tests/config.test.ts | 12 ++--
 .../javascript/tests/property-analyzer.test.ts | 8 +--
 .../main/javascript/tests/schema-assembly.test.ts | 4 +-
 11 files changed, 78 insertions(+), 64 deletions(-)

diff --git a/docs/src/reference/gremlin-applications.asciidoc b/docs/src/reference/gremlin-applications.asciidoc
index c65aed2ab2..2bcffc995c 100644
--- a/docs/src/reference/gremlin-applications.asciidoc
+++ b/docs/src/reference/gremlin-applications.asciidoc
@@ -2252,7 +2252,7 @@ scrubbing, it would be quite simple to do:
 
 [source,java]
 ----
-String lbl = "person"
+String lbl = "person";
 String nodeId = "mary').next();g.V().drop().iterate();g.V().has('id', 'thomas";
 String query = "g.addV('" + lbl + "').property('identifier','" + nodeId + "')";
 client.submit(query);
@@ -2264,13 +2264,13 @@ part of the "identifier" for the vertex on insertion:
 
 [source,java]
 ----
-String lbl = "person"
+String lbl = "person";
 String nodeId = "mary').next();g.V().drop().iterate();g.V().has('id', 'thomas";
 String query = "g.addV(lbl).property('identifier',nodeId)";
 Map<String,Object> params = new HashMap<>();
-params.put("lbl",lbl);
-params.put("nodeId",nodeId);
+params.put("lbl", lbl);
+params.put("nodeId", nodeId);
 client.submit(query, params);
 ----
@@ -3035,7 +3035,7 @@ IMPORTANT: Gremlin MCP is currently available for experimental use only. It is u
 may change.
 
 WARNING: Gremlin MCP executes global graph traversal to help it understand the schema and gather statistics. On a large
-grpah these queries will be costly. If you are trying Gremlin MCP, please try it with a smaller subset of your graph for
+graph these queries will be costly. If you are trying Gremlin MCP, please try it with a smaller subset of your graph for
 experimentation purposes.
 
 MCP defines a simple request/response model for invoking named tools. A tool declares its input and output schema so an
@@ -3061,15 +3061,23 @@ assistant’s intent in the client before approving such operations.
 
 ==== Schema discovery
 
-Schema discovery uses Gremlin traversals and sampling:
+Schema discovery is the foundation that lets humans and AI assistants reason about a graph without prior tribal
+knowledge. By automatically mapping the graph’s structure and commonly observed patterns, it produces a concise,
+trustworthy description that accelerates onboarding, improves the quality of suggested traversals, and reduces
+trial‑and‑error against production data. For assistants, a discovered schema becomes the guidance layer for planning
+valid queries, generating meaningful filters, and explaining results in natural language. For operators, it offers safer
+and more efficient interactions by avoiding blind exploratory scans, enabling caching and change detection, and
+providing hooks to steer what should or shouldn’t be surfaced (for example, excluding sensitive or non‑categorical
+fields). In short, schema discovery turns an opaque dataset into an actionable contract between your graph and the tools
+that use it.
 
-* Labels — Vertex and edge labels are collected and de‑duplicated.
-* Properties — For each label, a sample of elements is inspected to list observed property keys.
-* Counts (optional) — Approximate counts can be included per label.
-* Relationship patterns — Connectivity is derived from the labels of edges and their incident vertices.
-* Enums — Properties with a small set of distinct values may be surfaced as enumerations to promote precise filters.
+Schema discovery uses Gremlin traversals and sampling to uncover the following information about the graph:
 
-The resulting schema helps assistants construct well‑formed traversals and explain results in natural language.
+* Labels - Vertex and edge labels are collected and de‑duplicated.
+* Properties - For each label, a sample of elements is inspected to list observed property keys.
+* Counts (optional) - Approximate counts can be included per label.
+* Relationship patterns - Connectivity is derived from the labels of edges and their incident vertices.
+* Enums - Properties with a small set of distinct values may be surfaced as enumerations to promote precise filters.
 
 ==== Executing traversals
 
@@ -3095,15 +3103,35 @@ g.V().hasLabel('person').has('age', gt(30)).out('knows').values('name')
 * Export — Describe the subgraph or selection criteria (for example, all person vertices and their knows edges) and
 choose a target format. The server returns the exported data for download by the client.
 
-==== Connecting from an MCP client
+==== Configuring an MCP Client
 
 The MCP client is responsible for launching the Gremlin MCP server and providing connection details for the Gremlin
-endpoint the server should use. Typical environment variables include:
-
-* GREMLIN_ENDPOINT — host:port for the target Gremlin Server or compatible endpoint
-* GREMLIN_USERNAME / GREMLIN_PASSWORD — credentials when authentication is enabled
-* GREMLIN_USE_SSL — set to true when TLS is required by the endpoint
-* LOG_LEVEL — optional logging verbosity for troubleshooting
+endpoint the server should use.
+
+Basic connection settings:
+
+* `GREMLIN_ENDPOINT` — `host:port` or `host:port/traversal_source` for the target Gremlin Server or compatible endpoint (default traversal source: `g`)
+* `GREMLIN_USE_SSL` — set to `true` when TLS is required by the endpoint (default: `false`)
+* `GREMLIN_USERNAME` / `GREMLIN_PASSWORD` — credentials when authentication is enabled (optional)
+* `GREMLIN_IDLE_TIMEOUT` — idle connection timeout in seconds (default: `300`)
+* `LOG_LEVEL` — logging verbosity for troubleshooting: `error`, `warn`, `info`, or `debug` (default: `info`)
+
+Advanced schema discovery and performance tuning:
+
+* `GREMLIN_ENUM_DISCOVERY_ENABLED` — enable enum property discovery (default: `true`)
+* `GREMLIN_ENUM_CARDINALITY_THRESHOLD` — max distinct values for a property to be considered an enum (default: `10`)
+* `GREMLIN_ENUM_PROPERTY_DENYLIST` — comma-separated property names to exclude from enum detection (default: `id,pk,name,description,startDate,endDate,timestamp,createdAt,updatedAt`)
+* `GREMLIN_SCHEMA_MAX_ENUM_VALUES` — limit the number of enum values returned per property in the schema (default: `10`)
+* `GREMLIN_SCHEMA_INCLUDE_SAMPLE_VALUES` — include small example values for properties in the schema (default: `false`)
+* `GREMLIN_SCHEMA_INCLUDE_COUNTS` — include approximate vertex/edge label counts in the schema (default: `true`)
+
+The configurations related to enums begs additional explanation as to their importance. Treating only truly categorical
+properties as enums prevents misleading suggestions and sensitive data exposure in assistant‑facing schemas. Without a
+denylist and related controls, low‑sample snapshots can make non‑categorical fields like IDs, timestamps, or free text
+appear “enum‑like,” degrading query guidance and result explanations. By explicitly excluding such keys, the schema
+remains focused on meaningful categories (e.g., status or type), which improves AI query formulation, reduces noise, and
+avoids surfacing unstable or private values. It also streamlines schema discovery by skipping properties that would
+create large or frequently changing value sets, improving performance and stability.
 
 Consult the MCP client documentation for how environment variables are supplied and how tool calls are approved and
 presented to the user.
diff --git a/gremlin-mcp/src/main/javascript/.env.example b/gremlin-mcp/src/main/javascript/.env.example
index 6b707f70d8..3b8f430f12 100644
--- a/gremlin-mcp/src/main/javascript/.env.example
+++ b/gremlin-mcp/src/main/javascript/.env.example
@@ -41,7 +41,7 @@ GREMLIN_ENUM_DISCOVERY_ENABLED="true"
 GREMLIN_ENUM_CARDINALITY_THRESHOLD="50"
 
 # Optional: A comma-separated list of property names to exclude from enum discovery
-GREMLIN_ENUM_PROPERTY_BLACKLIST="id,pk,name,description,startDate,endDate,arrival,departure,timestamp,createdAt,updatedAt"
+GREMLIN_ENUM_PROPERTY_DENYLIST="id,pk,name,description,startDate,endDate,timestamp,createdAt,updatedAt"
 
 # Optional: Log level (default: info)
 # Options: error, warn, info, debug
diff --git a/gremlin-mcp/src/main/javascript/README.md b/gremlin-mcp/src/main/javascript/README.md
index d2b635bb9a..effa4691e7 100644
--- a/gremlin-mcp/src/main/javascript/README.md
+++ b/gremlin-mcp/src/main/javascript/README.md
@@ -239,14 +239,14 @@ GREMLIN_ENUM_DISCOVERY_ENABLED="true" # Default: true
 GREMLIN_ENUM_CARDINALITY_THRESHOLD="10" # Max distinct values for enum (default: 10)
 
 # Exclude specific properties
-GREMLIN_ENUM_PROPERTY_BLACKLIST="id,uuid,timestamp,createdAt,updatedAt"
+GREMLIN_ENUM_PROPERTY_DENYLIST="id,uuid,timestamp,createdAt,updatedAt"
 
 # Schema optimization
 GREMLIN_SCHEMA_MAX_ENUM_VALUES="10" # Limit enum values shown (default: 10)
 GREMLIN_SCHEMA_INCLUDE_SAMPLE_VALUES="false" # Reduce schema size (default: false)
 ```
 
-### 🚫 Property Blacklist
+### 🚫 Property Denylist
 
 Some properties should never be treated as enums:
 
@@ -261,10 +261,10 @@ Some properties should never be treated as enums:
 
 ```bash
 # Exclude specific properties by name
-GREMLIN_ENUM_PROPERTY_BLACKLIST="userId,sessionId,description,notes,content"
+GREMLIN_ENUM_PROPERTY_DENYLIST="userId,sessionId,description,notes,content"
 ```
 
-**Common Blacklist Patterns:**
+**Common Denylist Patterns:**
 
 - `id,uuid,guid` - Unique identifiers
 - `timestamp,createdAt,updatedAt,lastModified` - Time fields
@@ -331,18 +331,6 @@ GREMLIN_SCHEMA_INCLUDE_SAMPLE_VALUES="false" # Minimal schema size
 
 This intelligent enum discovery transforms how AI agents interact with your graph data, making queries more accurate and insights more meaningful! 🎯
 
-## 🗄️ Supported Databases
-
-Works with any Gremlin-compatible graph database:
-
-| Database | Status | Notes |
-| ----------------------- | ------------- | -------------------------------- |
-| 🟢 **Apache TinkerPop** | ✅ Tested | Local development and CI testing |
-| 🟡 **Amazon Neptune** | 🔧 Compatible | Designed for, not yet tested |
-| 🟡 **JanusGraph** | 🔧 Compatible | Designed for, not yet tested |
-| 🟡 **Azure Cosmos DB** | 🔧 Compatible | With Gremlin API |
-| 🟡 **ArcadeDB** | 🔧 Compatible | With Gremlin support |
-
 ## ⚙️ Configuration Options
 
 ### Basic Configuration
@@ -365,7 +353,7 @@ LOG_LEVEL="info" # Logging level: error, warn, info, debug
 # Schema and performance tuning
 GREMLIN_ENUM_DISCOVERY_ENABLED="true" # Enable smart enum detection (default: true)
 GREMLIN_ENUM_CARDINALITY_THRESHOLD="10" # Max distinct values for enum detection (default: 10)
-GREMLIN_ENUM_PROPERTY_BLACKLIST="id,timestamp" # Exclude specific properties from enum detection
+GREMLIN_ENUM_PROPERTY_DENYLIST="id,timestamp" # Exclude specific properties from enum detection
 GREMLIN_SCHEMA_INCLUDE_SAMPLE_VALUES="false" # Include sample values in schema (default: false)
 GREMLIN_SCHEMA_MAX_ENUM_VALUES="10" # Limit enum values shown (default: 10)
 GREMLIN_SCHEMA_INCLUDE_COUNTS="true" # Include vertex/edge counts in schema (default: true)
diff --git a/gremlin-mcp/src/main/javascript/src/config.ts b/gremlin-mcp/src/main/javascript/src/config.ts
index 3e8d5d3fcc..3d4babd735 100644
--- a/gremlin-mcp/src/main/javascript/src/config.ts
+++ b/gremlin-mcp/src/main/javascript/src/config.ts
@@ -167,14 +167,12 @@ const GremlinEnumCardinalityThresholdConfig = pipe(
 );
 
 /**
- * GREMLIN_ENUM_PROPERTY_BLACKLIST: string, default: id,pk,name,description,...
+ * GREMLIN_ENUM_PROPERTY_DENYLIST: string, default: id,pk,name,description,...
  * Comma-separated list of properties to exclude from enum detection
  */
-const GremlinEnumPropertyBlacklistConfig = pipe(
-  Config.string('GREMLIN_ENUM_PROPERTY_BLACKLIST'),
-  Config.withDefault(
-    'id,pk,name,description,startDate,endDate,arrival,departure,timestamp,createdAt,updatedAt'
-  ),
+const GremlinEnumPropertyDenyListConfig = pipe(
+  Config.string('GREMLIN_ENUM_PROPERTY_DENYLIST'),
+  Config.withDefault('id,pk,name,description,startDate,endDate,timestamp,createdAt,updatedAt'),
   Config.map(parseCommaSeparatedList)
 );
 
@@ -229,13 +227,13 @@ const GremlinConnectionConfig = pipe(
 
 /**
  * SchemaDiscoveryConfig: Aggregates and validates all schema discovery-related environment variables.
- * Ensures enum discovery, cardinality, blacklist, sample values, max enum values, and counts are present and valid.
+ * Ensures enum discovery, cardinality, denylist, sample values, max enum values, and counts are present and valid.
  * Returns a validated config object or throws ConfigError on failure.
  */
 const SchemaDiscoveryConfig = Config.all({
   enumDiscoveryEnabled: GremlinEnumDiscoveryEnabledConfig,
   enumCardinalityThreshold: GremlinEnumCardinalityThresholdConfig,
-  enumPropertyBlacklist: GremlinEnumPropertyBlacklistConfig,
+  enumPropertyDenyList: GremlinEnumPropertyDenyListConfig,
   includeSampleValues: GremlinSchemaIncludeSampleValuesConfig,
   maxEnumValues: GremlinSchemaMaxEnumValuesConfig,
   includeCounts: GremlinSchemaIncludeCountsConfig,
diff --git a/gremlin-mcp/src/main/javascript/src/gremlin/models/graph-schema.ts b/gremlin-mcp/src/main/javascript/src/gremlin/models/graph-schema.ts
index 764f1f208a..95a1f9cec1 100644
--- a/gremlin-mcp/src/main/javascript/src/gremlin/models/graph-schema.ts
+++ b/gremlin-mcp/src/main/javascript/src/gremlin/models/graph-schema.ts
@@ -151,7 +151,7 @@ export const GremlinConfigSchema = z.object({
   /** Cardinality threshold for enum discovery */
   enumCardinalityThreshold: z.number().positive().optional().default(10),
   /** List of property names to exclude from enum discovery */
-  enumPropertyBlacklist: z.array(z.string()).optional().default([]),
+  enumPropertyDenyList: z.array(z.string()).optional().default([]),
   /** Whether to include sample values in schema (for size optimization) */
   includeSampleValues: z.boolean().optional().default(false),
   /** Maximum number of enum values to include (for size optimization) */
diff --git a/gremlin-mcp/src/main/javascript/src/gremlin/property-analyzer.ts b/gremlin-mcp/src/main/javascript/src/gremlin/property-analyzer.ts
index 6a8bfd0e01..d8a0bee890 100644
--- a/gremlin-mcp/src/main/javascript/src/gremlin/property-analyzer.ts
+++ b/gremlin-mcp/src/main/javascript/src/gremlin/property-analyzer.ts
@@ -46,8 +46,8 @@ export const analyzePropertyFromValues = (
   values: unknown[],
   config: SchemaConfig
 ): Property => {
-  // Skip blacklisted properties
-  if (config.enumPropertyBlacklist.includes(propertyKey)) {
+  // Skip denylisted properties
+  if (config.enumPropertyDenyList.includes(propertyKey)) {
     return {
       name: propertyKey,
       type: ['unknown'],
@@ -97,8 +97,8 @@ export const analyzeSingleProperty = (
   isVertex: boolean
 ): Effect.Effect<Property, GremlinQueryError> =>
   Effect.gen(function* () {
-    // Skip blacklisted properties early
-    if (config.enumPropertyBlacklist.includes(propertyKey)) {
+    // Skip denylisted properties early
+    if (config.enumPropertyDenyList.includes(propertyKey)) {
       return {
         name: propertyKey,
         type: ['unknown'],
diff --git a/gremlin-mcp/src/main/javascript/src/gremlin/schema-generator.ts b/gremlin-mcp/src/main/javascript/src/gremlin/schema-generator.ts
index 8a5cd99d2b..aa97164e76 100644
--- a/gremlin-mcp/src/main/javascript/src/gremlin/schema-generator.ts
+++ b/gremlin-mcp/src/main/javascript/src/gremlin/schema-generator.ts
@@ -55,7 +55,7 @@ export const DEFAULT_SCHEMA_CONFIG: SchemaConfig = {
   maxEnumValues: 10,
   includeCounts: true,
   enumCardinalityThreshold: 10,
-  enumPropertyBlacklist: ['id', 'label', 'lastUpdatedByUI'],
+  enumPropertyDenyList: ['id', 'timestamp'],
   timeoutMs: DEFAULT_SCHEMA_TIMEOUT_MS,
   batchSize: 10,
 };
diff --git a/gremlin-mcp/src/main/javascript/src/gremlin/types.ts b/gremlin-mcp/src/main/javascript/src/gremlin/types.ts
index b83450e495..04cf31720f 100644
--- a/gremlin-mcp/src/main/javascript/src/gremlin/types.ts
+++ b/gremlin-mcp/src/main/javascript/src/gremlin/types.ts
@@ -49,7 +49,7 @@ export interface SchemaConfig {
   maxEnumValues: number;
   includeCounts: boolean;
   enumCardinalityThreshold: number;
-  enumPropertyBlacklist: string[];
+  enumPropertyDenyList: string[];
   timeoutMs?: number;
   batchSize?: number;
 }
diff --git a/gremlin-mcp/src/main/javascript/tests/config.test.ts b/gremlin-mcp/src/main/javascript/tests/config.test.ts
index 9249214dc6..261630552b 100644
--- a/gremlin-mcp/src/main/javascript/tests/config.test.ts
+++ b/gremlin-mcp/src/main/javascript/tests/config.test.ts
@@ -48,7 +48,7 @@ describe('Effect-based Configuration Management', () => {
     process.env.GREMLIN_IDLE_TIMEOUT = '300';
     process.env.GREMLIN_ENUM_DISCOVERY_ENABLED = 'true';
     process.env.GREMLIN_ENUM_CARDINALITY_THRESHOLD = '10';
-    process.env.GREMLIN_ENUM_PROPERTY_BLACKLIST = 'id,pk,name';
+    process.env.GREMLIN_ENUM_PROPERTY_DENYLIST = 'id,pk,name';
     process.env.GREMLIN_SCHEMA_INCLUDE_SAMPLE_VALUES = 'false';
     process.env.GREMLIN_SCHEMA_MAX_ENUM_VALUES = '10';
     process.env.GREMLIN_SCHEMA_INCLUDE_COUNTS = 'true';
@@ -66,7 +66,7 @@ describe('Effect-based Configuration Management', () => {
       schema: {
         enumDiscoveryEnabled: true,
         enumCardinalityThreshold: 10,
-        enumPropertyBlacklist: ['id', 'pk', 'name'],
+        enumPropertyDenyList: ['id', 'pk', 'name'],
         includeSampleValues: false,
         maxEnumValues: 10,
         includeCounts: true,
@@ -140,13 +140,13 @@ describe('Effect-based Configuration Management', () => {
     expect(result.gremlin.traversalSource).toBe('custom');
   });
 
-  it('should parse comma-separated blacklist', async () => {
+  it('should parse comma-separated denylist', async () => {
     process.env.GREMLIN_ENDPOINT = 'localhost:8182';
-    process.env.GREMLIN_ENUM_PROPERTY_BLACKLIST = 'id, pk, name, description';
+    process.env.GREMLIN_ENUM_PROPERTY_DENYLIST = 'id, pk, name, description';
 
     const result = await Effect.runPromise(AppConfig);
 
-    expect(result.schema.enumPropertyBlacklist).toEqual(['id', 'pk', 'name', 'description']);
+    expect(result.schema.enumPropertyDenyList).toEqual(['id', 'pk', 'name', 'description']);
   });
 
   it('should validate log level enum', async () => {
@@ -241,7 +241,7 @@ describe('Effect-based Configuration Management', () => {
     expect(result.gremlin.idleTimeout).toBeDefined();
     expect(result.schema.enumDiscoveryEnabled).toBeDefined();
     expect(result.schema.enumCardinalityThreshold).toBeDefined();
-    expect(result.schema.enumPropertyBlacklist).toBeDefined();
+    expect(result.schema.enumPropertyDenyList).toBeDefined();
     expect(result.schema.includeSampleValues).toBeDefined();
     expect(result.schema.maxEnumValues).toBeDefined();
     expect(result.schema.includeCounts).toBeDefined();
diff --git a/gremlin-mcp/src/main/javascript/tests/property-analyzer.test.ts b/gremlin-mcp/src/main/javascript/tests/property-analyzer.test.ts
index 24abbf5a32..26b5ce65ee 100644
--- a/gremlin-mcp/src/main/javascript/tests/property-analyzer.test.ts
+++ b/gremlin-mcp/src/main/javascript/tests/property-analyzer.test.ts
@@ -47,7 +47,7 @@ const mockGetSamplePropertyValues = getSamplePropertyValues as jest.MockedFuncti
 >;
 
 const mockConfig = {
-  enumPropertyBlacklist: [],
+  enumPropertyDenyList: [],
   maxEnumValues: 10,
   includeSampleValues: false,
   includeCounts: false,
@@ -85,9 +85,9 @@ describe('property-analyzer', () => {
     expect(result.enum).toBeUndefined();
   });
 
-  it('should handle blacklisted properties', () => {
-    const blacklistConfig = { ...mockConfig, enumPropertyBlacklist: ['id'] };
-    const result = analyzePropertyFromValues('id', ['a', 'b'], blacklistConfig);
+  it('should handle denylisted properties', () => {
+    const denyListConfig = { ...mockConfig, enumPropertyDenyList: ['id'] };
+    const result = analyzePropertyFromValues('id', ['a', 'b'], denyListConfig);
     expect(result.type).toEqual(['unknown']);
   });
 
diff --git a/gremlin-mcp/src/main/javascript/tests/schema-assembly.test.ts b/gremlin-mcp/src/main/javascript/tests/schema-assembly.test.ts
index f3d553851d..1557840984 100644
--- a/gremlin-mcp/src/main/javascript/tests/schema-assembly.test.ts
+++ b/gremlin-mcp/src/main/javascript/tests/schema-assembly.test.ts
@@ -42,7 +42,7 @@ describe('schema-assembly', () => {
   maxEnumValues: 10,
   includeCounts: true,
   enumCardinalityThreshold: 5,
-  enumPropertyBlacklist: [],
+  enumPropertyDenyList: [],
   timeoutMs: 30000,
   batchSize: 10,
 };
@@ -396,7 +396,7 @@ describe('schema-assembly', () => {
   maxEnumValues: 5,
   includeCounts: false,
   enumCardinalityThreshold: 3,
-  enumPropertyBlacklist: [],
+  enumPropertyDenyList: [],
   // timeoutMs and batchSize are optional
 };
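
Illustration (not part of the commit): the diff above renames the denylist setting but does not show how a raw GREMLIN_ENUM_PROPERTY_DENYLIST value becomes a list, or how the list interacts with the cardinality threshold. The sketch below is one way that could look. The body of parseCommaSeparatedList is an assumption (the commit only references the name), and EnumConfig and detectEnum are hypothetical names made up for this example, not code from the repository.

```typescript
// Assumed behavior: split on commas, trim whitespace, drop empty entries.
// The commit references parseCommaSeparatedList but does not show its body.
const parseCommaSeparatedList = (raw: string): string[] =>
  raw
    .split(',')
    .map((item) => item.trim())
    .filter((item) => item.length > 0);

// Hypothetical subset of the schema config used by this sketch.
interface EnumConfig {
  enumPropertyDenyList: string[];
  enumCardinalityThreshold: number;
}

// Hypothetical helper: returns the distinct values when a property looks
// categorical, or undefined when it is denylisted, empty, or too varied.
const detectEnum = (
  key: string,
  values: unknown[],
  config: EnumConfig
): unknown[] | undefined => {
  // Denylisted keys are skipped before any value inspection.
  if (config.enumPropertyDenyList.includes(key)) {
    return undefined;
  }
  const distinct = Array.from(new Set(values));
  const withinThreshold =
    distinct.length > 0 && distinct.length <= config.enumCardinalityThreshold;
  return withinThreshold ? distinct : undefined;
};

const exampleConfig: EnumConfig = {
  enumPropertyDenyList: parseCommaSeparatedList('id, pk, name, description'),
  enumCardinalityThreshold: 10,
};

// 'status' is not denylisted and has two distinct values, so it qualifies.
const statusEnum = detectEnum('status', ['open', 'closed', 'open'], exampleConfig);
// 'id' is denylisted, so it is never reported as an enum.
const idEnum = detectEnum('id', ['a', 'b'], exampleConfig);
```

Checking the denylist before looking at values mirrors the early return in the analyzePropertyFromValues hunk above, so excluded keys never contribute values to the schema.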
