This is an automated email from the ASF dual-hosted git repository.

wusheng pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/skywalking-mcp.git
The following commit(s) were added to refs/heads/main by this push:
     new da59903  feat: add streamable-http server and more tools (#7)
da59903 is described below

commit da59903d841c04082e1e7b26baa02226d0e8c5bc
Author: Zixin Zhou <zhouzi...@apache.org>
AuthorDate: Wed Jul 2 06:46:41 2025 +0800

    feat: add streamable-http server and more tools (#7)

    - Log Analysis Tool: query_logs
    - Metrics Analysis Tools: query_single_metrics, query_top_n_metrics
    - Trace Analysis Tools: get_cold_trace_details, query_traces and polish the get_trace_details tool
    - Adds streamable HTTP server support
---
 .github/dependabot.yml       |   6 +-
 README.md                    |  25 +-
 cmd/skywalking-mcp/main.go   |   1 +
 internal/config/config.go    |  21 ++
 internal/swmcp/server.go     |  83 +++++
 internal/swmcp/sse.go        |   2 +-
 internal/swmcp/stdio.go      |  68 ----
 internal/swmcp/streamable.go |  73 ++++
 internal/tools/common.go     | 170 +++++++++
 internal/tools/log.go        | 126 +++++++
 internal/tools/metric.go     | 470 ++++++++++++++++++++++++
 internal/tools/tools.go      |   8 +-
 internal/tools/trace.go      | 826 ++++++++++++++++++++++++++++++++++++++++++-
 13 files changed, 1791 insertions(+), 88 deletions(-)

diff --git a/.github/dependabot.yml b/.github/dependabot.yml
index aacd3d1..a7afd66 100644
--- a/.github/dependabot.yml
+++ b/.github/dependabot.yml
@@ -20,11 +20,13 @@ updates:
     directory: "/"
     schedule:
       interval: "daily"
+    reviewers:
+      - "CodePrometheus"
     assignees:
       - "CodePrometheus"
+    ignore:
+      - dependency-name: "github.com/apache/skywalking-cli"
     groups:
       actions-deps:
         patterns:
           - "*"
-        exclude-patterns:
-          - "github.com/apache/skywalking-cli"

diff --git a/README.md b/README.md
index 3c73c89..8a120b8 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,8 @@ Apache SkyWalking MCP
 <img src="http://skywalking.apache.org/assets/logo.svg" alt="Sky Walking logo" height="90px" align="right" />
-**SkyWalking-MCP**: A [Model Context Protocol][mcp] (MCP) server for integrating AI agents with Skywalking OAP and the surrounding ecosystem.
+**SkyWalking-MCP**: A [Model Context Protocol][mcp] (MCP) server for integrating AI agents with Skywalking OAP and the +surrounding ecosystem. **SkyWalking**: an APM(application performance monitor) system, especially designed for microservices, cloud native and container-based (Docker, Kubernetes, Mesos) architectures. @@ -32,6 +33,7 @@ Available Commands: help Help about any command sse Start SSE server stdio Start stdio server + streamable Start Streamable server Flags: -h, --help help for swmcp @@ -96,13 +98,30 @@ If using Docker: } ``` +## Available Tools + +SkyWalking MCP provides the following tools to query and analyze SkyWalking OAP data: + +| Category | Tool Name | Description | Key Features | +|-------------|--------------------------|----------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| **Trace** | `get_trace_details` | Get detailed trace information | Retrieve trace by ID; **Multiple views**: `full` (complete trace), `summary` (overview with metrics), `errors_only` (error spans only); Detailed span analysis | +| **Trace** | `get_cold_trace_details` | Get trace details from cold storage | Query historical traces from BanyanDB; **Multiple views**: `full`, `summary`, `errors_only`; Duration-based search; Historical incident investigation | +| **Trace** | `query_traces` | Query traces with intelligent analysis | Multi-condition filtering (service, endpoint, duration, state, tags); **Multiple views**: `full` (raw data), `summary` (intelligent analysis with performance insights), `errors_only` (error traces); Sort options; Slow trace detection; Performance metrics and statistics | +| **Metrics** | `query_single_metrics` | Query single metric values | Get specific metric values (CPM, response time, 
SLA, Apdex); Multiple entity scopes (Service, ServiceInstance, Endpoint, Process, Relations); Time range and cold storage support | +| **Metrics** | `query_top_n_metrics` | Query top N metric rankings | Rank entities by metric values; Configurable top N count; Ascending/descending order; Scope-based filtering; Performance analysis and issue identification | +| **Log** | `query_logs` | Query logs from SkyWalking OAP | Filter by service, instance, endpoint, trace ID, tags; Time range queries; Cold storage support; Pagination support | + ## Contact Us + * Submit [an issue](https://github.com/apache/skywalking/issues/new) by using [MCP] as title prefix. -* Mail list: **d...@skywalking.apache.org**. Mail to `dev-subscr...@skywalking.apache.org`, follow the reply to subscribe the mail list. -* Join `skywalking` channel at [Apache Slack](http://s.apache.org/slack-invite). If the link is not working, find the latest one at [Apache INFRA WIKI](https://cwiki.apache.org/confluence/display/INFRA/Slack+Guest+Invites). +* Mail list: **d...@skywalking.apache.org**. Mail to `dev-subscr...@skywalking.apache.org`, follow the reply to subscribe + the mail list. +* Join `skywalking` channel at [Apache Slack](http://s.apache.org/slack-invite). If the link is not working, find the + latest one at [Apache INFRA WIKI](https://cwiki.apache.org/confluence/display/INFRA/Slack+Guest+Invites). 
* Twitter, [ASFSkyWalking](https://twitter.com/ASFSkyWalking) ## License + [Apache 2.0 License.](/LICENSE) [mcp]: https://modelcontextprotocol.io/ \ No newline at end of file diff --git a/cmd/skywalking-mcp/main.go b/cmd/skywalking-mcp/main.go index 0504f34..8ad4014 100644 --- a/cmd/skywalking-mcp/main.go +++ b/cmd/skywalking-mcp/main.go @@ -79,6 +79,7 @@ func init() { // Add subcommands rootCmd.AddCommand(swmcp.NewStdioServer()) rootCmd.AddCommand(swmcp.NewSSEServer()) + rootCmd.AddCommand(swmcp.NewStreamable()) } func main() { diff --git a/internal/config/config.go b/internal/config/config.go index 81a0f57..239b893 100644 --- a/internal/config/config.go +++ b/internal/config/config.go @@ -68,3 +68,24 @@ type SSEServerConfig struct { // Base path for the sse server BasePath string } + +type StreamableServerConfig struct { + // SkyWalking OAP URL to target for API requests (e.g. localhost:12800) + URL string + + // ReadOnly indicates if we should only offer read-only tools + ReadOnly bool + + // Path to the log file if not stderr + LogFilePath string + + // LogCommands indicates if we should log commands + LogCommands bool + + // The host and port to start the Streamable HTTP transport on + // e.g. 
":8080" and the default streamable http endpoint will be "/mcp" + Address string + + // Base path for the Streamable HTTP transport server + EndpointPath string +} diff --git a/internal/swmcp/server.go b/internal/swmcp/server.go index 07067cd..57e2694 100644 --- a/internal/swmcp/server.go +++ b/internal/swmcp/server.go @@ -18,12 +18,18 @@ package swmcp import ( + "context" "fmt" + "net/http" "os" + "strings" "github.com/mark3labs/mcp-go/server" "github.com/sirupsen/logrus" + "github.com/apache/skywalking-cli/pkg/contextkey" + + "github.com/apache/skywalking-mcp/internal/config" "github.com/apache/skywalking-mcp/internal/tools" ) @@ -37,6 +43,8 @@ func newMcpServer() *server.MCPServer { server.WithLogging()) tools.AddTraceTools(mcpServer) + tools.AddMetricsTools(mcpServer) + tools.AddLogTools(mcpServer) return mcpServer } @@ -58,3 +66,78 @@ func initLogger(logFilePath string) (*logrus.Logger, error) { return logrusLogger, nil } + +// WithSkyWalkingURLAndInsecure adds SkyWalking URL and insecure flag to the context +// This ensures all downstream requests will have contextkey.BaseURL{} and contextkey.Insecure{} +func WithSkyWalkingURLAndInsecure(ctx context.Context, url string, insecure bool) context.Context { + ctx = context.WithValue(ctx, contextkey.BaseURL{}, url) + ctx = context.WithValue(ctx, contextkey.Insecure{}, insecure) + return ctx +} + +const ( + skywalkingURLEnvVar = "SW_URL" +) + +// finalizeURL ensures the URL ends with "/graphql". +func finalizeURL(urlStr string) string { + if !strings.HasSuffix(urlStr, "/graphql") { + urlStr = strings.TrimRight(urlStr, "/") + "/graphql" + } + return urlStr +} + +// urlAndInsecureFromEnv extracts URL and insecure flag purely from environment variables. +func urlAndInsecureFromEnv() (string, bool) { + urlStr := os.Getenv(skywalkingURLEnvVar) + if urlStr == "" { + urlStr = config.DefaultSWURL + } + return finalizeURL(urlStr), false +} + +// urlAndInsecureFromHeaders extracts URL and insecure flag for a request. 
+// URL is sourced from Header > Environment > Default. +// Insecure flag is now hardcoded to false. +func urlAndInsecureFromHeaders(req *http.Request) (string, bool) { + urlStr := req.Header.Get("SW-URL") + if urlStr == "" { + urlStr = os.Getenv(skywalkingURLEnvVar) + if urlStr == "" { + urlStr = config.DefaultSWURL + } + } + + return finalizeURL(urlStr), false +} + +// WithSkyWalkingContextFromEnv injects the SkyWalking URL and insecure +// settings from environment variables into the context. +var WithSkyWalkingContextFromEnv server.StdioContextFunc = func(ctx context.Context) context.Context { + urlStr, _ := urlAndInsecureFromEnv() + return WithSkyWalkingURLAndInsecure(ctx, urlStr, false) +} + +// withSkyWalkingContextFromRequest is the shared logic for enriching context from an http.Request. +func withSkyWalkingContextFromRequest(ctx context.Context, req *http.Request) context.Context { + urlStr, _ := urlAndInsecureFromHeaders(req) + return WithSkyWalkingURLAndInsecure(ctx, urlStr, false) +} + +// EnhanceStdioContextFunc returns a StdioContextFunc that enriches the context +// with SkyWalking settings from the environment. +func EnhanceStdioContextFunc() server.StdioContextFunc { + return WithSkyWalkingContextFromEnv +} + +// EnhanceSSEContextFunc returns a SSEContextFunc that enriches the context +// with SkyWalking settings from SSE request headers. +func EnhanceSSEContextFunc() server.SSEContextFunc { + return withSkyWalkingContextFromRequest +} + +// EnhanceHTTPContextFunc returns a HTTPContextFunc that enriches the context +// with SkyWalking settings from HTTP request headers. 
+func EnhanceHTTPContextFunc() server.HTTPContextFunc { + return withSkyWalkingContextFromRequest +} diff --git a/internal/swmcp/sse.go b/internal/swmcp/sse.go index 8f184b6..fd3ccd6 100644 --- a/internal/swmcp/sse.go +++ b/internal/swmcp/sse.go @@ -74,7 +74,7 @@ func runSSEServer(ctx context.Context, cfg *config.SSEServerConfig) error { sseServer := server.NewSSEServer( newMcpServer(), server.WithStaticBasePath(cfg.BasePath), - server.WithSSEContextFunc(EnhanceHTTPContextFunc()), + server.WithSSEContextFunc(EnhanceSSEContextFunc()), ) ssePath := sseServer.CompleteSsePath() log.Printf("Starting SkyWalking MCP server using SSE transport listening on http://%s%s\n ", cfg.Address, ssePath) diff --git a/internal/swmcp/stdio.go b/internal/swmcp/stdio.go index 95c59e7..059ba77 100644 --- a/internal/swmcp/stdio.go +++ b/internal/swmcp/stdio.go @@ -24,13 +24,10 @@ import ( "io" "log" "log/slog" - "net/http" "os" "os/signal" - "strings" "syscall" - "github.com/apache/skywalking-cli/pkg/contextkey" "github.com/mark3labs/mcp-go/server" "github.com/spf13/cobra" "github.com/spf13/viper" @@ -93,7 +90,6 @@ func runStdioServer(ctx context.Context, cfg *config.StdioServerConfig) error { errC <- stdioServer.Listen(ctx, in, out) }() - // Output github-mcp-server string _, _ = fmt.Fprintf(os.Stderr, "SkyWalking MCP Server running on stdio\n") // Wait for shutdown signal @@ -108,67 +104,3 @@ func runStdioServer(ctx context.Context, cfg *config.StdioServerConfig) error { return nil } - -var ExtractSWURLFromCfg server.StdioContextFunc = func(ctx context.Context) context.Context { - urlStr := viper.GetString("url") - if urlStr == "" { - urlStr = config.DefaultSWURL - } - - // we need to ensure the URL ends with "/graphql" - if !strings.HasSuffix(urlStr, "/graphql") { - urlStr = strings.TrimRight(urlStr, "/") + "/graphql" - } - return WithSkyWalkingURLAndInsecure(ctx, urlStr) -} - -var ExtractSWURLFromHeaders server.SSEContextFunc = func(ctx context.Context, req *http.Request) 
context.Context { - urlStr := req.Header.Get("SW-URL") - if urlStr == "" { - urlStr = viper.GetString("url") - if urlStr == "" { - urlStr = config.DefaultSWURL - } - } - - // we need to ensure the URL ends with "/graphql" - if !strings.HasSuffix(urlStr, "/graphql") { - urlStr = strings.TrimRight(urlStr, "/") + "/graphql" - } - return WithSkyWalkingURLAndInsecure(ctx, urlStr) -} - -func EnhanceStdioContextFuncs(funcs ...server.StdioContextFunc) server.StdioContextFunc { - return func(ctx context.Context) context.Context { - for _, f := range funcs { - ctx = f(ctx) - } - return ctx - } -} - -func EnhanceSSEContextFuncs(funcs ...server.SSEContextFunc) server.SSEContextFunc { - return func(ctx context.Context, r *http.Request) context.Context { - for _, f := range funcs { - ctx = f(ctx, r) - } - return ctx - } -} - -// WithSkyWalkingURLAndInsecure adds the SkyWalking URL and Insecure to the context. -func WithSkyWalkingURLAndInsecure(ctx context.Context, url string) context.Context { - ctx = context.WithValue(ctx, contextkey.BaseURL{}, url) - ctx = context.WithValue(ctx, contextkey.Insecure{}, false) - return ctx -} - -// EnhanceStdioContextFunc returns a StdioContextFunc that composes all the provided StdioContextFuncs. -func EnhanceStdioContextFunc() server.StdioContextFunc { - return EnhanceStdioContextFuncs(ExtractSWURLFromCfg) -} - -// EnhanceHTTPContextFunc returns a SSEContextFunc that composes all the provided HTTPContextFuncs. -func EnhanceHTTPContextFunc() server.SSEContextFunc { - return EnhanceSSEContextFuncs(ExtractSWURLFromHeaders) -} diff --git a/internal/swmcp/streamable.go b/internal/swmcp/streamable.go new file mode 100644 index 0000000..f3da2e5 --- /dev/null +++ b/internal/swmcp/streamable.go @@ -0,0 +1,73 @@ +// Licensed to Apache Software Foundation (ASF) under one or more contributor +// license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright +// ownership. 
Apache Software Foundation (ASF) licenses this file to you under +// the Apache License, Version 2.0 (the "License"); you may +// not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package swmcp + +import ( + "fmt" + + "github.com/mark3labs/mcp-go/server" + log "github.com/sirupsen/logrus" + "github.com/spf13/cobra" + "github.com/spf13/viper" + + "github.com/apache/skywalking-mcp/internal/config" +) + +func NewStreamable() *cobra.Command { + streamableCmd := &cobra.Command{ + Use: "streamable", + Short: "Start Streamable server", + Long: `Starting SkyWalking MCP server with Streamable HTTP transport.`, + RunE: func(_ *cobra.Command, _ []string) error { + streamableConfig := config.StreamableServerConfig{ + Address: viper.GetString("address"), + EndpointPath: viper.GetString("endpoint-path"), + } + + return runStreamableServer(&streamableConfig) + }, + } + + // Add Streamable server specific flags + streamableCmd.Flags().String("address", "localhost:8000", + "The host and port to start the Streamable server on") + streamableCmd.Flags().String("endpoint-path", "/mcp", + "The path for the streamable-http server") + _ = viper.BindPFlag("address", streamableCmd.Flags().Lookup("address")) + _ = viper.BindPFlag("endpoint-path", streamableCmd.Flags().Lookup("endpoint-path")) + + return streamableCmd +} + +// runStreamableServer starts the Streamable server with the provided configuration. 
+func runStreamableServer(cfg *config.StreamableServerConfig) error { + httpServer := server.NewStreamableHTTPServer( + newMcpServer(), + server.WithStateLess(true), + server.WithLogger(log.StandardLogger()), + server.WithHTTPContextFunc(EnhanceHTTPContextFunc()), + server.WithEndpointPath(viper.GetString("endpoint-path")), + ) + log.Infof("streamable HTTP server listening on %s%s\n", cfg.Address, cfg.EndpointPath) + + if err := httpServer.Start(cfg.Address); err != nil { + return fmt.Errorf("streamable HTTP server error: %v", err) + } + + return nil +} diff --git a/internal/tools/common.go b/internal/tools/common.go new file mode 100644 index 0000000..08475f9 --- /dev/null +++ b/internal/tools/common.go @@ -0,0 +1,170 @@ +// Licensed to Apache Software Foundation (ASF) under one or more contributor +// license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright +// ownership. Apache Software Foundation (ASF) licenses this file to you under +// the Apache License, Version 2.0 (the "License"); you may +// not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +package tools + +import ( + "fmt" + "time" + + api "skywalking.apache.org/repo/goapi/query" +) + +// Default values +const ( + DefaultPageSize = 15 + DefaultPageNum = 1 + DefaultStep = "MINUTE" + DefaultDuration = 30 // minutes +) + +// Error messages +const ( + ErrMissingDuration = "missing required parameter: duration" + ErrMarshalFailed = "failed to marshal result: %v" +) + +// FormatTimeByStep formats time according to step granularity +func FormatTimeByStep(t time.Time, step api.Step) string { + switch step { + case api.StepDay: + return t.Format("2006-01-02") + case api.StepHour: + return t.Format("2006-01-02 15") + case api.StepMinute: + return t.Format("2006-01-02 1504") + case api.StepSecond: + return t.Format("2006-01-02 150405") + default: + return t.Format("2006-01-02 15:04:05") + } +} + +// ParseDuration converts duration string to api.Duration +func ParseDuration(durationStr string, coldStage bool) api.Duration { + now := time.Now() + var startTime, endTime time.Time + var step api.Step + + duration, err := time.ParseDuration(durationStr) + if err == nil { + if duration < 0 { + startTime = now.Add(duration) + endTime = now + } else { + startTime = now + endTime = now.Add(duration) + } + step = determineStep(duration) + } else { + startTime, endTime, step = parseLegacyDuration(durationStr) + } + + if !step.IsValid() { + step = api.StepMinute + } + + return api.Duration{ + Start: FormatTimeByStep(startTime, step), + End: FormatTimeByStep(endTime, step), + Step: step, + ColdStage: &coldStage, + } +} + +// BuildPagination creates pagination with defaults +func BuildPagination(pageNum, pageSize int) *api.Pagination { + if pageNum <= 0 { + pageNum = DefaultPageNum + } + if pageSize <= 0 { + pageSize = DefaultPageSize + } + return &api.Pagination{ + PageNum: &pageNum, + PageSize: pageSize, + } +} + +// BuildDuration creates duration from parameters +func BuildDuration(start, end, step string, cold bool, defaultDurationMinutes int) api.Duration { + if 
start != "" || end != "" { + stepEnum := api.Step(step) + if step == "" || !stepEnum.IsValid() { + stepEnum = DefaultStep + } + return api.Duration{ + Start: start, + End: end, + Step: stepEnum, + ColdStage: &cold, + } + } + + if defaultDurationMinutes <= 0 { + defaultDurationMinutes = DefaultDuration + } + defaultDurationStr := fmt.Sprintf("-%dm", defaultDurationMinutes) + return ParseDuration(defaultDurationStr, cold) +} + +// determineStep determines the step based on the duration +func determineStep(duration time.Duration) api.Step { + if duration >= 24*time.Hour { + return api.StepDay + } else if duration >= time.Hour { + return api.StepHour + } else if duration >= time.Minute { + return api.StepMinute + } + return api.StepSecond +} + +// parseLegacyDuration parses legacy duration strings like "7d", "24h" +func parseLegacyDuration(durationStr string) (startTime, endTime time.Time, step api.Step) { + now := time.Now() + if len(durationStr) > 1 && (durationStr[len(durationStr)-1] == 'd' || durationStr[len(durationStr)-1] == 'D') { + var days int + if _, parseErr := fmt.Sscanf(durationStr[:len(durationStr)-1], "%d", &days); parseErr == nil && days > 0 { + startTime = now.AddDate(0, 0, -days) + endTime = now + step = api.StepDay + return startTime, endTime, step + } + startTime = now.AddDate(0, 0, -7) + endTime = now + step = api.StepDay + return startTime, endTime, step + } + if len(durationStr) > 1 && (durationStr[len(durationStr)-1] == 'h' || durationStr[len(durationStr)-1] == 'H') { + var hours int + if _, parseErr := fmt.Sscanf(durationStr[:len(durationStr)-1], "%d", &hours); parseErr == nil && hours > 0 { + startTime = now.Add(-time.Duration(hours) * time.Hour) + endTime = now + step = api.StepHour + return startTime, endTime, step + } + startTime = now.Add(-1 * time.Hour) + endTime = now + step = api.StepHour + return startTime, endTime, step + } + startTime = now.AddDate(0, 0, -7) + endTime = now + step = api.StepDay + return startTime, endTime, step +} 
diff --git a/internal/tools/log.go b/internal/tools/log.go new file mode 100644 index 0000000..9a027db --- /dev/null +++ b/internal/tools/log.go @@ -0,0 +1,126 @@ +// Licensed to Apache Software Foundation (ASF) under one or more contributor +// license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright +// ownership. Apache Software Foundation (ASF) licenses this file to you under +// the Apache License, Version 2.0 (the "License"); you may +// not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package tools + +import ( + "context" + "encoding/json" + "fmt" + + "github.com/mark3labs/mcp-go/mcp" + "github.com/mark3labs/mcp-go/server" + api "skywalking.apache.org/repo/goapi/query" + + swlog "github.com/apache/skywalking-cli/pkg/graphql/log" +) + +// AddLogTools registers log-related tools with the MCP server +func AddLogTools(mcp *server.MCPServer) { + LogQueryTool.Register(mcp) +} + +type LogTag struct { + Key string `json:"key"` + Value string `json:"value"` +} + +type LogQueryRequest struct { + ServiceID string `json:"service_id,omitempty"` + ServiceInstanceID string `json:"service_instance_id,omitempty"` + EndpointID string `json:"endpoint_id,omitempty"` + TraceID string `json:"trace_id,omitempty"` + Tags []LogTag `json:"tags,omitempty"` + Start string `json:"start,omitempty"` + End string `json:"end,omitempty"` + Step string `json:"step,omitempty"` + Cold bool `json:"cold,omitempty"` + PageNum int `json:"page_num,omitempty"` + PageSize int `json:"page_size,omitempty"` +} + 
+// buildLogQueryCondition builds the log query condition from request parameters +func buildLogQueryCondition(req *LogQueryRequest) *api.LogQueryCondition { + duration := BuildDuration(req.Start, req.End, req.Step, req.Cold, DefaultDuration) + + var tags []*api.LogTag + for _, t := range req.Tags { + v := t.Value + tags = append(tags, &api.LogTag{Key: t.Key, Value: &v}) + } + + paging := BuildPagination(req.PageNum, req.PageSize) + + cond := &api.LogQueryCondition{ + ServiceID: &req.ServiceID, + ServiceInstanceID: &req.ServiceInstanceID, + EndpointID: &req.EndpointID, + RelatedTrace: &api.TraceScopeCondition{TraceID: req.TraceID}, + QueryDuration: &duration, + Paging: paging, + } + + if len(tags) > 0 { + cond.Tags = tags + } + return cond +} + +// queryLogs queries logs from SkyWalking OAP +func queryLogs(ctx context.Context, req *LogQueryRequest) (*mcp.CallToolResult, error) { + cond := buildLogQueryCondition(req) + + logs, err := swlog.Logs(ctx, cond) + if err != nil { + return mcp.NewToolResultError(fmt.Sprintf("failed to query logs: %v", err)), nil + } + + jsonBytes, err := json.Marshal(logs) + if err != nil { + return mcp.NewToolResultError(fmt.Sprintf(ErrMarshalFailed, err)), nil + } + return mcp.NewToolResultText(string(jsonBytes)), nil +} + +var LogQueryTool = NewTool[LogQueryRequest, *mcp.CallToolResult]( + "query_logs", + `Query logs from SkyWalking OAP with flexible filters. + +Workflow: +1. Use this tool to find logs matching specific criteria +2. Specify one or more query conditions to narrow down results +3. Use duration to limit the time range for the search +4. Supports filtering by service, instance, endpoint, trace, tags, and time +5. 
Supports cold storage query and pagination + +Examples: +- {"service_id": "Your_ApplicationName", "start": "2024-06-01 12:00:00", "end": "2024-06-01 13:00:00"}: Query logs for a service in a time range +- {"trace_id": "abc123..."}: Query logs related to a specific trace +- {"tags": [{"key": "level", "value": "ERROR"}], "cold": true}: Query error logs from cold storage`, + queryLogs, + mcp.WithString("service_id", mcp.Description("Service ID to filter logs.")), + mcp.WithString("service_instance_id", mcp.Description("Service instance ID to filter logs.")), + mcp.WithString("endpoint_id", mcp.Description("Endpoint ID to filter logs.")), + mcp.WithString("trace_id", mcp.Description("Related trace ID.")), + mcp.WithArray("tags", mcp.Description("Array of log tags, each with key and value.")), + mcp.WithString("start", mcp.Description("Start time for the query.")), + mcp.WithString("end", mcp.Description("End time for the query.")), + mcp.WithString("step", mcp.Enum("SECOND", "MINUTE", "HOUR", "DAY"), mcp.Description("Time step granularity.")), + mcp.WithBoolean("cold", mcp.Description("Whether to query from cold-stage storage.")), + mcp.WithNumber("page_num", mcp.Description("Page number, default 1.")), + mcp.WithNumber("page_size", mcp.Description("Page size, default 15.")), +) diff --git a/internal/tools/metric.go b/internal/tools/metric.go new file mode 100644 index 0000000..27017f2 --- /dev/null +++ b/internal/tools/metric.go @@ -0,0 +1,470 @@ +// Licensed to Apache Software Foundation (ASF) under one or more contributor +// license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright +// ownership. Apache Software Foundation (ASF) licenses this file to you under +// the Apache License, Version 2.0 (the "License"); you may +// not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package tools + +import ( + "context" + "encoding/base64" + "encoding/json" + "errors" + "fmt" + "strings" + + "github.com/mark3labs/mcp-go/mcp" + "github.com/mark3labs/mcp-go/server" + api "skywalking.apache.org/repo/goapi/query" + + "github.com/apache/skywalking-cli/pkg/graphql/metrics" +) + +// AddMetricsTools registers metrics-related tools with the MCP server +func AddMetricsTools(mcp *server.MCPServer) { + SingleMetricsTool.Register(mcp) + TopNMetricsTool.Register(mcp) +} + +// Error messages +const ( + ErrMissingMetricsName = "missing required parameter: metrics_name" + ErrInvalidTopN = "top_n must be a positive integer" + ErrFailedToQueryMetrics = "failed to query metrics: %v" +) + +// SingleMetricsRequest defines the parameters for the single metrics tool +type SingleMetricsRequest struct { + MetricsName string `json:"metrics_name"` + Scope string `json:"scope,omitempty"` + ServiceName string `json:"service_name,omitempty"` + ServiceInstanceName string `json:"service_instance_name,omitempty"` + EndpointName string `json:"endpoint_name,omitempty"` + ProcessName string `json:"process_name,omitempty"` + DestServiceName string `json:"dest_service_name,omitempty"` + DestServiceInstanceName string `json:"dest_service_instance_name,omitempty"` + DestEndpointName string `json:"dest_endpoint_name,omitempty"` + DestProcessName string `json:"dest_process_name,omitempty"` + Duration string `json:"duration,omitempty"` + Start string `json:"start,omitempty"` + End string `json:"end,omitempty"` + Step string `json:"step,omitempty"` + Cold bool 
`json:"cold,omitempty"` +} + +// TopNMetricsRequest defines the parameters for the top N metrics tool +// ParentService and Normal are used for service/entity identification, matching swctl behavior. +type TopNMetricsRequest struct { + MetricsName string `json:"metrics_name"` + TopN int `json:"top_n"` + Order string `json:"order,omitempty"` + Scope string `json:"scope,omitempty"` + ServiceID string `json:"service_id,omitempty"` + ServiceName string `json:"service_name,omitempty"` + ParentService string `json:"parent_service,omitempty"` + Normal bool `json:"normal,omitempty"` + Duration string `json:"duration,omitempty"` + Start string `json:"start,omitempty"` + End string `json:"end,omitempty"` + Step string `json:"step,omitempty"` + Cold bool `json:"cold,omitempty"` +} + +// MetricsValue represents the result of metrics query +type MetricsValue struct { + Value int `json:"value"` +} + +// ParseScopeInTop infers the scope for topN metrics based on metricsName +func ParseScopeInTop(metricsName string) api.Scope { + scope := api.ScopeService + if strings.HasPrefix(metricsName, "service_instance") { + scope = api.ScopeServiceInstance + } else if strings.HasPrefix(metricsName, "endpoint") { + scope = api.ScopeEndpoint + } + return scope +} + +// validateSingleMetricsRequest validates single metrics request parameters +func validateSingleMetricsRequest(req *SingleMetricsRequest) error { + if req.MetricsName == "" { + return errors.New(ErrMissingMetricsName) + } + return nil +} + +// validateTopNMetricsRequest validates top N metrics request parameters +func validateTopNMetricsRequest(req *TopNMetricsRequest) error { + if req.MetricsName == "" { + return errors.New(ErrMissingMetricsName) + } + // Set default top_n to 5 if not provided + if req.TopN == 0 { + req.TopN = 5 + } + if req.TopN <= 0 { + return errors.New(ErrInvalidTopN) + } + return nil +} + +// buildTopNCondition builds the top N condition from request parameters +func buildTopNCondition(req 
*TopNMetricsRequest) *api.TopNCondition { + parentService := "" + normal := false + // Parse service-id if present, otherwise use ServiceName if provided + if req.ServiceID != "" { + var err error + parentService, normal, err = ParseServiceID(req.ServiceID) + if err != nil { + parentService = "" + normal = false + } + } else if req.ServiceName != "" { + parentService = req.ServiceName + } + + condition := &api.TopNCondition{ + Name: req.MetricsName, + ParentService: &parentService, + Normal: &normal, + TopN: req.TopN, + Order: api.OrderDes, + } + if req.Order != "" { + order := api.Order(req.Order) + if order.IsValid() { + condition.Order = order + } + } + // Always set scope, using ParseScopeInTop if not provided + var scope api.Scope + if req.Scope != "" { + scope = api.Scope(req.Scope) + } else { + scope = ParseScopeInTop(req.MetricsName) + } + condition.Scope = &scope + + return condition +} + +// ParseServiceID decodes a service id into service name and normal flag +func ParseServiceID(id string) (name string, isNormal bool, err error) { + if id == "" { + return "", false, nil + } + parts := strings.Split(id, ".") + if len(parts) != 2 { + return "", false, fmt.Errorf("invalid service id, cannot be split into 2 parts.
%v", id) + } + nameBytes, err := base64.StdEncoding.DecodeString(parts[0]) + if err != nil { + return "", false, err + } + name = string(nameBytes) + isNormal = parts[1] == "1" + return name, isNormal, nil +} + +// buildMetricsCondition builds the metrics condition from request parameters +func buildMetricsCondition(req *SingleMetricsRequest) *api.MetricsCondition { + condition := &api.MetricsCondition{ + Name: req.MetricsName, + } + + entity := &api.Entity{} + if req.Scope != "" { + scope := api.Scope(req.Scope) + entity.Scope = &scope + } + if req.ServiceName != "" { + entity.ServiceName = &req.ServiceName + } + if req.ServiceInstanceName != "" { + entity.ServiceInstanceName = &req.ServiceInstanceName + } + if req.EndpointName != "" { + entity.EndpointName = &req.EndpointName + } + if req.ProcessName != "" { + entity.ProcessName = &req.ProcessName + } + if req.DestServiceName != "" { + entity.DestServiceName = &req.DestServiceName + } + if req.DestServiceInstanceName != "" { + entity.DestServiceInstanceName = &req.DestServiceInstanceName + } + if req.DestEndpointName != "" { + entity.DestEndpointName = &req.DestEndpointName + } + if req.DestProcessName != "" { + entity.DestProcessName = &req.DestProcessName + } + condition.Entity = entity + return condition +} + +// querySingleMetrics queries single-value metrics +func querySingleMetrics(ctx context.Context, req *SingleMetricsRequest) (*mcp.CallToolResult, error) { + if err := validateSingleMetricsRequest(req); err != nil { + return mcp.NewToolResultError(err.Error()), nil + } + condition := buildMetricsCondition(req) + + var duration api.Duration + if req.Duration != "" { + duration = ParseDuration(req.Duration, req.Cold) + } else { + duration = BuildDuration(req.Start, req.End, req.Step, req.Cold, 0) + } + + value, err := metrics.IntValues(ctx, *condition, duration) + if err != nil { + return mcp.NewToolResultError(fmt.Sprintf(ErrFailedToQueryMetrics, err)), nil + } + result := MetricsValue{Value: value} + 
jsonBytes, err := json.Marshal(result) + if err != nil { + return mcp.NewToolResultError(fmt.Sprintf(ErrMarshalFailed, err)), nil + } + return mcp.NewToolResultText(string(jsonBytes)), nil +} + +// queryTopNMetrics queries top N metrics +func queryTopNMetrics(ctx context.Context, req *TopNMetricsRequest) (*mcp.CallToolResult, error) { + if err := validateTopNMetricsRequest(req); err != nil { + return mcp.NewToolResultError(err.Error()), nil + } + condition := buildTopNCondition(req) + + // Set default duration if none provided + if req.Duration == "" && req.Start == "" && req.End == "" { + req.Duration = "30m" + } + + var duration api.Duration + if req.Duration != "" { + duration = ParseDuration(req.Duration, req.Cold) + } else { + duration = BuildDuration(req.Start, req.End, req.Step, req.Cold, 0) + } + + values, err := metrics.SortMetrics(ctx, *condition, duration) + if err != nil { + return mcp.NewToolResultError(fmt.Sprintf(ErrFailedToQueryMetrics, err)), nil + } + jsonBytes, err := json.Marshal(values) + if err != nil { + return mcp.NewToolResultError(fmt.Sprintf(ErrMarshalFailed, err)), nil + } + return mcp.NewToolResultText(string(jsonBytes)), nil +} + +// SingleMetricsTool is a tool for querying single-value metrics +var SingleMetricsTool = NewTool[SingleMetricsRequest, *mcp.CallToolResult]( + "query_single_metrics", + `This tool queries single-value metrics defined in backend OAL from SkyWalking OAP. + +Workflow: +1. Use this tool when you need to get a single metric value for a specific entity +2. Specify the metrics name and entity details (service, endpoint, etc.) +3. Set the time range for the query +4. 
Get the metric value as a single integer result + +Metrics Examples: +- service_cpm: Calls per minute for a service +- endpoint_cpm: Calls per minute for an endpoint +- service_resp_time: Response time for a service +- service_apdex: Apdex score for a service +- service_sla: SLA percentage for a service + +Entity Scopes: +- Service: Service-level metrics +- ServiceInstance: Service instance-level metrics +- Endpoint: Endpoint-level metrics +- Process: Process-level metrics +- ServiceRelation: Service relationship metrics +- ServiceInstanceRelation: Service instance relationship metrics +- EndpointRelation: Endpoint relationship metrics +- ProcessRelation: Process relationship metrics + +Time Format: +- Absolute time: "2023-01-01 12:00:00", "2023-01-01 12" +- Relative time: "-30m" (30 minutes ago), "-1h" (1 hour ago) +- Step: "SECOND", "MINUTE", "HOUR", "DAY" + +Examples: +- {"metrics_name": "service_cpm", "service_name": "business-zone::projectC", "duration": "1h"}: Get calls per minute for a service in the last hour +- {"metrics_name": "endpoint_cpm", "service_name": "business-zone::projectC", + "endpoint_name": "/projectC/{value}", "duration": "30m"}: Get calls per minute for a specific endpoint in the last 30 minutes +- {"metrics_name": "service_resp_time", "service_name": "web-service", + "start": "-1h", "end": "now", "step": "MINUTE"}: Get service response time with custom time range +- {"metrics_name": "service_apdex", "service_name": "api-gateway", "cold": true}: Get Apdex score from cold storage`, + querySingleMetrics, + mcp.WithTitleAnnotation("Query single-value metrics"), + mcp.WithString("metrics_name", mcp.Required(), + mcp.Description(`The name of the metrics to query. 
Examples: service_sla, endpoint_sla, +service_instance_sla, service_cpm, service_resp_time, service_apdex`), + ), + mcp.WithString("scope", + mcp.Enum(string(api.ScopeAll), string(api.ScopeService), string(api.ScopeServiceInstance), string(api.ScopeEndpoint), string(api.ScopeProcess), + string(api.ScopeServiceRelation), string(api.ScopeServiceInstanceRelation), string(api.ScopeEndpointRelation), string(api.ScopeProcessRelation)), + mcp.Description(`The scope of the metrics entity: +- 'Service': Service-level metrics (default) +- 'ServiceInstance': Service instance-level metrics +- 'Endpoint': Endpoint-level metrics +- 'Process': Process-level metrics +- 'ServiceRelation': Service relationship metrics +- 'ServiceInstanceRelation': Service instance relationship metrics +- 'EndpointRelation': Endpoint relationship metrics +- 'ProcessRelation': Process relationship metrics`), + ), + mcp.WithString("service_name", + mcp.Description("Service name to filter metrics. Use this to get metrics for a specific service."), + ), + mcp.WithString("service_instance_name", + mcp.Description("Service instance name to filter metrics. Use this to get metrics for a specific service instance."), + ), + mcp.WithString("endpoint_name", + mcp.Description("Endpoint name to filter metrics. Use this to get metrics for a specific endpoint."), + ), + mcp.WithString("process_name", + mcp.Description("Process name to filter metrics. Use this to get metrics for a specific process."), + ), + mcp.WithString("dest_service_name", + mcp.Description("Destination service name for relationship metrics. Use this for service relation scopes."), + ), + mcp.WithString("dest_service_instance_name", + mcp.Description("Destination service instance name for relationship metrics. Use this for service instance relation scopes."), + ), + mcp.WithString("dest_endpoint_name", + mcp.Description("Destination endpoint name for relationship metrics. 
Use this for endpoint relation scopes."), + ), + mcp.WithString("dest_process_name", + mcp.Description("Destination process name for relationship metrics. Use this for process relation scopes."), + ), + mcp.WithString("duration", + mcp.Description("Time duration for the query. Examples: \"1h\" (last 1 hour), \"30m\" (last 30 minutes), \"7d\" (last 7 days)"), + ), + mcp.WithString("start", + mcp.Description("Start time for the query. Examples: \"2023-01-01 12:00:00\", \"-1h\" (1 hour ago), \"-30m\" (30 minutes ago)"), + ), + mcp.WithString("end", + mcp.Description("End time for the query. Examples: \"2023-01-01 13:00:00\", \"now\", \"-10m\" (10 minutes ago)"), + ), + mcp.WithString("step", + mcp.Enum("SECOND", "MINUTE", "HOUR", "DAY"), + mcp.Description(`Time step between start time and end time: +- 'SECOND': Second-level granularity +- 'MINUTE': Minute-level granularity (default) +- 'HOUR': Hour-level granularity +- 'DAY': Day-level granularity`), + ), + mcp.WithBoolean("cold", + mcp.Description("Whether to query from cold-stage storage. Set to true for historical data queries."), + ), +) + +// TopNMetricsTool is a tool for querying top N metrics +var TopNMetricsTool = NewTool[TopNMetricsRequest, *mcp.CallToolResult]( + "query_top_n_metrics", + `This tool queries the top N entities sorted by the specified metrics from SkyWalking OAP. + +Workflow: +1. Use this tool when you need to find the top N entities based on a specific metric +2. Specify the metrics name and the number of top entities to retrieve +3. Set the time range for the query +4. 
Get a list of top N entities with their metric values + +Metrics Examples: +- service_sla: SLA percentage for services +- endpoint_sla: SLA percentage for endpoints +- service_instance_sla: SLA percentage for service instances +- service_cpm: Calls per minute for services +- service_resp_time: Response time for services +- service_apdex: Apdex score for services + +Entity Scopes: +- Service: Service-level metrics (default) +- ServiceInstance: Service instance-level metrics +- Endpoint: Endpoint-level metrics +- Process: Process-level metrics + +Order Options: +- ASC: Ascending order (lowest values first) +- DES: Descending order (highest values first, default) + +Time Format: +- Absolute time: "2023-01-01 12:00:00", "2023-01-01 12" +- Relative time: "-30m" (30 minutes ago), "-1h" (1 hour ago) +- Step: "SECOND", "MINUTE", "HOUR", "DAY" + +Examples: +- {"metrics_name": "service_sla", "top_n": 5, "duration": "1h"}: Get top 5 services with highest SLA in the last hour +- {"metrics_name": "endpoint_sla", "top_n": 10, "order": "ASC", "duration": "30m"}: Get top 10 endpoints with lowest SLA in the last 30 minutes +- {"metrics_name": "service_instance_sla", "top_n": 3, "service_name": "boutique::adservice", + "duration": "1h"}: Get top 3 instances of a specific service with highest SLA +- {"metrics_name": "service_cpm", "top_n": 5, "start": "-1h", "end": "now", + "step": "MINUTE"}: Get top 5 services with highest calls per minute with custom time range`, + queryTopNMetrics, + mcp.WithTitleAnnotation("Query top N metrics"), + mcp.WithString("metrics_name", mcp.Required(), + mcp.Description(`The name of the metrics to query. Examples: service_sla, endpoint_sla, +service_instance_sla, service_cpm, service_resp_time, service_apdex`), + ), + mcp.WithNumber("top_n", mcp.Required(), + mcp.Description("The number of top entities to retrieve. 
Must be a positive integer."), + ), + mcp.WithString("order", + mcp.Enum("ASC", "DES"), + mcp.Description(`The order by which the top entities are sorted: +- 'ASC': Ascending order (lowest values first) +- 'DES': Descending order (highest values first, default)`), + ), + mcp.WithString("scope", + mcp.Enum(string(api.ScopeAll), string(api.ScopeService), string(api.ScopeServiceInstance), string(api.ScopeEndpoint), string(api.ScopeProcess)), + mcp.Description(`The scope of the metrics entity: +- 'Service': Service-level metrics (default) +- 'ServiceInstance': Service instance-level metrics +- 'Endpoint': Endpoint-level metrics +- 'Process': Process-level metrics`), + ), + mcp.WithString("service_name", + mcp.Description("Parent service name to filter metrics. Use this to get top N entities within a specific service."), + ), + mcp.WithString("duration", + mcp.Description("Time duration for the query. Examples: \"1h\" (last 1 hour), \"30m\" (last 30 minutes), \"7d\" (last 7 days)"), + ), + mcp.WithString("start", + mcp.Description("Start time for the query. Examples: \"2023-01-01 12:00:00\", \"-1h\" (1 hour ago), \"-30m\" (30 minutes ago)"), + ), + mcp.WithString("end", + mcp.Description("End time for the query. Examples: \"2023-01-01 13:00:00\", \"now\", \"-10m\" (10 minutes ago)"), + ), + mcp.WithString("step", + mcp.Enum("SECOND", "MINUTE", "HOUR", "DAY"), + mcp.Description(`Time step between start time and end time: +- 'SECOND': Second-level granularity +- 'MINUTE': Minute-level granularity (default) +- 'HOUR': Hour-level granularity +- 'DAY': Day-level granularity`), + ), + mcp.WithBoolean("cold", + mcp.Description("Whether to query from cold-stage storage. 
Set to true for historical data queries."), + ), +) diff --git a/internal/tools/tools.go b/internal/tools/tools.go index c491fd9..0286e3f 100644 --- a/internal/tools/tools.go +++ b/internal/tools/tools.go @@ -29,13 +29,13 @@ import ( type Tool[T any, R any] struct { Name string Description string - Handler func(ctx context.Context, args T) (R, error) + Handler func(ctx context.Context, args *T) (R, error) Options []mcp.ToolOption } func NewTool[T any, R any]( name, desc string, - handler func(ctx context.Context, args T) (R, error), + handler func(ctx context.Context, args *T) (R, error), options ...mcp.ToolOption, ) *Tool[T, R] { return &Tool[T, R]{ @@ -59,7 +59,7 @@ func (t *Tool[T, R]) Register(server *server.MCPServer) { func ConvertTool[T any, R any]( name string, desc string, - handlerFunc func(ctx context.Context, args T) (R, error), + handlerFunc func(ctx context.Context, args *T) (R, error), options ...mcp.ToolOption, ) (mcp.Tool, server.ToolHandlerFunc, error) { baseOptions := []mcp.ToolOption{ @@ -77,7 +77,7 @@ func ConvertTool[T any, R any]( return nil, fmt.Errorf("failed to bind arguments: %w", err) } - result, err := handlerFunc(ctx, args) + result, err := handlerFunc(ctx, &args) if err != nil { return nil, err } diff --git a/internal/tools/trace.go b/internal/tools/trace.go index 5108d95..fb5f85b 100644 --- a/internal/tools/trace.go +++ b/internal/tools/trace.go @@ -19,35 +19,841 @@ package tools import ( "context" + "encoding/json" + "errors" "fmt" + "sort" + "strings" + "time" - "github.com/apache/skywalking-cli/pkg/graphql/trace" "github.com/mark3labs/mcp-go/mcp" "github.com/mark3labs/mcp-go/server" api "skywalking.apache.org/repo/goapi/query" + + "github.com/apache/skywalking-cli/pkg/graphql/trace" +) + +// AddTraceTools registers trace-related tools with the MCP server +func AddTraceTools(mcp *server.MCPServer) { + SearchTraceTool.Register(mcp) + ColdTraceTool.Register(mcp) + TracesQueryTool.Register(mcp) +} + +// View constants +const ( + 
ViewFull = "full" + ViewSummary = "summary" + ViewErrorsOnly = "errors_only" +) + +// Query order constants +const ( + QueryOrderStartTime = "start_time" + QueryOrderDuration = "duration" +) + +// Trace state constants +const ( + TraceStateSuccess = "success" + TraceStateError = "error" + TraceStateAll = "all" ) +// Error constants +const ( + ErrMissingTraceID = "missing required parameter: trace_id" + ErrFailedToQueryTrace = "failed to query trace '%s': %v" + ErrFailedToQueryColdTrace = "failed to query cold trace '%s': %v" + ErrFailedToQueryTraces = "failed to query traces: %v" + ErrNoFilterCondition = "at least one filter condition must be provided" + ErrInvalidDurationRange = "invalid duration range: min_duration (%d) > max_duration (%d)" + ErrNegativePageSize = "page_size cannot be negative" + ErrNegativePageNum = "page_num cannot be negative" + ErrInvalidTraceState = "invalid trace_state '%s', available states: %s, %s, %s" + ErrInvalidQueryOrder = "invalid query_order '%s', available orders: %s, %s" + ErrTraceNotFound = "trace with ID '%s' not found" + ErrInvalidView = "invalid view '%s', available views: %s, %s, %s" + ErrNoTracesFound = "no traces found matching the query criteria" +) + +const TimeFormatFull = "2006-01-02 15:04:05" + +// Trace-specific constants +const ( + DefaultTracePageSize = 20 + DefaultTraceDuration = "1h" +) + +// TraceRequest defines the parameters for the trace tool type TraceRequest struct { TraceID string `json:"trace_id"` + View string `json:"view,omitempty"` +} + +// ColdTraceRequest defines the parameters for the cold trace tool +type ColdTraceRequest struct { + TraceID string `json:"trace_id"` + Duration string `json:"duration"` + View string `json:"view,omitempty"` +} + +// SpanTag represents a span tag for filtering traces +type SpanTag struct { + Key string `json:"key"` + Value string `json:"value"` +} + +// TracesQueryRequest defines the parameters for the traces query tool +type TracesQueryRequest struct { + ServiceID 
string `json:"service_id,omitempty"` + ServiceInstanceID string `json:"service_instance_id,omitempty"` + TraceID string `json:"trace_id,omitempty"` + EndpointID string `json:"endpoint_id,omitempty"` + Duration string `json:"duration,omitempty"` + MinTraceDuration int64 `json:"min_trace_duration,omitempty"` + MaxTraceDuration int64 `json:"max_trace_duration,omitempty"` + TraceState string `json:"trace_state,omitempty"` + QueryOrder string `json:"query_order,omitempty"` + PageSize int `json:"page_size,omitempty"` + PageNum int `json:"page_num,omitempty"` + View string `json:"view,omitempty"` + SlowTraceThreshold int64 `json:"slow_trace_threshold,omitempty"` + Tags []SpanTag `json:"tags,omitempty"` + Cold bool `json:"cold,omitempty"` +} + +// TraceSummary provides a high-level overview of a trace +type TraceSummary struct { + TraceID string `json:"trace_id"` + TotalSpans int `json:"total_spans"` + Services []string `json:"services"` + TotalDuration int64 `json:"total_duration_ms"` + ErrorCount int `json:"error_count"` + HasErrors bool `json:"has_errors"` + RootEndpoint string `json:"root_endpoint"` + StartTime int64 `json:"start_time_ms"` + EndTime int64 `json:"end_time_ms"` } -func searchTrace(ctx context.Context, req TraceRequest) (*api.Trace, error) { +// TracesSummary provides a high-level overview of multiple traces +type TracesSummary struct { + TotalTraces int `json:"total_traces"` + SuccessCount int `json:"success_count"` + ErrorCount int `json:"error_count"` + Services []string `json:"services"` + Endpoints []string `json:"endpoints"` + AvgDuration float64 `json:"avg_duration_ms"` + MinDuration int64 `json:"min_duration_ms"` + MaxDuration int64 `json:"max_duration_ms"` + TimeRange TimeRange `json:"time_range"` + ErrorTraces []BasicTraceSummary `json:"error_traces,omitempty"` + SlowTraces []BasicTraceSummary `json:"slow_traces,omitempty"` +} + +// BasicTraceSummary provides essential information about a single trace +type BasicTraceSummary struct { + TraceID 
string `json:"trace_id"` + ServiceName string `json:"service_name"` + EndpointName string `json:"endpoint_name"` + StartTime int64 `json:"start_time_ms"` + Duration int64 `json:"duration_ms"` + IsError bool `json:"is_error"` + SpanCount int `json:"span_count"` +} + +// TimeRange represents the time span of the traces +type TimeRange struct { + StartTime int64 `json:"start_time_ms"` + EndTime int64 `json:"end_time_ms"` + Duration int64 `json:"duration_ms"` +} + +// createBasicTraceSummary creates a BasicTraceSummary from trace item data +func createBasicTraceSummary(traceItem *api.BasicTrace, startTimeMs, duration int64, isError bool) BasicTraceSummary { + return BasicTraceSummary{ + TraceID: traceItem.TraceIds[0], // Use first trace ID + ServiceName: traceItem.SegmentID, // Use segment ID as service name + EndpointName: strings.Join(traceItem.EndpointNames, ", "), + StartTime: startTimeMs, + Duration: duration, + IsError: isError, + SpanCount: 0, // BasicTrace doesn't have span count + } +} + +// processTraceResult handles the common logic for processing trace results +func processTraceResult(traceID string, traceData *api.Trace, view string) (*mcp.CallToolResult, error) { + if len(traceData.Spans) == 0 { + return mcp.NewToolResultError(fmt.Sprintf(ErrTraceNotFound, traceID)), nil + } + + var result interface{} + switch view { + case ViewSummary: + result = generateTraceSummary(traceID, traceData) + case ViewErrorsOnly: + result = filterErrorSpans(traceData) + case ViewFull: + result = traceData + default: + return mcp.NewToolResultError(fmt.Sprintf(ErrInvalidView, view, ViewFull, ViewSummary, ViewErrorsOnly)), nil + } + + jsonBytes, err := json.Marshal(result) + if err != nil { + return mcp.NewToolResultError(fmt.Sprintf(ErrMarshalFailed, err)), nil + } + return mcp.NewToolResultText(string(jsonBytes)), nil +} + +// validateTraceRequest validates trace request parameters +func validateTraceRequest(req TraceRequest) error { + if req.TraceID == "" { + return 
errors.New(ErrMissingTraceID) + } + return nil +} + +// validateColdTraceRequest validates cold trace request parameters +func validateColdTraceRequest(req ColdTraceRequest) error { + if req.TraceID == "" { + return errors.New(ErrMissingTraceID) + } + if req.Duration == "" { + return errors.New(ErrMissingDuration) + } + return nil +} + +// searchTrace fetches the trace data and processes it based on the requested view +func searchTrace(ctx context.Context, req *TraceRequest) (*mcp.CallToolResult, error) { + if err := validateTraceRequest(*req); err != nil { + return mcp.NewToolResultError(err.Error()), nil + } + if req.View == "" { + req.View = ViewFull // Set default value + } + traces, err := trace.Trace(ctx, req.TraceID) if err != nil { - return nil, fmt.Errorf("search trace %v failed: %w", req.TraceID, err) + return mcp.NewToolResultError(fmt.Sprintf(ErrFailedToQueryTrace, req.TraceID, err)), nil } - return &traces, nil + traceData := &traces + + return processTraceResult(req.TraceID, traceData, req.View) } -func AddTraceTools(mcp *server.MCPServer) { - SearchTraceTool.Register(mcp) +// searchColdTrace fetches the trace data from cold storage and processes it based on the requested view +func searchColdTrace(ctx context.Context, req *ColdTraceRequest) (*mcp.CallToolResult, error) { + if err := validateColdTraceRequest(*req); err != nil { + return mcp.NewToolResultError(err.Error()), nil + } + if req.View == "" { + req.View = ViewFull // Set default value + } + + // Parse duration string to api.Duration + duration := ParseDuration(req.Duration, true) + + traces, err := trace.ColdTrace(ctx, duration, req.TraceID) + if err != nil { + return mcp.NewToolResultError(fmt.Sprintf(ErrFailedToQueryColdTrace, req.TraceID, err)), nil + } + traceData := &traces + + return processTraceResult(req.TraceID, traceData, req.View) +} + +// generateTraceSummary creates a summary view from full trace data +func generateTraceSummary(traceID string, traceData *api.Trace) *TraceSummary 
{ + summary := &TraceSummary{ + TraceID: traceID, + TotalSpans: len(traceData.Spans), + } + services := make(map[string]struct{}) + + for _, span := range traceData.Spans { + if span == nil { + continue + } + services[span.ServiceCode] = struct{}{} + if span.IsError != nil && *span.IsError { + summary.ErrorCount++ + } + // Heuristic to find the root span: a span with spanId 0 and parentSpanId -1 + if span.SpanID == 0 && span.ParentSpanID == -1 { + if span.EndpointName != nil { + summary.RootEndpoint = *span.EndpointName + } + summary.StartTime = span.StartTime + summary.EndTime = span.EndTime + if summary.StartTime > 0 && summary.EndTime > 0 { + summary.TotalDuration = summary.EndTime - summary.StartTime + } + } + } + + summary.HasErrors = summary.ErrorCount > 0 + for service := range services { + summary.Services = append(summary.Services, service) + } + sort.Strings(summary.Services) // Ensure deterministic order + return summary +} + +// filterErrorSpans extracts only the spans with errors from full trace data +func filterErrorSpans(traceData *api.Trace) []*api.Span { + var errorSpans []*api.Span + for _, span := range traceData.Spans { + if span != nil && span.IsError != nil && *span.IsError { + errorSpans = append(errorSpans, span) + } + } + return errorSpans +} + +// validateTracesQueryRequest validates traces query request parameters +func validateTracesQueryRequest(req *TracesQueryRequest) error { + // At least one filter should be provided for meaningful results + if req.ServiceID == "" && req.ServiceInstanceID == "" && req.TraceID == "" && + req.EndpointID == "" && req.Duration == "" && req.MinTraceDuration == 0 && + req.MaxTraceDuration == 0 { + return errors.New(ErrNoFilterCondition) + } + + // Validate duration range + if req.MinTraceDuration > 0 && req.MaxTraceDuration > 0 && req.MinTraceDuration > req.MaxTraceDuration { + return fmt.Errorf(ErrInvalidDurationRange, req.MinTraceDuration, req.MaxTraceDuration) + } + + // Validate pagination + if 
req.PageSize < 0 { + return errors.New(ErrNegativePageSize) + } + if req.PageNum < 0 { + return errors.New(ErrNegativePageNum) + } + + return nil +} + +// setBasicFields sets basic fields in the query condition +func setBasicFields(req *TracesQueryRequest, condition *api.TraceQueryCondition) { + if req.ServiceID != "" { + condition.ServiceID = &req.ServiceID + } + if req.ServiceInstanceID != "" { + condition.ServiceInstanceID = &req.ServiceInstanceID + } + if req.TraceID != "" { + condition.TraceID = &req.TraceID + } + if req.EndpointID != "" { + condition.EndpointID = &req.EndpointID + } + if req.MinTraceDuration > 0 { + minDuration := int(req.MinTraceDuration) + condition.MinTraceDuration = &minDuration + } + if req.MaxTraceDuration > 0 { + maxDuration := int(req.MaxTraceDuration) + condition.MaxTraceDuration = &maxDuration + } +} + +// setTags sets tags in the query condition +func setTags(req *TracesQueryRequest, condition *api.TraceQueryCondition) { + if len(req.Tags) > 0 { + apiTags := make([]*api.SpanTag, len(req.Tags)) + for i, tag := range req.Tags { + apiTags[i] = &api.SpanTag{ + Key: tag.Key, + Value: &tag.Value, + } + } + condition.Tags = apiTags + } +} + +// setDuration sets duration in the query condition +func setDuration(req *TracesQueryRequest, condition *api.TraceQueryCondition) { + if req.Duration != "" { + duration := ParseDuration(req.Duration, req.Cold) + condition.QueryDuration = &duration + } else if req.TraceID == "" { + // If no duration and no traceId provided, set default duration (last 1 hour) + // SkyWalking OAP requires either queryDuration or traceId + defaultDuration := ParseDuration(DefaultTraceDuration, req.Cold) + condition.QueryDuration = &defaultDuration + } +} + +// setTraceState sets trace state in the query condition +func setTraceState(req *TracesQueryRequest, condition *api.TraceQueryCondition) error { + switch req.TraceState { + case TraceStateSuccess: + condition.TraceState = api.TraceStateSuccess + case TraceStateError: 
+ condition.TraceState = api.TraceStateError + case TraceStateAll, "": + condition.TraceState = api.TraceStateAll + default: + return fmt.Errorf(ErrInvalidTraceState, + req.TraceState, TraceStateSuccess, TraceStateError, TraceStateAll) + } + return nil +} + +// setQueryOrder sets query order in the query condition +func setQueryOrder(req *TracesQueryRequest, condition *api.TraceQueryCondition) error { + switch req.QueryOrder { + case QueryOrderStartTime, "": + condition.QueryOrder = api.QueryOrderByStartTime + case QueryOrderDuration: + condition.QueryOrder = api.QueryOrderByDuration + default: + return fmt.Errorf(ErrInvalidQueryOrder, + req.QueryOrder, QueryOrderStartTime, QueryOrderDuration) + } + return nil } -var SearchTraceTool = NewTool[TraceRequest, *api.Trace]( - "search_trace_by_trace_id", - "Search for traces by a single TraceId", +// setPagination sets pagination in the query condition +func setPagination(req *TracesQueryRequest, condition *api.TraceQueryCondition) { + pageSize := req.PageSize + if pageSize == 0 { + pageSize = DefaultTracePageSize + } + condition.Paging = BuildPagination(req.PageNum, pageSize) +} + +// buildQueryCondition builds the query condition from request parameters +func buildQueryCondition(req *TracesQueryRequest) (*api.TraceQueryCondition, error) { + condition := &api.TraceQueryCondition{ + TraceState: api.TraceStateAll, // Default to all traces + QueryOrder: api.QueryOrderByStartTime, // Default order + } + + // Set basic fields + setBasicFields(req, condition) + + // Set tags + setTags(req, condition) + + // Set duration + setDuration(req, condition) + + // Set trace state + if err := setTraceState(req, condition); err != nil { + return nil, err + } + + // Set query order + if err := setQueryOrder(req, condition); err != nil { + return nil, err + } + + // Set pagination + setPagination(req, condition) + + return condition, nil +} + +// searchTraces fetches traces based on query conditions +func searchTraces(ctx 
context.Context, req *TracesQueryRequest) (*mcp.CallToolResult, error) {
+    if err := validateTracesQueryRequest(req); err != nil {
+        return mcp.NewToolResultError(err.Error()), nil
+    }
+
+    // Set default view
+    if req.View == "" {
+        req.View = ViewFull // Default to full view
+    }
+
+    // Build query condition
+    condition, err := buildQueryCondition(req)
+    if err != nil {
+        return mcp.NewToolResultError(err.Error()), nil
+    }
+
+    // Execute query
+    traces, err := trace.Traces(ctx, condition)
+    if err != nil {
+        return mcp.NewToolResultError(fmt.Sprintf(ErrFailedToQueryTraces, err)), nil
+    }
+
+    return processTracesResult(&traces, req.View, req.SlowTraceThreshold)
+}
+
+// processTracesResult handles the common logic for processing traces query results
+func processTracesResult(traces *api.TraceBrief, view string, slowTraceThreshold int64) (*mcp.CallToolResult, error) {
+    if traces == nil || len(traces.Traces) == 0 {
+        return mcp.NewToolResultError(ErrNoTracesFound), nil
+    }
+
+    var result interface{}
+    switch view {
+    case ViewSummary:
+        result = generateTracesSummary(traces, slowTraceThreshold)
+    case ViewErrorsOnly:
+        result = filterErrorTraces(traces)
+    case ViewFull:
+        result = traces
+    default:
+        return mcp.NewToolResultError(fmt.Sprintf(ErrInvalidView, view, ViewFull, ViewSummary, ViewErrorsOnly)), nil
+    }
+
+    jsonBytes, err := json.Marshal(result)
+    if err != nil {
+        return mcp.NewToolResultError(fmt.Sprintf(ErrMarshalFailed, err)), nil
+    }
+    return mcp.NewToolResultText(string(jsonBytes)), nil
+}
+
+// processTraceItem processes a single trace item and updates summary statistics
+func processTraceItem(traceItem *api.BasicTrace, summary *TracesSummary,
+    services, endpoints map[string]struct{}, durations *[]int64,
+    errorTraces, slowTraces *[]BasicTraceSummary, slowTraceThreshold int64,
+    minStartTime, maxEndTime *int64, totalDuration *int64) {
+    if traceItem == nil {
+        return
+    }
+
+    // Parse start time
+    startTime, err := time.Parse(TimeFormatFull, traceItem.Start)
+    if err != nil {
+        return // Skip invalid traces
+    }
+    startTimeMs := startTime.UnixMilli()
+    endTimeMs := startTimeMs + int64(traceItem.Duration)
+
+    // Track time range
+    if *minStartTime == 0 || startTimeMs < *minStartTime {
+        *minStartTime = startTimeMs
+    }
+    if endTimeMs > *maxEndTime {
+        *maxEndTime = endTimeMs
+    }
+
+    // Calculate duration
+    duration := int64(traceItem.Duration)
+    *durations = append(*durations, duration)
+    *totalDuration += duration
+
+    // Count errors
+    isError := traceItem.IsError != nil && *traceItem.IsError
+    if isError {
+        summary.ErrorCount++
+        *errorTraces = append(*errorTraces, createBasicTraceSummary(traceItem, startTimeMs, duration, true))
+    } else {
+        summary.SuccessCount++
+    }
+
+    // Identify slow traces only if threshold is configured
+    if slowTraceThreshold > 0 && duration > slowTraceThreshold {
+        *slowTraces = append(*slowTraces, createBasicTraceSummary(traceItem, startTimeMs, duration, isError))
+    }
+
+    // Collect services and endpoints
+    services[traceItem.SegmentID] = struct{}{}
+    for _, endpoint := range traceItem.EndpointNames {
+        if endpoint != "" {
+            endpoints[endpoint] = struct{}{}
+        }
+    }
+}
+
+// calculateStatistics calculates summary statistics from durations
+func calculateStatistics(durations []int64, totalDuration int64) (avgDuration float64, minDuration, maxDuration int64) {
+    if len(durations) == 0 {
+        return 0, 0, 0
+    }
+
+    sort.Slice(durations, func(i, j int) bool {
+        return durations[i] < durations[j]
+    })
+
+    avgDuration = float64(totalDuration) / float64(len(durations))
+    minDuration = durations[0]
+    maxDuration = durations[len(durations)-1]
+
+    return
+}
+
+// generateTracesSummary creates a comprehensive summary from multiple traces
+func generateTracesSummary(traces *api.TraceBrief, slowTraceThreshold int64) *TracesSummary {
+    if traces == nil || len(traces.Traces) == 0 {
+        return &TracesSummary{}
+    }
+
+    summary := &TracesSummary{
+        TotalTraces: len(traces.Traces),
+    }
+
+    services := make(map[string]struct{})
+    endpoints := make(map[string]struct{})
+    var durations []int64
+    var errorTraces []BasicTraceSummary
+    var slowTraces []BasicTraceSummary
+
+    var minStartTime, maxEndTime int64
+    var totalDuration int64
+
+    // Process each trace item
+    for _, traceItem := range traces.Traces {
+        processTraceItem(traceItem, summary, services, endpoints, &durations,
+            &errorTraces, &slowTraces, slowTraceThreshold, &minStartTime, &maxEndTime, &totalDuration)
+    }
+
+    // Calculate statistics
+    summary.AvgDuration, summary.MinDuration, summary.MaxDuration =
+        calculateStatistics(durations, totalDuration)
+
+    // Set time range
+    summary.TimeRange = TimeRange{
+        StartTime: minStartTime,
+        EndTime:   maxEndTime,
+        Duration:  maxEndTime - minStartTime,
+    }
+
+    // Convert maps to slices
+    for service := range services {
+        summary.Services = append(summary.Services, service)
+    }
+    sort.Strings(summary.Services) // Ensure deterministic order
+    for endpoint := range endpoints {
+        summary.Endpoints = append(summary.Endpoints, endpoint)
+    }
+
+    // Sort error and slow traces by duration (descending)
+    sort.Slice(errorTraces, func(i, j int) bool {
+        return errorTraces[i].Duration > errorTraces[j].Duration
+    })
+    sort.Slice(slowTraces, func(i, j int) bool {
+        return slowTraces[i].Duration > slowTraces[j].Duration
+    })
+
+    summary.ErrorTraces = errorTraces
+    summary.SlowTraces = slowTraces
+
+    return summary
+}
+
+// filterErrorTraces extracts only error traces from the results
+func filterErrorTraces(traces *api.TraceBrief) []BasicTraceSummary {
+    if traces == nil {
+        return nil
+    }
+
+    var errorTraces []BasicTraceSummary
+    for _, traceItem := range traces.Traces {
+        if traceItem != nil && traceItem.IsError != nil && *traceItem.IsError {
+            // Parse start time
+            startTime, err := time.Parse(TimeFormatFull, traceItem.Start)
+            if err != nil {
+                continue
+            }
+            startTimeMs := startTime.UnixMilli()
+
+            errorTraces = append(errorTraces,
+                createBasicTraceSummary(traceItem, startTimeMs, int64(traceItem.Duration), true))
+        }
+    }
+
+    // Sort by duration (descending) to show slowest errors first
+    sort.Slice(errorTraces, func(i, j int) bool {
+        return errorTraces[i].Duration > errorTraces[j].Duration
+    })
+
+    return errorTraces
+}
+
+// SearchTraceTool is a tool for searching traces by trace ID with different views
+var SearchTraceTool = NewTool[TraceRequest, *mcp.CallToolResult](
+    "get_trace_details",
+    `This tool provides detailed information about a distributed trace from SkyWalking OAP.
+
+Workflow:
+1. Use this tool when you need to analyze a specific trace by its trace ID
+2. Choose the appropriate view based on your analysis needs:
+   - 'full': For complete trace analysis with all spans and details
+   - 'summary': For quick overview and performance metrics
+   - 'errors_only': For troubleshooting and error investigation
+
+Best Practices:
+- Use 'summary' view first to get an overview of the trace
+- Switch to 'errors_only' if the summary shows errors
+- Use 'full' view for detailed debugging and span-by-span analysis
+- Trace IDs are typically found in logs, error messages, or monitoring dashboards
+
+Examples:
+- {"trace_id": "abc123..."}: Get complete trace details for analysis
+- {"trace_id": "abc123...", "view": "summary"}: Quick performance overview
+- {"trace_id": "abc123...", "view": "errors_only"}: Focus on error spans only`,
     searchTrace,
     mcp.WithTitleAnnotation("Search a trace by TraceId"),
     mcp.WithString("trace_id", mcp.Required(),
-        mcp.Description("The TraceId to search for")),
+        mcp.Description(`The unique identifier of the trace to retrieve.`),
+    ),
+    mcp.WithString("view",
+        mcp.Enum(ViewFull, ViewSummary, ViewErrorsOnly),
+        mcp.Description(`Specifies the level of detail for trace analysis:
+- 'full': (Default) Complete trace with all spans, service calls, and metadata
+- 'summary': High-level overview with services, duration, and error count
+- 'errors_only': Only spans marked as errors for troubleshooting`),
+    ),
+)
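For readers skimming the diff, the duration statistics the summary view computes boil down to one pass for the total plus a sort for min/max. Below is a minimal standalone sketch of that logic; the helper name `calcStats` and its signature are illustrative, not the committed API.

```go
package main

import (
	"fmt"
	"sort"
)

// calcStats mirrors the calculateStatistics flow in the diff: sort the
// duration samples ascending, then derive average, minimum, and maximum.
func calcStats(durations []int64) (avg float64, minDur, maxDur int64) {
	if len(durations) == 0 {
		return 0, 0, 0
	}
	var total int64
	for _, d := range durations {
		total += d
	}
	sort.Slice(durations, func(i, j int) bool { return durations[i] < durations[j] })
	return float64(total) / float64(len(durations)), durations[0], durations[len(durations)-1]
}

func main() {
	avg, minDur, maxDur := calcStats([]int64{120, 30, 450})
	fmt.Println(avg, minDur, maxDur) // 200 30 450
}
```

Note that, as in the committed helper, the sort mutates the caller's slice; that is harmless here because the slice is built solely for the summary.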
+
+// ColdTraceTool is a tool for searching traces from cold storage by trace ID with different views
+var ColdTraceTool = NewTool[ColdTraceRequest, *mcp.CallToolResult](
+    "get_cold_trace_details",
+    `This tool queries BanyanDB cold storage for historical trace data that may no longer be available in hot storage.
+
+Important Notes:
+- Only works with the BanyanDB storage backend
+- Queries older trace data that has been moved to cold storage
+- May have slower response times compared to hot storage queries
+- Use when trace data is not found in regular trace queries
+
+Duration Format:
+- Standard Go duration: "7d", "1h", "-30m", "2h30m"
+- Negative values mean "ago": "-7d" = 7 days ago to now
+- Positive values mean "from now": "2h" = now to 2 hours later
+- Legacy format: "6d", "12h" (backward compatible)
+
+Usage Scenarios:
+- Historical incident investigation
+- Long-term performance analysis
+- Compliance and audit requirements
+- When hot storage queries return no results
+
+Examples:
+- {"trace_id": "abc123...", "duration": "7d"}: Search last 7 days of cold storage
+- {"trace_id": "abc123...", "duration": "-30m"}: Search from 30 minutes ago to now
+- {"trace_id": "abc123...", "duration": "1h", "view": "summary"}: Quick summary from last hour
+- {"trace_id": "abc123...", "duration": "2h30m", "view": "errors_only"}: Error analysis from last 2.5 hours`,
+    searchColdTrace,
+    mcp.WithTitleAnnotation("Search a cold trace by TraceId"),
+    mcp.WithString("trace_id", mcp.Required(),
+        mcp.Description(`The unique identifier of the trace to retrieve from cold storage. Use this when regular trace queries return no results.`),
+    ),
+    mcp.WithString("duration", mcp.Required(),
+        mcp.Description(`Time duration for cold storage query. Examples: "7d" (last 7 days), "-30m" (last 30 minutes), "2h30m" (last 2.5 hours)`),
+    ),
+    mcp.WithString("view",
+        mcp.Enum(ViewFull, ViewSummary, ViewErrorsOnly),
+        mcp.Description(`Specifies the level of detail for cold trace analysis:
+- 'full': (Default) Complete trace with all spans from cold storage
+- 'summary': High-level overview with services, duration, and error count
+- 'errors_only': Only error spans for focused troubleshooting`),
+    ),
+)
+
+// TracesQueryTool is a tool for querying traces with various conditions
+var TracesQueryTool = NewTool[TracesQueryRequest, *mcp.CallToolResult](
+    "query_traces",
+    `This tool queries traces from SkyWalking OAP based on various conditions and provides intelligent data processing for LLM analysis.
+
+Workflow:
+1. Use this tool when you need to find traces matching specific criteria
+2. Specify one or more query conditions to narrow down results
+3. Use duration to limit the time range for the search
+4. Choose the appropriate view for your analysis needs
+
+Query Conditions:
+- service_id: Filter by specific service
+- service_instance_id: Filter by specific service instance
+- trace_id: Search for a specific trace ID
+- endpoint_id: Filter by specific endpoint
+- duration: Time range for the query (e.g., "1h", "7d", "-30m")
+- min_trace_duration/max_trace_duration: Filter by trace duration in milliseconds
+- trace_state: Filter by trace state (success, error, all)
+- query_order: Sort order (start_time, duration)
+- view: Data presentation format (summary, errors_only, full)
+- slow_trace_threshold: Optional threshold for identifying slow traces in milliseconds
+- tags: Filter by span tags (key-value pairs)
+
+Important Notes:
+- SkyWalking OAP requires either 'duration' or 'trace_id' to be specified
+- If neither is provided, a default duration of "1h" (last 1 hour) will be used
+- This ensures the query always has a valid time range or a specific trace to search
+
+View Options:
+- 'full': (Default) Complete raw data for detailed analysis
+- 'summary': Intelligent summary with performance metrics and insights
+- 'errors_only': Focused list of error traces for troubleshooting
+
+Best Practices:
+- Start with 'summary' view to get an intelligent overview
+- Use 'errors_only' view for focused troubleshooting
+- Combine multiple filters for precise results
+- Use duration to limit search scope and improve performance
+- Only set slow_trace_threshold when you need to identify performance issues
+- Use tags to filter traces by specific attributes or metadata
+
+Examples:
+- {"service_id": "Your_ApplicationName", "duration": "1h", "view": "summary"}: Recent traces summary with performance insights
+- {"trace_state": "error", "duration": "7d", "view": "errors_only"}: Error traces from last week for troubleshooting
+- {"min_trace_duration": 1000, "query_order": "duration", "view": "summary"}: Slow traces analysis with performance metrics
+- {"slow_trace_threshold": 5000, "view": "summary"}: Identify traces slower than 5 seconds
+- {"service_id": "Your_ApplicationName"}: Query with default 1-hour duration
+- {"tags": [{"key": "http.method", "value": "POST"}, {"key": "http.status_code", "value": "500"}],
+  "duration": "1h"}: Find traces with specific HTTP tags`,
+    searchTraces,
+    mcp.WithTitleAnnotation("Query traces with intelligent analysis"),
+    mcp.WithString("service_id",
+        mcp.Description("Service ID to filter traces. Use this to find traces from a specific service."),
+    ),
+    mcp.WithString("service_instance_id",
+        mcp.Description("Service instance ID to filter traces. Use this to find traces from a specific instance."),
+    ),
+    mcp.WithString("trace_id",
+        mcp.Description("Specific trace ID to search for. Use this when you know the exact trace ID."),
+    ),
+    mcp.WithString("endpoint_id",
+        mcp.Description("Endpoint ID to filter traces. Use this to find traces for a specific endpoint."),
+    ),
+    mcp.WithString("duration",
+        mcp.Description(`Time duration for the query. Examples: "7d" (last 7 days), "-30m" (last 30 minutes), "2h30m" (last 2.5 hours)`),
+    ),
+    mcp.WithNumber("min_trace_duration",
+        mcp.Description("Minimum trace duration in milliseconds. Use this to filter out fast traces."),
+    ),
+    mcp.WithNumber("max_trace_duration",
+        mcp.Description("Maximum trace duration in milliseconds. Use this to filter out slow traces."),
+    ),
+    mcp.WithString("trace_state",
+        mcp.Enum(TraceStateSuccess, TraceStateError, TraceStateAll),
+        mcp.Description(`Filter traces by their state:
+- 'success': Only successful traces
+- 'error': Only traces with errors
+- 'all': All traces (default)`),
+    ),
+    mcp.WithString("query_order",
+        mcp.Enum(QueryOrderStartTime, QueryOrderDuration),
+        mcp.Description(`Sort order for results:
+- 'start_time': Oldest first
+- 'duration': Shortest first`),
+    ),
+    mcp.WithString("view",
+        mcp.Enum(ViewSummary, ViewErrorsOnly, ViewFull),
+        mcp.Description(`Data presentation format:
+- 'full': (Default) Complete raw data for detailed analysis
+- 'summary': Intelligent summary with performance metrics and insights
+- 'errors_only': Focused list of error traces for troubleshooting`),
+    ),
+    mcp.WithNumber("slow_trace_threshold",
+        mcp.Description("Optional threshold for identifying slow traces in milliseconds. "+
+            "Only when this parameter is set will slow traces be included in the summary. "+
+            "Traces with duration exceeding this threshold will be listed in slow_traces. "+
+            "Examples: 500 (0.5s), 2000 (2s), 5000 (5s)"),
+    ),
+    mcp.WithArray("tags",
+        mcp.Description(`Array of span tags to filter traces. Each tag should have 'key' and 'value' fields.
+Examples: [{"key": "http.method", "value": "POST"}, {"key": "http.status_code", "value": "500"}]`),
+    ),
+    mcp.WithBoolean("cold",
+        mcp.Description("Whether to query from cold-stage storage. Set to true for historical data queries."),
+    ),
 )
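The 'errors_only' view this commit adds is, at its core, filter-then-sort-descending so the slowest failures surface first. A self-contained sketch of that flow; `briefTrace` is a simplified stand-in for the real `api.BasicTrace`, and the field names are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
)

// briefTrace is a stand-in for api.BasicTrace with just the fields the
// error view needs; the real type carries more metadata.
type briefTrace struct {
	TraceID  string
	Duration int64 // milliseconds
	IsError  bool
}

// errorsByDuration mirrors the filterErrorTraces flow in the diff: keep
// only error traces, then sort descending by duration.
func errorsByDuration(traces []briefTrace) []briefTrace {
	var out []briefTrace
	for _, t := range traces {
		if t.IsError {
			out = append(out, t)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Duration > out[j].Duration })
	return out
}

func main() {
	got := errorsByDuration([]briefTrace{
		{TraceID: "a", Duration: 120, IsError: true},
		{TraceID: "b", Duration: 80, IsError: false},
		{TraceID: "c", Duration: 450, IsError: true},
	})
	for _, t := range got {
		fmt.Println(t.TraceID, t.Duration)
	}
	// c 450
	// a 120
}
```

Sorting descending rather than ascending is a deliberate choice for troubleshooting output: the traces most likely to matter (slow failures) appear at the top of what an LLM or human reads first.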