This is an automated email from the ASF dual-hosted git repository. wusheng pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/skywalking-banyandb.git
The following commit(s) were added to refs/heads/main by this push: new 4d5813fd - Add the TRACE Model to BydbQL and support distributed tracing with `WITH QUERY_TRACE` (#760) 4d5813fd is described below commit 4d5813fddda949f0d76722a58ab0cb8ae4e5cfae Author: Gao Hongtao <hanahm...@gmail.com> AuthorDate: Tue Sep 9 09:46:51 2025 +0800 - Add the TRACE Model to BydbQL and support distributed tracing with `WITH QUERY_TRACE` (#760) - --- docs/interacting/bydbql.md | 248 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 233 insertions(+), 15 deletions(-) diff --git a/docs/interacting/bydbql.md b/docs/interacting/bydbql.md index 9415ff72..98a8ae9c 100644 --- a/docs/interacting/bydbql.md +++ b/docs/interacting/bydbql.md @@ -151,7 +151,7 @@ BydbQL for streams is designed for querying and retrieving raw time-series eleme ### 4.1. Grammar ``` -query ::= SELECT projection [from_stream_clause] [TIME time_condition] [WHERE criteria] [ORDER BY order_expression] [LIMIT integer] [OFFSET integer] +query ::= SELECT projection [from_stream_clause] [TIME time_condition] [WHERE criteria] [ORDER BY order_expression] [LIMIT integer] [OFFSET integer] [WITH QUERY_TRACE] from_stream_clause ::= "FROM STREAM" identifier ["IN" "(" group_list ")"] projection ::= "*" | column_list column_list ::= identifier ("," identifier)* @@ -184,6 +184,7 @@ integer_literal ::= [0-9]+ * **`WHERE conditions`**: Maps to `criteria`. * **`ORDER BY field`**: Maps to `order_by`. * **`LIMIT`/`OFFSET`**: Maps to `limit` and `offset`. +* **`WITH QUERY_TRACE`**: Maps to the `trace` field to enable distributed tracing of query execution. ### 4.3. Examples @@ -251,6 +252,13 @@ WHERE status = 'error'; SELECT trace_id, duration FROM STREAM sw TIME < '-1d'; + +-- Query with distributed tracing enabled +SELECT trace_id, service_id, start_time +FROM STREAM sw +TIME BETWEEN '-2h' AND 'now' +WHERE service_id = 'webapp' +WITH QUERY_TRACE; ``` ## 5. BydbQL for Measures @@ -260,7 +268,7 @@ BydbQL for measures is tailored for analytical queries on aggregated numerical d ### 5.1. Grammar ``` -measure_query ::= SELECT projection [from_measure_clause] [TIME time_condition] [WHERE criteria] [GROUP BY group_list] [ORDER BY order_expression] [LIMIT integer] [OFFSET integer] +measure_query ::= SELECT projection [from_measure_clause] [TIME time_condition] [WHERE criteria] [GROUP BY group_list] [ORDER BY order_expression] [LIMIT integer] [OFFSET integer] [WITH QUERY_TRACE] from_measure_clause ::= "FROM MEASURE" identifier ["IN" "(" group_list ")"] projection ::= "*" | (column_list | agg_function "(" identifier ")" | "TOP" integer projection) column_list ::= identifier ("," identifier)* ["::tag" | "::field"] @@ -302,6 +310,7 @@ The `SELECT` clause for measures is highly flexible, allowing for the selection * **`TIME BETWEEN '-1h' AND 'now'`**: Maps to `time_range` from 1 hour ago to current time. * **`GROUP BY <tag1>, <tag2>`**: The `GROUP BY` clause takes a simple list of tags and maps to `group_by.tag_projection`. * **`SELECT TOP N ...`**: Maps to the `top` message. +* **`WITH QUERY_TRACE`**: Maps to the `trace` field to enable distributed tracing of query execution. ### 5.4. Examples @@ -400,6 +409,16 @@ FROM MEASURE api_errors TIME < '-1d' WHERE status_code = '500' GROUP BY endpoint; + +-- Query with distributed tracing enabled +SELECT + service, + AVG(response_time) +FROM MEASURE http_metrics +TIME > '-30m' +WHERE region = 'us-west' +GROUP BY service +WITH QUERY_TRACE; ``` ## 6. BydbQL for Top-N @@ -409,7 +428,7 @@ Top-N queries use a specialized, command-like syntax for clarity and to reflect ### 6.1. Grammar ``` -topn_query ::= SHOW TOP integer from_measure_clause [TIME time_condition] [WHERE topn_criteria] [AGGREGATE BY agg_function] [ORDER BY value ["ASC"|"DESC"]] +topn_query ::= SHOW TOP integer from_measure_clause [TIME time_condition] [WHERE topn_criteria] [AGGREGATE BY agg_function] [ORDER BY value ["ASC"|"DESC"]] [WITH QUERY_TRACE] from_measure_clause ::= "FROM MEASURE" identifier ["IN" "(" group_list ")"] topn_criteria ::= condition (("AND" | "OR") condition)* condition ::= identifier binary_op (value | value_list) @@ -438,6 +457,7 @@ integer_literal ::= [0-9]+ * **`WHERE tag = 'value'`**: Maps to `conditions`. Only simple equality is supported. * **`AGGREGATE BY FUNC`**: Maps to `agg`. * **`ORDER BY value DESC`**: Maps to `field_value_sort`. +* **`WITH QUERY_TRACE`**: Maps to the `trace` field to enable distributed tracing of query execution. ### 6.3. Examples @@ -495,6 +515,14 @@ SHOW TOP 3 FROM MEASURE service_response_time TIME > '-30m' ORDER BY value ASC; + +-- Top-N query with distributed tracing enabled +SHOW TOP 10 +FROM MEASURE service_latency +TIME > '-1h' +WHERE http_method = 'GET' +ORDER BY value DESC +WITH QUERY_TRACE; ``` ## 7. BydbQL for Properties @@ -504,7 +532,7 @@ BydbQL for properties is designed for simple key-value lookups and metadata filt ### 7.1. Grammar ``` -property_query ::= SELECT projection [from_property_clause] [WHERE criteria] [LIMIT integer] +property_query ::= SELECT projection [from_property_clause] [WHERE criteria] [LIMIT integer] [WITH QUERY_TRACE] from_property_clause ::= "FROM PROPERTY" identifier ["IN" "(" group_list ")"] projection ::= "*" | column_list column_list ::= identifier ("," identifier)* @@ -526,6 +554,7 @@ integer_literal ::= [0-9]+ * **`WHERE ID IN (...)`**: Maps to `ids`. * **`WHERE tag = 'value'`**: Maps to `criteria`. * **`LIMIT n`**: Maps to `limit`. +* **`WITH QUERY_TRACE`**: Maps to the `trace` field to enable distributed tracing of query execution. ### 7.3. Examples @@ -553,16 +582,205 @@ SELECT ip, owner FROM PROPERTY server_metadata IN (datacenter-1, datacenter-2, datacenter-3) WHERE in_service = 'true' LIMIT 100; + +-- Property query with distributed tracing enabled +SELECT ip, region, owner +FROM PROPERTY server_metadata +WHERE datacenter = 'dc-101' AND in_service = 'true' +LIMIT 50 +WITH QUERY_TRACE; +``` + +## 8. BydbQL for Traces + +BydbQL for traces is designed for querying and retrieving trace data with spans. The syntax maps to the `banyandb.trace.v1.QueryRequest` message and is optimized for trace-specific operations. + +### 8.1. Grammar + +``` +trace_query ::= SELECT projection [from_trace_clause] [TIME time_condition] [WHERE criteria] [ORDER BY order_expression] [LIMIT integer] [OFFSET integer] [WITH QUERY_TRACE] +from_trace_clause ::= "FROM TRACE" identifier ["IN" "(" group_list ")"] +projection ::= "*" | column_list | "()" +column_list ::= identifier ("," identifier)* +group_list ::= identifier ("," identifier)* +criteria ::= condition (("AND" | "OR") condition)* +condition ::= identifier binary_op (value | value_list) +time_condition ::= "=" timestamp | ">" timestamp | "<" timestamp | ">=" timestamp | "<=" timestamp | "BETWEEN" timestamp "AND" timestamp +binary_op ::= "=" | "!=" | ">" | "<" | ">=" | "<=" | "IN" | "NOT IN" | "HAVING" | "NOT HAVING" | "MATCH" +order_expression ::= identifier ["ASC" | "DESC"] +value ::= string_literal | integer_literal | "NULL" +value_list ::= "(" value ("," value)* ")" +timestamp ::= string_literal | integer_literal + /* timestamp supports both absolute and relative time formats: + - Absolute: RFC3339 format like "2006-01-02T15:04:05Z07:00" + - Relative: duration strings like "-30m", "2h", "1d" (relative to current time) */ +identifier ::= [a-zA-Z_][a-zA-Z0-9_]* +string_literal ::= "'" [^']* "'" | "\"" [^\"]* "\"" +integer_literal ::= [0-9]+ +``` + +### 8.2. Trace Model Characteristics + +The Trace model in BanyanDB is specifically designed for storing and querying trace data with the following key characteristics: + +* **Trace Resource**: A logical namespace within a group that defines the structure for trace data +* **Tags**: Indexed tags that support filtering and querying (defined by `TraceTagSpec`) +* **Trace ID Tag**: A specific tag that stores the trace ID for trace correlation +* **Timestamp Tag**: A specific tag that stores the timestamp for time-based queries +* **Span Data**: Raw span data stored as binary for efficient storage and retrieval + +### 8.2.1. Empty Projection Support + +BydbQL for traces supports an empty projection syntax `SELECT ()` that allows queries to return only raw span data without any tag information. This is useful for: + +* **Performance Optimization**: When you only need the raw span data and don't want to pay the cost of loading and returning tag values +* **Data Processing**: When you plan to process the raw span data externally and don't need the indexed tag values +* **Storage Efficiency**: Reducing network transfer and memory usage by excluding tag data + +**Syntax**: `SELECT ()` - Returns only raw span data, no tag information +**Behavior**: The query will still apply filtering and ordering based on the WHERE and ORDER BY clauses, but the result will contain only the binary span data without any tag projections. + +### 8.3. Mapping to `trace.v1.QueryRequest` + +* **`FROM TRACE name IN (groups)`**: Maps to the `name` and `groups` fields. If the clause is omitted, these values are taken from the execution context. +* **`SELECT tags`**: Maps to `tag_projection`. Requires a trace schema to resolve tags to their specifications. +* **`SELECT ()`**: Maps to an empty `tag_projection` array, indicating no tag data should be returned (only raw span data). +* **`TIME = '2023-01-01T00:00:00Z'`**: Maps to `time_range` with `begin` and `end` set to the same timestamp. +* **`TIME > '2023-01-01T00:00:00Z'`**: Maps to `time_range` with `begin` set to the timestamp. +* **`TIME BETWEEN '2023-01-01T00:00:00Z' AND '2023-01-02T00:00:00Z'`**: Maps to `time_range` with `begin` and `end` set to the respective timestamps. +* **`TIME > '-30m'`**: Maps to `time_range` with `begin` set to 30 minutes ago. +* **`TIME BETWEEN '-1h' AND 'now'`**: Maps to `time_range` from 1 hour ago to current time. +* **`WHERE conditions`**: Maps to `criteria` for filtering spans based on tag values. +* **`ORDER BY field`**: Maps to `order_by` for sorting results. +* **`LIMIT`/`OFFSET`**: Maps to `limit` and `offset` for pagination. +* **`WITH QUERY_TRACE`**: Maps to the `trace` field to enable distributed tracing of query execution. + +### 8.3.1. Naming Convention Clarification + +To avoid confusion between different uses of the word "trace" in BanyanDB: + +* **Trace Model**: Refers to the data model for storing trace data (spans, tags, etc.) - used in `FROM TRACE` clauses +* **Query Tracing**: Refers to distributed tracing of query execution for observability - enabled with `WITH QUERY_TRACE` + +The `WITH QUERY_TRACE` clause enables distributed tracing of the query execution itself, which is separate from querying trace data. When enabled, the query response will include execution trace information in the `trace_query_result` field. + +### 8.4. Examples + +```sql +-- Simplest query (context must provide the trace name, e.g., 'sw_trace') +SELECT *; + +-- Basic selection with filtering and ordering +SELECT trace_id, service_id, operation_name +FROM TRACE sw_trace IN (default, updated) +WHERE service_id = 'webapp' AND status = 'success' +ORDER BY start_time DESC +LIMIT 100; + +-- Project all tags from a trace (FROM is explicit) +SELECT * +FROM TRACE sw_trace +WHERE status = 'error' +LIMIT 10; + +-- Query with no tag projection - returns only raw span data +SELECT () +FROM TRACE sw_trace +WHERE service_id = 'webapp' +LIMIT 100; + +-- Use more complex conditions with IN and OR +SELECT trace_id, duration, operation_name +WHERE service_id IN ('webapp', 'api-gateway') OR http.method = 'POST'; + +-- Query with time range using TIME clause +SELECT trace_id, service_id, start_time, operation_name +FROM TRACE sw_trace +TIME BETWEEN '2023-01-01T00:00:00Z' AND '2023-01-02T00:00:00Z' +WHERE service_id = 'webapp' +ORDER BY start_time DESC; + +-- Query data after a specific timestamp +SELECT * +FROM TRACE sw_trace +TIME > '2023-01-01T12:00:00Z' +WHERE status = 'success' +LIMIT 100; + +-- Query data at an exact timestamp +SELECT trace_id, duration, operation_name +FROM TRACE sw_trace +TIME = '2023-01-01T15:30:00Z'; + +-- Query with relative time - last 30 minutes +SELECT * +FROM TRACE sw_trace +TIME > '-30m' +WHERE status = 'error' +LIMIT 100; + +-- Query with relative time range - last 2 hours +SELECT trace_id, service_id, start_time, operation_name +FROM TRACE sw_trace +TIME BETWEEN '-2h' AND 'now' +WHERE service_id = 'webapp' +ORDER BY start_time DESC; + +-- Query traces by specific trace ID +SELECT * +FROM TRACE sw_trace +WHERE trace_id = '1a2b3c4d5e6f7890'; + +-- Query traces with specific operation names +SELECT trace_id, service_id, duration +FROM TRACE sw_trace +WHERE operation_name IN ('GET /api/users', 'POST /api/orders') +ORDER BY duration DESC; + +-- Query traces with error status in the last hour +SELECT trace_id, service_id, error_message +FROM TRACE sw_trace +TIME > '-1h' +WHERE status = 'error' +ORDER BY start_time DESC +LIMIT 50; + +-- Query traces older than 1 day ago +SELECT trace_id, duration, operation_name +FROM TRACE sw_trace +TIME < '-1d' +WHERE service_id = 'legacy-service'; + +-- Query with no projection for raw span data only +SELECT () +FROM TRACE sw_trace +TIME > '-1h' +WHERE status = 'error' +LIMIT 50; + +-- Query with distributed tracing enabled for observability +SELECT trace_id, service_id, operation_name +FROM TRACE sw_trace +WHERE service_id = 'webapp' +WITH QUERY_TRACE; + +-- Query with both empty projection and query tracing +SELECT () +FROM TRACE sw_trace +TIME > '-30m' +WHERE status = 'error' +WITH QUERY_TRACE +LIMIT 100; ``` -## 8. Summary of BydbQL Capabilities - -| Feature | Streams | Measures | Top-N | Properties | -| :------------------ | :------------------------------------ | :------------------------------------ | :---------------------------------- | :---------------------------------- | -| **Primary Command** | `SELECT ... [FROM STREAM]` | `SELECT ... [FROM MEASURE]` | `SHOW TOPN ... FROM MEASURE` | `SELECT ... [FROM PROPERTY]` | -| **Projection** | Tags by family | Tags & Fields | Implicit (entity, value) | Flat list of tags | -| **Aggregation** | No | Yes (`SUM`, `MEAN`, etc.) | Yes (optional) | No | -| **Grouping** | No | Yes (`GROUP BY`) | No | No | -| **Filtering** | Full `WHERE` clause + `TIME` clause | Full `WHERE` clause + `TIME` clause | Simple equality `WHERE` + `TIME` clause | `WHERE` by ID or tags | -| **Ordering** | Yes (`ORDER BY`) | Yes (`ORDER BY`) | Yes (`ORDER BY value`) | No | -| **Pagination** | Yes (`LIMIT`/`OFFSET`) | Yes (`LIMIT`/`OFFSET`) | No | `LIMIT` only | +## 9. Summary of BydbQL Capabilities + +| Feature | Streams | Measures | Top-N | Properties | Traces | +| :------------------ | :------------------------------------ | :------------------------------------ | :---------------------------------- | :---------------------------------- | :---------------------------------- | +| **Primary Command** | `SELECT ... [FROM STREAM]` | `SELECT ... [FROM MEASURE]` | `SHOW TOPN ... FROM MEASURE` | `SELECT ... [FROM PROPERTY]` | `SELECT ... [FROM TRACE]` | +| **Projection** | Tags by family | Tags & Fields | Implicit (entity, value) | Flat list of tags | Tags by specification | +| **Aggregation** | No | Yes (`SUM`, `MEAN`, etc.) | Yes (optional) | No | No | +| **Grouping** | No | Yes (`GROUP BY`) | No | No | No | +| **Filtering** | Full `WHERE` clause + `TIME` clause | Full `WHERE` clause + `TIME` clause | Simple equality `WHERE` + `TIME` clause | `WHERE` by ID or tags | Full `WHERE` clause + `TIME` clause | +| **Ordering** | Yes (`ORDER BY`) | Yes (`ORDER BY`) | Yes (`ORDER BY value`) | No | Yes (`ORDER BY`) | +| **Pagination** | Yes (`LIMIT`/`OFFSET`) | Yes (`LIMIT`/`OFFSET`) | No | `LIMIT` only | Yes (`LIMIT`/`OFFSET`) |