This is an automated email from the ASF dual-hosted git repository.

hanahmily pushed a commit to branch dbsl
in repository https://gitbox.apache.org/repos/asf/skywalking-banyandb.git

commit 69404b6331abd734230d8f61b5e203b7081077cc
Author: Gao Hongtao <hanahm...@gmail.com>
AuthorDate: Tue Aug 26 01:11:50 2025 +0000

    Add BDSL documentation for BanyanDB Query Language
    
    This commit introduces a comprehensive document outlining the BanyanDB Data 
Query Language (BDSL), detailing its purpose, core concepts, and syntax for 
querying various data models including Streams, Measures, Properties, and Top-N 
queries. The document includes grammar specifications, mapping to request 
messages, and practical examples for each query type, enhancing user 
understanding and interaction with BanyanDB's data models.
---
 docs/concept/dbsl.md | 263 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 263 insertions(+)

diff --git a/docs/concept/dbsl.md b/docs/concept/dbsl.md
new file mode 100644
index 00000000..aa4c5cd4
--- /dev/null
+++ b/docs/concept/dbsl.md
@@ -0,0 +1,263 @@
+# BanyanDB Query Language (BDSL)
+
+## 1. Introduction
+
+### 1.1. Purpose
+
+This document outlines the design of the BanyanDB Data Query Language (BDSL), 
a unified query language with SQL-like syntax for all of BanyanDB's data 
models. The primary goal of BDSL is to provide users with a familiar, 
intuitive, and powerful interface to interact with their data, abstracting the 
underlying protobuf-based APIs for streams, measures, properties, and Top-N 
queries.
+
+This document specifies the language's syntax, its semantics, and the precise 
mapping from BDSL statements to the various `Request` messages.
+
+### 1.2. Scope
+
+BDSL supports querying across BanyanDB's primary data models:
+*   **Streams**: For raw, time-series elements like logs and traces.
+*   **Measures**: For aggregated numerical time-series data (metrics).
+*   **Properties**: For metadata and key-value information.
+
+It also provides a specialized syntax for optimized **Top-N** queries against 
measures.
+
+## 2. Core Concepts
+
+### 2.1. Parser Architecture
+
+BDSL queries will be processed by a classic three-stage compiler front-end 
architecture, which ensures modularity and maintainability.
+
+```mermaid
+graph TD
+    A[BDSL Query String] --> B{Lexer};
+    B --> C{Parser};
+    C --> D[Abstract Syntax Tree (AST)];
+    D --> E{Transformer};
+    E --> F[Protobuf Request];
+
+    subgraph Parsing Pipeline
+        B;
+        C;
+        D;
+        E;
+    end
+```
+*   **Lexer**: Breaks the query string into a sequence of tokens.
+*   **Parser**: Builds an Abstract Syntax Tree (AST) from the tokens, 
validating the query's syntax.
+*   **Transformer**: Traverses the AST, performs semantic analysis using a 
schema, and transforms the AST into the appropriate target protobuf `Request` 
message.
+
+### 2.2. Distinguishing Query Types
+
+BDSL distinguishes the target data model either explicitly through keywords in 
the `FROM` clause or implicitly through the **execution context**. This allows 
the parser to apply the correct grammar and transformation rules for the query.
+
+*   **Explicit**: `FROM STREAM <name>`, `FROM MEASURE <name>`, `FROM PROPERTY 
<name>`
+*   **Implicit**: The query is executed against a resource-specific endpoint 
(e.g., `/v1/streams/{stream_name}/query`), where the type and name of the 
resource are known.
+
+Specialized queries, like Top-N, use a distinct top-level command and 
typically require an explicit `FROM` clause:
+
+*   **Top-N**: `SHOW TOPN ... FROM MEASURE <measure_name>`
+
+### 2.3. Optional `FROM` Clause
+
+In BDSL, the `FROM` clause is **optional** for `SELECT` queries. When it is 
omitted, the target resource (the specific stream, measure, or property) 
**must** be supplied by the execution context.
+
+The simplest possible BDSL query is `SELECT *`. When executed within the 
context of a stream named `sw`, this is equivalent to `SELECT * FROM STREAM sw`.
+
+When the `FROM` clause is present, it overrides any context provided by the 
environment. This is useful for clients that connect to a generic query 
endpoint and need to specify the target resource directly within the query text.
+
+## 3. BDSL for Streams
+
+BDSL for streams is designed for querying and retrieving raw time-series 
elements. The syntax maps to the `banyandb.stream.v1.QueryRequest` message.
+
+### 3.1. Grammar
+
+```
+query           ::= SELECT projection [from_stream_clause] [WHERE criteria] 
[ORDER BY order_expression] [LIMIT integer] [OFFSET integer]
+from_stream_clause ::= "FROM STREAM" identifier ["IN" "(" group_list ")"]
+projection      ::= "*" | column_list
+column_list     ::= identifier ("," identifier)*
+group_list      ::= identifier ("," identifier)*
+criteria        ::= condition (("AND" | "OR") condition)*
+condition       ::= identifier binary_op (value | value_list)
+binary_op       ::= "=" | "!=" | ">" | "<" | ">=" | "<=" | "IN" | "NOT IN" | 
"HAVING" | "NOT HAVING" | "MATCH"
+order_expression::= identifier ["ASC" | "DESC"]
+value           ::= string_literal | integer_literal | "NULL"
+value_list      ::= "(" value ("," value)* ")"
+```
+
+### 3.2. Mapping to `stream.v1.QueryRequest`
+
+*   **`FROM STREAM name IN (groups)`**: Maps to the `name` and `groups` 
fields. If the clause is omitted, these values are taken from the execution 
context.
+*   **`SELECT tags`**: Maps to `projection`. Requires a stream schema to 
resolve tags to their families.
+*   **`WHERE conditions`**: Maps to `criteria`.
+*   **`ORDER BY field`**: Maps to `order_by`.
+*   **`LIMIT`/`OFFSET`**: Maps to `limit` and `offset`.
+
+### 3.3. Examples
+
+```sql
+-- Simplest query (context must provide the stream name, e.g., 'sw')
+SELECT *;
+
+-- Basic selection with filtering and ordering
+SELECT trace_id, service_id
+FROM STREAM sw IN (default, updated)
+WHERE service_id = 'webapp' AND state = 1
+ORDER BY start_time DESC
+LIMIT 100;
+
+-- Project all tags from a stream (FROM is explicit)
+SELECT * 
+FROM STREAM sw 
+WHERE state = 0 
+LIMIT 10;
+
+-- Use more complex conditions with IN and OR
+SELECT trace_id, duration
+WHERE service_id IN ('webapp', 'api-gateway') OR http.method = 'POST';
+```
+
+## 4. BDSL for Measures
+
+BDSL for measures is tailored for analytical queries on aggregated numerical 
data. It supports aggregation, grouping, and mixed selection of tags and 
fields, mapping to the `banyandb.measure.v1.QueryRequest` message.
+
+### 4.1. BDSL Extensions for `SELECT`
+
+The `SELECT` clause for measures is highly flexible, allowing for the 
selection of tags, fields, and aggregations in a single, flat list.
+
+*   `SELECT <field_key>, <tag_key>`: Returns specific fields and tags. The 
parser will infer the type of each identifier from the measure's schema.
+*   `SELECT <identifier>::field, <identifier>::tag`: If a field and a tag 
share the same name, the `::field` or `::tag` syntax **must** be used to 
disambiguate the identifier's type.
+*   The clause also supports aggregation functions (`SUM`, `MEAN`, `COUNT`, 
`MAX`, `MIN`) and a `TOP N` clause for ranked results.
+
+### 4.2. Mapping to `measure.v1.QueryRequest`
+
+*   **`FROM MEASURE name`**: Maps to the `name` field. If omitted, this value 
is taken from the execution context.
+*   **`SELECT <tag1>, <field1>, <field2>`**: The transformer inspects each 
identifier. Those identified as tags (either by schema lookup or `::tag`) are 
added to `tag_projection`. Those identified as fields (by schema lookup or 
`::field`) are added to `field_projection`.
+*   **`SELECT SUM(field)`**: Maps to `agg`.
+*   **`GROUP BY <tag1>, <tag2>`**: The `GROUP BY` clause takes a simple list 
of tags and maps to `group_by.tag_projection`.
+*   **`SELECT TOP N ...`**: Maps to the `top` message.
+
+### 4.3. Examples
+
+```sql
+-- Simplest query (context must provide measure name)
+SELECT *;
+
+-- Select a specific tag and a specific field
+SELECT
+    instance,
+    latency
+WHERE region = 'us-west-1'
+LIMIT 10;
+
+-- Select multiple tags and fields, with an aggregation
+SELECT
+    region,
+    SUM(latency)
+FROM MEASURE service_cpm
+GROUP BY region;
+
+-- Disambiguate a key named 'status' that exists as both a tag and a field
+SELECT
+    status::tag,
+    status::field
+FROM MEASURE http_requests
+WHERE path = '/api/v1/users';
+
+-- Find the top 10 instances with the highest CPU usage for a specific service
+SELECT TOP 10
+    instance,
+    cpu_usage
+FROM MEASURE instance_metrics
+WHERE service = 'auth-service'
+ORDER BY cpu_usage DESC;
+```
+
+## 5. BDSL for Top-N
+
+Top-N queries use a specialized, command-like syntax for clarity and to 
reflect the optimized nature of the underlying 
`banyandb.measure.v1.TopNRequest`. The `FROM` clause is mandatory.
+
+### 5.1. Grammar
+
+```
+topn_query ::= SHOW TOP integer from_measure_clause [WHERE topn_criteria] 
[AGGREGATE BY agg_function] [ORDER BY value [ASC|DESC]]
+from_measure_clause ::= "FROM MEASURE" identifier
+```
+
+### 5.2. Mapping to `measure.v1.TopNRequest`
+
+*   **`SHOW TOP N`**: Maps to `top_n`.
+*   **`WHERE tag = 'value'`**: Maps to `conditions`. Only simple equality is 
supported.
+*   **`AGGREGATE BY FUNC`**: Maps to `agg`.
+*   **`ORDER BY value DESC`**: Maps to `field_value_sort`.
+
+### 5.3. Examples
+
+```sql
+-- Get the Top 10 services with the highest latency
+SHOW TOP 10
+FROM MEASURE service_latency
+WHERE http_method = 'GET' AND version = 'v1.2.0'
+ORDER BY value DESC;
+
+-- Get the Bottom 5 services with the fewest errors
+SHOW TOP 5 
+FROM MEASURE service_errors_total
+ORDER BY value ASC;
+
+-- Get the Top 3 pods with the most restarts in total over the time range
+SHOW TOP 3 
+FROM MEASURE pod_restarts
+WHERE namespace = 'production'
+AGGREGATE BY SUM;
+```
+
+## 6. BDSL for Properties
+
+BDSL for properties is designed for simple key-value lookups and metadata 
filtering. It maps to the `banyandb.property.v1.QueryRequest` message.
+
+### 6.1. Grammar
+
+```
+property_query ::= SELECT projection [from_property_clause] [WHERE criteria] 
[LIMIT integer]
+from_property_clause ::= "FROM PROPERTY" identifier
+projection ::= "*" | column_list
+```
+
+### 6.2. Mapping to `property.v1.QueryRequest`
+
+*   **`FROM PROPERTY name`**: Maps to the `name` field. If omitted, this value 
is taken from the execution context.
+*   **`SELECT tags`**: Maps to `tag_projection`.
+*   **`WHERE ID IN (...)`**: Maps to `ids`.
+*   **`WHERE tag = 'value'`**: Maps to `criteria`.
+*   **`LIMIT n`**: Maps to `limit`.
+
+### 6.3. Examples
+
+```sql
+-- Simplest query (context must provide property name)
+SELECT *;
+
+-- Find properties by filtering on their tags
+SELECT ip, owner
+FROM PROPERTY server_metadata
+WHERE datacenter = 'dc-101' AND in_service = 'true'
+LIMIT 50;
+
+-- Retrieve a specific property by its unique ID
+SELECT * 
+WHERE ID = 'server-1a2b3c';
+
+-- Retrieve a set of properties by their unique IDs
+SELECT ip, region 
+FROM PROPERTY server_metadata 
+WHERE ID IN ('server-1a2b3c', 'server-4d5e6f');
+```
+
+## 7. Summary of BDSL Capabilities
+
+| Feature             | Streams                               | Measures       
                       | Top-N                               | Properties       
                   |
+| :------------------ | :------------------------------------ | 
:------------------------------------ | :---------------------------------- | 
:---------------------------------- |
+| **Primary Command** | `SELECT ... [FROM STREAM]`            | `SELECT ... 
[FROM MEASURE]`           | `SHOW TOPN ... FROM MEASURE`        | `SELECT ... 
[FROM PROPERTY]`        |
+| **Projection**      | Tags by family                        | Tags & Fields  
                       | Implicit (entity, value)            | Flat list of 
tags                   |
+| **Aggregation**     | No                                    | Yes (`SUM`, 
`MEAN`, etc.)             | Yes (optional)                      | No            
                      |
+| **Grouping**        | No                                    | Yes (`GROUP 
BY`)                      | No                                  | No            
                      |
+| **Filtering**       | Full `WHERE` clause                   | Full `WHERE` 
clause                   | Simple equality `WHERE`             | `WHERE` by ID 
or tags               |
+| **Ordering**        | Yes (`ORDER BY`)                      | Yes (`ORDER 
BY`)                      | Yes (`ORDER BY value`)              | No            
                      |
+| **Pagination**      | Yes (`LIMIT`/`OFFSET`)                | Yes 
(`LIMIT`/`OFFSET`)                | No                                  | 
`LIMIT` only                        |

Reply via email to