This is an automated email from the ASF dual-hosted git repository. hanahmily pushed a commit to branch dbsl in repository https://gitbox.apache.org/repos/asf/skywalking-banyandb.git
commit 9fc57788d994a3dc507c05025224ad11ecd99dcb Author: Gao Hongtao <hanahm...@gmail.com> AuthorDate: Wed Aug 27 00:51:35 2025 +0000 Update BDSL documentation with grammar specifications and case sensitivity rules This commit enhances the BDSL documentation by adding detailed grammar specifications for queries related to measures, properties, and Top-N queries. It also introduces a new section on case sensitivity, clarifying the treatment of reserved words and identifiers, along with practical examples to illustrate these concepts. Additionally, minor formatting adjustments were made for improved readability. --- .gitignore | 1 + docs/concept/dbsl.md | 145 +++++++++++++++++++++++++++++++++++++++++---------- 2 files changed, 119 insertions(+), 27 deletions(-) diff --git a/.gitignore b/.gitignore index 474f67fe..75dae5b0 100644 --- a/.gitignore +++ b/.gitignore @@ -72,4 +72,5 @@ gomock_reflect* # Claude Claude.md .claude/ +CLAUDE.md diff --git a/docs/concept/dbsl.md b/docs/concept/dbsl.md index aa4c5cd4..7a0370e3 100644 --- a/docs/concept/dbsl.md +++ b/docs/concept/dbsl.md @@ -23,20 +23,18 @@ It also provides a specialized syntax for optimized **Top-N** queries against me BDSL queries will be processed by a classic three-stage compiler front-end architecture, which ensures modularity and maintainability. -```mermaid -graph TD - A[BDSL Query String] --> B{Lexer}; - B --> C{Parser}; - C --> D[Abstract Syntax Tree (AST)]; - D --> E{Transformer}; - E --> F[Protobuf Request]; - - subgraph Parsing Pipeline - B; - C; - D; - E; - end +``` +BDSL Query String + ↓ + Lexer + ↓ + Parser + ↓ +Abstract Syntax Tree (AST) + ↓ + Transformer + ↓ + Protobuf Request ``` * **Lexer**: Breaks the query string into a sequence of tokens. * **Parser**: Builds an Abstract Syntax Tree (AST) from the tokens, validating the query's syntax. @@ -61,6 +59,37 @@ The simplest possible BDSL query is `SELECT *`. When executed within the context When the `FROM` clause is present, it overrides any context provided by the environment. This is useful for clients that connect to a generic query endpoint and need to specify the target resource directly within the query text. +### 2.4. Case Sensitivity + +BDSL follows SQL-like conventions for case sensitivity: + +* **Reserved words are case-insensitive**: Keywords like `SELECT`, `FROM`, `WHERE`, `ORDER BY`, etc. can be written in any case combination. +* **Identifiers are case-sensitive**: Names of streams, measures, properties, tags, and fields preserve their case and must be referenced exactly as defined. + +#### Examples + +All of these queries are equivalent: +```sql +SELECT * FROM STREAM sw WHERE service_id = 'webapp'; + +select * from stream sw where service_id = 'webapp'; + +Select * From Stream sw Where service_id = 'webapp'; +``` + +But these refer to different identifiers: +```sql +-- Different tag names (case-sensitive) +SELECT ServiceName FROM STREAM sw; -- refers to tag "ServiceName" +SELECT servicename FROM STREAM sw; -- refers to tag "servicename" + +-- Different stream names (case-sensitive) +FROM STREAM MyStream -- refers to stream "MyStream" +FROM STREAM mystream -- refers to stream "mystream" +``` + +**Best Practice**: Use uppercase for reserved words and consistent casing for identifiers to maintain readability. + ## 3. BDSL for Streams BDSL for streams is designed for querying and retrieving raw time-series elements. The syntax maps to the `banyandb.stream.v1.QueryRequest` message. @@ -79,6 +108,9 @@ binary_op ::= "=" | "!=" | ">" | "<" | ">=" | "<=" | "IN" | "NOT IN" | "HA order_expression::= identifier ["ASC" | "DESC"] value ::= string_literal | integer_literal | "NULL" value_list ::= "(" value ("," value)* ")" +identifier ::= [a-zA-Z_][a-zA-Z0-9_]* +string_literal ::= "'" [^']* "'" | "\"" [^\"]* "\"" +integer_literal ::= [0-9]+ ``` ### 3.2. Mapping to `stream.v1.QueryRequest` @@ -117,7 +149,27 @@ WHERE service_id IN ('webapp', 'api-gateway') OR http.method = 'POST'; BDSL for measures is tailored for analytical queries on aggregated numerical data. It supports aggregation, grouping, and mixed selection of tags and fields, mapping to the `banyandb.measure.v1.QueryRequest` message. -### 4.1. BDSL Extensions for `SELECT` +### 4.1. Grammar + +``` +measure_query ::= SELECT projection [from_measure_clause] [WHERE criteria] [GROUP BY group_list] [ORDER BY order_expression] [LIMIT integer] [OFFSET integer] +from_measure_clause ::= "FROM MEASURE" identifier ["IN" "(" group_list ")"] +projection ::= "*" | (column_list | agg_function "(" identifier ")" | "TOP" integer projection) +column_list ::= identifier ("," identifier)* ["::tag" | "::field"] +agg_function ::= "SUM" | "MEAN" | "COUNT" | "MAX" | "MIN" +group_list ::= identifier ("," identifier)* +criteria ::= condition (("AND" | "OR") condition)* +condition ::= identifier binary_op (value | value_list) +binary_op ::= "=" | "!=" | ">" | "<" | ">=" | "<=" | "IN" | "NOT IN" | "HAVING" | "NOT HAVING" | "MATCH" +order_expression ::= identifier ["ASC" | "DESC"] +value ::= string_literal | integer_literal | "NULL" +value_list ::= "(" value ("," value)* ")" +identifier ::= [a-zA-Z_][a-zA-Z0-9_]* +string_literal ::= "'" [^']* "'" | "\"" [^\"]* "\"" +integer_literal ::= [0-9]+ +``` + +### 4.2. BDSL Extensions for `SELECT` The `SELECT` clause for measures is highly flexible, allowing for the selection of tags, fields, and aggregations in a single, flat list. @@ -125,15 +177,15 @@ The `SELECT` clause for measures is highly flexible, allowing for the selection * `SELECT <identifier>::field, <identifier>::tag`: If a field and a tag share the same name, the `::field` or `::tag` syntax **must** be used to disambiguate the identifier's type. * The clause also supports aggregation functions (`SUM`, `MEAN`, `COUNT`, `MAX`, `MIN`) and a `TOP N` clause for ranked results. -### 4.2. Mapping to `measure.v1.QueryRequest` +### 4.3. Mapping to `measure.v1.QueryRequest` -* **`FROM MEASURE name`**: Maps to the `name` field. If omitted, this value is taken from the execution context. +* **`FROM MEASURE name IN (groups)`**: Maps to the `name` and `groups` fields. If the clause is omitted, these values are taken from the execution context. * **`SELECT <tag1>, <field1>, <field2>`**: The transformer inspects each identifier. Those identified as tags (either by schema lookup or `::tag`) are added to `tag_projection`. Those identified as fields (by schema lookup or `::field`) are added to `field_projection`. * **`SELECT SUM(field)`**: Maps to `agg`. * **`GROUP BY <tag1>, <tag2>`**: The `GROUP BY` clause takes a simple list of tags and maps to `group_by.tag_projection`. * **`SELECT TOP N ...`**: Maps to the `top` message. -### 4.3. Examples +### 4.4. Examples ```sql -- Simplest query (context must provide measure name) @@ -167,6 +219,13 @@ SELECT TOP 10 FROM MEASURE instance_metrics WHERE service = 'auth-service' ORDER BY cpu_usage DESC; + +-- Select aggregated latency from multiple groups +SELECT + region, + SUM(latency) +FROM MEASURE service_cpm IN (us-west, us-east, eu-central) +GROUP BY region; ``` ## 5. BDSL for Top-N @@ -176,13 +235,24 @@ Top-N queries use a specialized, command-like syntax for clarity and to reflect ### 5.1. Grammar ``` -topn_query ::= SHOW TOP integer from_measure_clause [WHERE topn_criteria] [AGGREGATE BY agg_function] [ORDER BY value [ASC|DESC]] -from_measure_clause ::= "FROM MEASURE" identifier +topn_query ::= SHOW TOP integer from_measure_clause [WHERE topn_criteria] [AGGREGATE BY agg_function] [ORDER BY value ["ASC"|"DESC"]] +from_measure_clause ::= "FROM MEASURE" identifier ["IN" "(" group_list ")"] +topn_criteria ::= condition (("AND" | "OR") condition)* +condition ::= identifier binary_op (value | value_list) +binary_op ::= "=" | "!=" | ">" | "<" | ">=" | "<=" | "IN" | "NOT IN" +agg_function ::= "SUM" | "MEAN" | "COUNT" | "MAX" | "MIN" +group_list ::= identifier ("," identifier)* +value ::= string_literal | integer_literal | "NULL" +value_list ::= "(" value ("," value)* ")" +identifier ::= [a-zA-Z_][a-zA-Z0-9_]* +string_literal ::= "'" [^']* "'" | "\"" [^\"]* "\"" +integer_literal ::= [0-9]+ ``` ### 5.2. Mapping to `measure.v1.TopNRequest` * **`SHOW TOP N`**: Maps to `top_n`. +* **`FROM MEASURE name IN (groups)`**: Maps to the `name` and `groups` fields. * **`WHERE tag = 'value'`**: Maps to `conditions`. Only simple equality is supported. * **`AGGREGATE BY FUNC`**: Maps to `agg`. * **`ORDER BY value DESC`**: Maps to `field_value_sort`. @@ -202,10 +272,15 @@ FROM MEASURE service_errors_total ORDER BY value ASC; -- Get the Top 3 pods with the most restarts in total over the time range -SHOW TOP 3 +SHOW TOP 3 FROM MEASURE pod_restarts WHERE namespace = 'production' AGGREGATE BY SUM; + +-- Get the Top 5 services with the highest error rate across multiple groups +SHOW TOP 5 +FROM MEASURE service_errors IN (production, staging) +ORDER BY value DESC; ``` ## 6. BDSL for Properties @@ -215,14 +290,24 @@ BDSL for properties is designed for simple key-value lookups and metadata filter ### 6.1. Grammar ``` -property_query ::= SELECT projection [from_property_clause] [WHERE criteria] [LIMIT integer] -from_property_clause ::= "FROM PROPERTY" identifier -projection ::= "*" | column_list +property_query ::= SELECT projection [from_property_clause] [WHERE criteria] [LIMIT integer] +from_property_clause ::= "FROM PROPERTY" identifier ["IN" "(" group_list ")"] +projection ::= "*" | column_list +column_list ::= identifier ("," identifier)* +group_list ::= identifier ("," identifier)* +criteria ::= condition (("AND" | "OR") condition)* +condition ::= identifier binary_op (value | value_list) | "ID" binary_op (value | value_list) +binary_op ::= "=" | "!=" | ">" | "<" | ">=" | "<=" | "IN" | "NOT IN" +value ::= string_literal | integer_literal | "NULL" +value_list ::= "(" value ("," value)* ")" +identifier ::= [a-zA-Z_][a-zA-Z0-9_]* +string_literal ::= "'" [^']* "'" | "\"" [^\"]* "\"" +integer_literal ::= [0-9]+ ``` ### 6.2. Mapping to `property.v1.QueryRequest` -* **`FROM PROPERTY name`**: Maps to the `name` field. If omitted, this value is taken from the execution context. +* **`FROM PROPERTY name IN (groups)`**: Maps to the `name` and `groups` fields. If the clause is omitted, these values are taken from the execution context. * **`SELECT tags`**: Maps to `tag_projection`. * **`WHERE ID IN (...)`**: Maps to `ids`. * **`WHERE tag = 'value'`**: Maps to `criteria`. @@ -245,9 +330,15 @@ SELECT * WHERE ID = 'server-1a2b3c'; -- Retrieve a set of properties by their unique IDs -SELECT ip, region -FROM PROPERTY server_metadata +SELECT ip, region +FROM PROPERTY server_metadata WHERE ID IN ('server-1a2b3c', 'server-4d5e6f'); + +-- Find properties from multiple groups +SELECT ip, owner +FROM PROPERTY server_metadata IN (datacenter-1, datacenter-2, datacenter-3) +WHERE in_service = 'true' +LIMIT 100; ``` ## 7. Summary of BDSL Capabilities