This is an automated email from the ASF dual-hosted git repository.

wusheng pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/skywalking-banyandb.git


The following commit(s) were added to refs/heads/main by this push:
     new da0b34c6 Update required and missing selection in bydbQL documentation 
(#805)
da0b34c6 is described below

commit da0b34c6b7d6cae25b37a46b4a2a06a57fcf50db
Author: mrproliu <[email protected]>
AuthorDate: Mon Oct 13 15:22:07 2025 +0800

    Update required and missing selection in bydbQL documentation (#805)
---
 docs/interacting/bydbql.md | 440 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 319 insertions(+), 121 deletions(-)

diff --git a/docs/interacting/bydbql.md b/docs/interacting/bydbql.md
index 757841de..6f7c0dc7 100644
--- a/docs/interacting/bydbql.md
+++ b/docs/interacting/bydbql.md
@@ -42,18 +42,24 @@ Abstract Syntax Tree (AST)
 
 ### 2.2. Distinguishing Query Types
 
-BydbQL distinguishes the target data model either explicitly through keywords 
in the `FROM` clause or implicitly through the **execution context**. This 
allows the parser to apply the correct grammar and transformation rules for the 
query.
+BydbQL distinguishes the target data model explicitly through keywords in the 
`FROM` clause. This allows the parser to apply the correct grammar and 
transformation rules for the query.
 
-*   **Explicit**: `FROM STREAM <name>`, `FROM MEASURE <name>`, `FROM PROPERTY 
<name>`
-*   **Implicit**: The query is executed against a resource-specific endpoint 
(e.g., `/v1/streams/{stream_name}/query`), where the type and name of the 
resource are known.
+*   **Streams**: `FROM STREAM <name> IN <groups>` or `FROM STREAM <name> IN 
(<groups>)`
+*   **Measures**: `FROM MEASURE <name> IN <groups>` or `FROM MEASURE <name> IN 
(<groups>)`
+*   **Properties**: `FROM PROPERTY <name> IN <groups>` or `FROM PROPERTY 
<name> IN (<groups>)`
+*   **Traces**: `FROM TRACE <name> IN <groups>` or `FROM TRACE <name> IN 
(<groups>)`
 
-Specialized queries, like Top-N, use a distinct top-level command and 
typically require an explicit `FROM` clause:
+Specialized queries, like Top-N, use a distinct top-level command:
 
-*   **Top-N**: `SHOW TOPN ... FROM MEASURE <measure_name>`
+*   **Top-N**: `SHOW TOP <n> FROM MEASURE <name> IN <groups>` or `SHOW TOP <n> 
FROM MEASURE <name> IN (<groups>)`
 
-### 2.3. Optional `FROM` Clause
+### 2.3. Required Clauses
 
-In BydbQL, the `FROM` clause is **required** for `SELECT` queries.
+In BydbQL, the following clauses are **required** for all queries:
+
+*   **`FROM` clause**: Specifies the data model type, resource name, and group 
list
+*   **`IN groups` clause**: Specifies one or more groups to query from. 
Parentheses around the group list are optional.
+*   **`TIME` clause**: Specifies the time range for the query (required for 
Streams, Measures, Traces, and Top-N queries; not applicable to Property 
queries)
 
 ### 2.4. Case Sensitivity
 
@@ -66,22 +72,22 @@ BydbQL follows SQL-like conventions for case sensitivity:
 
 All of these queries are equivalent:
 ```sql
-SELECT * FROM STREAM sw WHERE service_id = 'webapp';
+SELECT * FROM STREAM sw in group1 WHERE service_id = 'webapp';
 
-select * from stream sw where service_id = 'webapp';
+select * from stream sw in group1 where service_id = 'webapp';
 
-Select * From Stream sw Where service_id = 'webapp';
+Select * From Stream sw in group1 Where service_id = 'webapp';
 ```
 
 But these refer to different identifiers:
 ```sql
 -- Different tag names (case-sensitive)
-SELECT ServiceName FROM STREAM sw;  -- refers to tag "ServiceName"
-SELECT servicename FROM STREAM sw;  -- refers to tag "servicename"
+SELECT ServiceName FROM STREAM sw in group1;  -- refers to tag "ServiceName"
+SELECT servicename FROM STREAM sw in group1;  -- refers to tag "servicename"
 
 -- Different stream names (case-sensitive)
-FROM STREAM MyStream    -- refers to stream "MyStream"
-FROM STREAM mystream    -- refers to stream "mystream"
+FROM STREAM MyStream in group1   -- refers to stream "MyStream"
+FROM STREAM mystream in group1   -- refers to stream "mystream"
 ```
 
 **Best Practice**: Use uppercase for reserved words and consistent casing for 
identifiers to maintain readability.
@@ -140,6 +146,147 @@ TIME < '-1d'
 
 The `parseTime` function automatically determines whether a timestamp is 
absolute (RFC3339) or relative (duration string) and converts it appropriately.
 
+## 3. WHERE Statement
+
+The WHERE clause in BydbQL provides powerful filtering capabilities with 
support for various operators, logical expressions, and full-text search. It 
supports binary tree structures with proper operator precedence and parentheses 
grouping.
+
+### Key Features
+
+*   **Binary Tree Structure**: WHERE conditions are organized as a binary 
expression tree supporting complex nested logic
+*   **Operator Precedence**: Parentheses `()` > `AND` > `OR`
+*   **Multiple Operators**: Comparison (`=`, `!=`, `>`, `<`, `>=`, `<=`), set 
operations (`IN`, `NOT IN`, `HAVING`, `NOT HAVING`), and full-text search 
(`MATCH`)
+*   **Type Support**: String, integer, and NULL values
+*   **Complex Expressions**: Support for nested parentheses and mixed AND/OR 
logic
+
+### 3.1. MATCH Operator
+
+The `MATCH` operator provides full-text search capabilities for string fields 
and tags in BydbQL. It uses text analyzers to tokenize and match text content, 
making it ideal for searching log messages, trace attributes, and other textual 
data.
+
+#### 3.1.1. Syntax
+
+The MATCH operator uses a function-call syntax with optional parameters:
+
+```
+MATCH(value)
+MATCH(value, analyzer)
+MATCH(value, analyzer, operator)
+MATCH((value1, value2, ...), analyzer, operator)
+```
+
+**Parameters:**
+
+*   **value(s)** (required): The search term(s) to match. Can be:
+    *   Single value: `MATCH('error')`
+    *   Multiple values (array): `MATCH(('error', 'warning'))` - wrapped in 
parentheses
+    *   Supports both string and integer types
+
+*   **analyzer** (optional): The text analyzer to use for tokenizing and 
matching. Common analyzers include:
+    *   `"standard"` - Standard text analysis with lowercase and word 
tokenization
+    *   `"simple"` - Simple lowercase analysis
+    *   `"keyword"` - No tokenization, exact matching (case-insensitive)
+    *   `"url"` - URL-specific tokenization
+    *   Default: Uses the analyzer configured for the field/tag in the schema
+
+*   **operator** (optional): Logical operator for multiple values. Valid 
values:
+    *   `"AND"` - All values must match (default for multiple values)
+    *   `"OR"` - At least one value must match
+
+#### 3.1.2. Supported Data Types
+
+The MATCH operator is available in:
+
+*   **Streams**: For full-text search in stream tags
+*   **Measures**: For full-text search in measure tags
+*   **Traces**: For full-text search in trace tags (e.g., searching span 
attributes)
+
+Note: MATCH is not supported in Top-N and Property queries.
+
+#### 3.1.3. Examples
+
+#### Basic MATCH queries
+
+```sql
+-- Simple text search
+SELECT trace_id, message
+FROM STREAM logs in group1
+TIME > '-30m'
+WHERE message MATCH('error');
+
+-- Search with specific analyzer
+SELECT trace_id, http_url
+FROM TRACE sw_trace in group1
+TIME > '-30m'
+WHERE http_url MATCH('/api/users', 'url');
+
+-- Search with analyzer and operator
+SELECT service_id, log_message
+FROM STREAM application_logs in group1
+TIME > '-30m'
+WHERE log_message MATCH('error', 'standard', 'OR');
+```
+
+#### Multiple value searches
+
+```sql
+-- Search for multiple terms (default AND logic)
+SELECT trace_id, operation_name
+FROM TRACE sw_trace in group1
+TIME > '-30m'
+WHERE operation_name MATCH(('GET', 'POST'), 'keyword');
+
+-- Search for any of multiple terms (OR logic)
+SELECT service_id, message
+FROM STREAM logs in group1
+TIME > '-30m'
+WHERE message MATCH(('error', 'warning', 'critical'), 'standard', 'OR');
+
+-- Multiple values with explicit AND
+SELECT trace_id, tags
+FROM TRACE sw_trace in group1
+TIME > '-30m'
+WHERE tags MATCH(('payment', 'success'), 'standard', 'AND');
+```
+
+#### Combined with other conditions
+
+```sql
+-- MATCH with other WHERE conditions
+SELECT trace_id, service_id, message
+FROM STREAM logs in group1
+TIME > '-30m'
+WHERE service_id = 'payment-service'
+  AND message MATCH(('error', 'timeout'), 'standard', 'OR')
+ORDER BY timestamp DESC
+LIMIT 100;
+
+-- Complex query with MATCH and time range
+SELECT trace_id, operation_name, http_status
+FROM TRACE sw_trace in group1
+TIME > '-30m'
+TIME BETWEEN '-1h' AND 'now'
+WHERE service_id IN ('api-gateway', 'auth-service')
+  AND http_status > 400
+  AND operation_name MATCH('api', 'keyword')
+ORDER BY start_time DESC;
+```
+
+#### 3.1.4. Performance Considerations
+
+*   **Indexing**: MATCH queries leverage full-text indexes. Ensure your 
tags/fields are properly indexed for optimal performance.
+*   **Analyzer Selection**: Choose the appropriate analyzer for your use case:
+    *   Use `"keyword"` for exact matching
+    *   Use `"standard"` for general text search
+    *   Use `"url"` for URL-specific patterns
+*   **Multiple Values**: When using multiple values, `"OR"` logic may scan 
more results than `"AND"` logic.
+*   **Combination with Filters**: Combine MATCH with other WHERE conditions to 
narrow down results and improve query performance.
+
+#### 3.1.5. Notes
+
+*   The MATCH operator is case-insensitive by default (behavior depends on the 
analyzer).
+*   Multiple values must be wrapped in parentheses: `MATCH(('val1', 'val2'))`.
+*   The analyzer and operator parameters are optional; when omitted, schema 
defaults are used.
+*   For single-value searches, the operator parameter is ignored.
+
 ## 4. BydbQL for Streams
 
 BydbQL for streams is designed for querying and retrieving raw time-series 
elements. The syntax maps to the `banyandb.stream.v1.QueryRequest` message.
@@ -147,11 +294,11 @@ BydbQL for streams is designed for querying and 
retrieving raw time-series eleme
 ### 4.1. Grammar
 
 ```
-query           ::= SELECT projection [from_stream_clause] [TIME 
time_condition] [WHERE criteria] [ORDER BY order_expression] [LIMIT integer] 
[OFFSET integer] [WITH QUERY_TRACE]
-from_stream_clause ::= "FROM STREAM" identifier ["IN" "(" group_list ")"]
+query           ::= SELECT projection from_stream_clause TIME time_condition 
[WHERE criteria] [ORDER BY order_expression] [LIMIT integer] [OFFSET integer] 
[WITH QUERY_TRACE]
+from_stream_clause ::= "FROM STREAM" identifier "IN" ["("] group_list [")"]
 projection      ::= "*" | column_list
 column_list     ::= identifier ("," identifier)*
-group_list      ::= identifier ("," identifier)*
+group_list      ::= identifier ("," identifier)+
 criteria        ::= condition (("AND" | "OR") condition)*
 condition       ::= identifier binary_op (value | value_list)
 time_condition  ::= "=" timestamp | ">" timestamp | "<" timestamp | ">=" 
timestamp | "<=" timestamp | "BETWEEN" timestamp "AND" timestamp
@@ -168,15 +315,21 @@ string_literal  ::= "'" [^']* "'" | "\"" [^\"]* "\""
 integer_literal ::= [0-9]+
 ```
 
+**Note**: The `TIME` clause and `IN groups` clause are **required** for all 
stream queries. At least one group must be specified. Parentheses around the 
group list are optional.
+
 ### 4.2. Mapping to `stream.v1.QueryRequest`
 
-*   **`FROM STREAM name IN (groups)`**: Maps to the `name` and `groups` 
fields. If the clause is omitted, these values are taken from the execution 
context.
+*   **`FROM STREAM name IN groups`** or **`FROM STREAM name IN (groups)`**: 
Maps to the `name` and `groups` fields. Both are required.
 *   **`SELECT tags`**: Maps to `projection`. Requires a stream schema to 
resolve tags to their families.
-*   **`TIME = '2023-01-01T00:00:00Z'`**: Maps to `time_range` with `begin` and 
`end` set to the same timestamp.
-*   **`TIME > '2023-01-01T00:00:00Z'`**: Maps to `time_range` with `begin` set 
to the timestamp.
-*   **`TIME BETWEEN '2023-01-01T00:00:00Z' AND '2023-01-02T00:00:00Z'`**: Maps 
to `time_range` with `begin` and `end` set to the respective timestamps.
-*   **`TIME > '-30m'`**: Maps to `time_range` with `begin` set to 30 minutes 
ago.
-*   **`TIME BETWEEN '-1h' AND 'now'`**: Maps to `time_range` from 1 hour ago 
to current time.
+*   **`TIME` clause (required)**: Maps to `time_range`:
+    *   **`TIME = '2023-01-01T00:00:00Z'`**: Sets `begin` and `end` to the 
same timestamp.
+    *   **`TIME > '2023-01-01T00:00:00Z'`**: Sets `begin` to the timestamp.
+    *   **`TIME < '2023-01-01T00:00:00Z'`**: Sets `end` to the timestamp.
+    *   **`TIME >= '2023-01-01T00:00:00Z'`**: Sets `begin` to the timestamp 
(inclusive).
+    *   **`TIME <= '2023-01-01T00:00:00Z'`**: Sets `end` to the timestamp 
(inclusive).
+    *   **`TIME BETWEEN '2023-01-01T00:00:00Z' AND '2023-01-02T00:00:00Z'`**: 
Sets `begin` and `end` to the respective timestamps.
+    *   **`TIME > '-30m'`**: Sets `begin` to 30 minutes ago.
+    *   **`TIME BETWEEN '-1h' AND 'now'`**: Sets `begin` to 1 hour ago and 
`end` to current time.
 *   **`WHERE conditions`**: Maps to `criteria`.
 *   **`ORDER BY field`**: Maps to `order_by`.
 *   **`LIMIT`/`OFFSET`**: Maps to `limit` and `offset`.
@@ -187,69 +340,73 @@ integer_literal ::= [0-9]+
 ```sql
 -- Basic selection with filtering and ordering
 SELECT trace_id, service_id
-FROM STREAM sw IN (group1, group2)
+FROM STREAM sw IN group1, group2
+TIME > '-30m'
 WHERE service_id = 'webapp' AND state = 1
 ORDER BY start_time DESC
 LIMIT 100;
 
--- Project all tags from a stream (FROM is explicit)
-SELECT * 
-FROM STREAM sw 
-WHERE state = 0 
+-- Project all tags from a stream
+SELECT *
+FROM STREAM sw IN group1, group2
+TIME > '-30m'
+WHERE state = 0
 LIMIT 10;
 
 -- Use more complex conditions with IN and OR
 SELECT trace_id, duration
-FROM STREAM sw
+FROM STREAM sw IN group1, group2
+TIME > '-30m'
 WHERE service_id IN ('webapp', 'api-gateway') OR http.method = 'POST';
 
 -- Query with time range using TIME clause
 SELECT trace_id, service_id, start_time
-FROM STREAM sw
+FROM STREAM sw IN group1, group2
+TIME > '-30m'
 TIME BETWEEN '2023-01-01T00:00:00Z' AND '2023-01-02T00:00:00Z'
 WHERE service_id = 'webapp'
 ORDER BY start_time DESC;
 
 -- Query data after a specific timestamp
 SELECT *
-FROM STREAM sw
+FROM STREAM sw IN group1, group2
 TIME > '2023-01-01T12:00:00Z'
 WHERE state = 1
 LIMIT 100;
 
 -- Query data at an exact timestamp
 SELECT trace_id, duration
-FROM STREAM sw
+FROM STREAM sw IN group1, group2
 TIME = '2023-01-01T15:30:00Z';
 
 -- Query with relative time - last 30 minutes
 SELECT *
-FROM STREAM sw
+FROM STREAM sw IN group1, group2
 TIME > '-30m'
 WHERE state = 1
 LIMIT 100;
 
 -- Query with relative time range - last 2 hours
 SELECT trace_id, service_id, start_time
-FROM STREAM sw
+FROM STREAM sw IN group1, group2
 TIME BETWEEN '-2h' AND 'now'
 WHERE service_id = 'webapp'
 ORDER BY start_time DESC;
 
 -- Query data within the last hour using absolute time
 SELECT *
-FROM STREAM sw
+FROM STREAM sw IN group1, group2
 TIME >= '2023-01-01T13:00:00Z'
 WHERE status = 'error';
 
 -- Query data older than 1 day ago
 SELECT trace_id, duration
-FROM STREAM sw
+FROM STREAM sw IN group1, group2
 TIME < '-1d';
 
 -- Query with distributed tracing enabled
 SELECT trace_id, service_id, start_time
-FROM STREAM sw
+FROM STREAM sw IN group1, group2
 TIME BETWEEN '-2h' AND 'now'
 WHERE service_id = 'webapp'
 WITH QUERY_TRACE;
@@ -262,12 +419,12 @@ BydbQL for measures is tailored for analytical queries on 
aggregated numerical d
 ### 5.1. Grammar
 
 ```
-measure_query     ::= SELECT projection [from_measure_clause] [TIME 
time_condition] [WHERE criteria] [GROUP BY group_list] [ORDER BY 
order_expression] [LIMIT integer] [OFFSET integer] [WITH QUERY_TRACE]
-from_measure_clause ::= "FROM MEASURE" identifier ["IN" "(" group_list ")"]
+measure_query     ::= SELECT projection from_measure_clause TIME 
time_condition [WHERE criteria] [GROUP BY group_list] [ORDER BY 
order_expression] [LIMIT integer] [OFFSET integer] [WITH QUERY_TRACE]
+from_measure_clause ::= "FROM MEASURE" identifier "IN" ["("] group_list [")"]
 projection        ::= "*" | (column_list | agg_function "(" identifier ")" | 
"TOP" integer projection)
 column_list       ::= identifier ("," identifier)* ["::tag" | "::field"]
 agg_function      ::= "SUM" | "MEAN" | "COUNT" | "MAX" | "MIN"
-group_list        ::= identifier ("," identifier)*
+group_list        ::= identifier ("," identifier)+
 criteria          ::= condition (("AND" | "OR") condition)*
 condition         ::= identifier binary_op (value | value_list)
 time_condition    ::= "=" timestamp | ">" timestamp | "<" timestamp | ">=" 
timestamp | "<=" timestamp | "BETWEEN" timestamp "AND" timestamp
@@ -284,6 +441,8 @@ string_literal    ::= "'" [^']* "'" | "\"" [^\"]* "\""
 integer_literal   ::= [0-9]+
 ```
 
+**Note**: The `TIME` clause and `IN groups` clause are **required** for all 
measure queries. At least one group must be specified. Parentheses around the 
group list are optional.
+
 ### 5.2. BydbQL Extensions for `SELECT`
 
 The `SELECT` clause for measures is highly flexible, allowing for the 
selection of tags, fields, and aggregations in a single, flat list.
@@ -294,14 +453,18 @@ The `SELECT` clause for measures is highly flexible, 
allowing for the selection
 
 ### 5.3. Mapping to `measure.v1.QueryRequest`
 
-*   **`FROM MEASURE name IN (groups)`**: Maps to the `name` and `groups` 
fields. If the clause is omitted, these values are taken from the execution 
context.
+*   **`FROM MEASURE name IN groups`** or **`FROM MEASURE name IN (groups)`**: 
Maps to the `name` and `groups` fields. Both are required.
 *   **`SELECT <tag1>, <field1>, <field2>`**: The transformer inspects each 
identifier. Those identified as tags (either by schema lookup or `::tag`) are 
added to `tag_projection`. Those identified as fields (by schema lookup or 
`::field`) are added to `field_projection`.
 *   **`SELECT SUM(field)`**: Maps to `agg`.
-*   **`TIME = '2023-01-01T00:00:00Z'`**: Maps to `time_range` with `begin` and 
`end` set to the same timestamp.
-*   **`TIME > '2023-01-01T00:00:00Z'`**: Maps to `time_range` with `begin` set 
to the timestamp.
-*   **`TIME BETWEEN '2023-01-01T00:00:00Z' AND '2023-01-02T00:00:00Z'`**: Maps 
to `time_range` with `begin` and `end` set to the respective timestamps.
-*   **`TIME > '-30m'`**: Maps to `time_range` with `begin` set to 30 minutes 
ago.
-*   **`TIME BETWEEN '-1h' AND 'now'`**: Maps to `time_range` from 1 hour ago 
to current time.
+*   **`TIME` clause (required)**: Maps to `time_range`:
+    *   **`TIME = '2023-01-01T00:00:00Z'`**: Sets `begin` and `end` to the 
same timestamp.
+    *   **`TIME > '2023-01-01T00:00:00Z'`**: Sets `begin` to the timestamp.
+    *   **`TIME < '2023-01-01T00:00:00Z'`**: Sets `end` to the timestamp.
+    *   **`TIME >= '2023-01-01T00:00:00Z'`**: Sets `begin` to the timestamp 
(inclusive).
+    *   **`TIME <= '2023-01-01T00:00:00Z'`**: Sets `end` to the timestamp 
(inclusive).
+    *   **`TIME BETWEEN '2023-01-01T00:00:00Z' AND '2023-01-02T00:00:00Z'`**: 
Sets `begin` and `end` to the respective timestamps.
+    *   **`TIME > '-30m'`**: Sets `begin` to 30 minutes ago.
+    *   **`TIME BETWEEN '-1h' AND 'now'`**: Sets `begin` to 1 hour ago and 
`end` to current time.
 *   **`GROUP BY <tag1>, <tag2>`**: The `GROUP BY` clause takes a simple list 
of tags and maps to `group_by.tag_projection`.
 *   **`SELECT TOP N ...`**: Maps to the `top` message.
 *   **`WITH QUERY_TRACE`**: Maps to the `trace` field to enable distributed 
tracing of query execution.
@@ -313,7 +476,8 @@ The `SELECT` clause for measures is highly flexible, 
allowing for the selection
 SELECT
     instance,
     latency
-FROM MEASURE service_cpm
+FROM MEASURE service_cpm IN us-west
+TIME > '-30m'
 WHERE region = 'us-west-1'
 LIMIT 10;
 
@@ -321,21 +485,24 @@ LIMIT 10;
 SELECT
     region,
     SUM(latency)
-FROM MEASURE service_cpm
+FROM MEASURE service_cpm IN us-west
+TIME > '-30m'
 GROUP BY region;
 
 -- Disambiguate a key named 'status' that exists as both a tag and a field
 SELECT
     status::tag,
     status::field
-FROM MEASURE http_requests
+FROM MEASURE http_requests IN us-west
+TIME > '-30m'
 WHERE path = '/api/v1/users';
 
 -- Find the top 10 instances with the highest CPU usage for a specific service
 SELECT TOP 10
     instance,
     cpu_usage
-FROM MEASURE instance_metrics
+FROM MEASURE instance_metrics IN us-west
+TIME > '-30m'
 WHERE service = 'auth-service'
 ORDER BY cpu_usage DESC;
 
@@ -343,14 +510,15 @@ ORDER BY cpu_usage DESC;
 SELECT
     region,
     SUM(latency)
-FROM MEASURE service_cpm IN (us-west, us-east, eu-central)
+FROM MEASURE service_cpm IN us-west, us-east, eu-central
+TIME > '-30m'
 GROUP BY region;
 
 -- Query measures with time range using TIME clause
 SELECT
     service,
     AVG(response_time)
-FROM MEASURE http_metrics
+FROM MEASURE http_metrics IN us-west
 TIME BETWEEN '2023-01-01T00:00:00Z' AND '2023-01-02T00:00:00Z'
 WHERE region = 'us-west'
 GROUP BY service;
@@ -359,7 +527,7 @@ GROUP BY service;
 SELECT
     instance,
     MAX(cpu_usage)
-FROM MEASURE system_metrics
+FROM MEASURE system_metrics IN us-west
 TIME > '2023-01-01T08:00:00Z'
 WHERE datacenter = 'dc-1'
 GROUP BY instance
@@ -369,7 +537,7 @@ ORDER BY MAX(cpu_usage) DESC;
 SELECT
     endpoint,
     SUM(request_count)
-FROM MEASURE api_metrics
+FROM MEASURE api_metrics IN us-west
 TIME = '2023-01-01T10:00:00Z'
 WHERE method = 'POST'
 GROUP BY endpoint;
@@ -378,7 +546,7 @@ GROUP BY endpoint;
 SELECT
     service,
     AVG(response_time)
-FROM MEASURE http_metrics
+FROM MEASURE http_metrics IN us-west
 TIME > '-30m'
 WHERE region = 'us-west'
 GROUP BY service;
@@ -387,7 +555,7 @@ GROUP BY service;
 SELECT
     instance,
     MAX(cpu_usage)
-FROM MEASURE system_metrics
+FROM MEASURE system_metrics IN us-west
 TIME BETWEEN '-2h' AND 'now'
 WHERE datacenter = 'dc-1'
 GROUP BY instance
@@ -397,7 +565,7 @@ ORDER BY MAX(cpu_usage) DESC;
 SELECT
     endpoint,
     SUM(error_count)
-FROM MEASURE api_errors
+FROM MEASURE api_errors IN us-west
 TIME < '-1d'
 WHERE status_code = '500'
 GROUP BY endpoint;
@@ -406,7 +574,7 @@ GROUP BY endpoint;
 SELECT
     service,
     AVG(response_time)
-FROM MEASURE http_metrics
+FROM MEASURE http_metrics IN us-west
 TIME > '-30m'
 WHERE region = 'us-west'
 GROUP BY service
@@ -420,14 +588,14 @@ Top-N queries use a specialized, command-like syntax for 
clarity and to reflect
 ### 6.1. Grammar
 
 ```
-topn_query         ::= SHOW TOP integer from_measure_clause [TIME 
time_condition] [WHERE topn_criteria] [AGGREGATE BY agg_function] [ORDER BY 
value ["ASC"|"DESC"]] [WITH QUERY_TRACE]
-from_measure_clause ::= "FROM MEASURE" identifier ["IN" "(" group_list ")"]
+topn_query         ::= SHOW TOP integer from_measure_clause TIME 
time_condition [WHERE topn_criteria] [AGGREGATE BY agg_function] [ORDER BY 
value ["ASC"|"DESC"]] [WITH QUERY_TRACE]
+from_measure_clause ::= "FROM MEASURE" identifier "IN" ["("] group_list [")"]
 topn_criteria      ::= condition (("AND" | "OR") condition)*
 condition          ::= identifier binary_op (value | value_list)
 time_condition     ::= "=" timestamp | ">" timestamp | "<" timestamp | ">=" 
timestamp | "<=" timestamp | "BETWEEN" timestamp "AND" timestamp
 binary_op          ::= "=" | "!=" | ">" | "<" | ">=" | "<=" | "IN" | "NOT IN"
 agg_function       ::= "SUM" | "MEAN" | "COUNT" | "MAX" | "MIN"
-group_list         ::= identifier ("," identifier)*
+group_list         ::= identifier ("," identifier)+
 value              ::= string_literal | integer_literal | "NULL"
 value_list         ::= "(" value ("," value)* ")"
 timestamp          ::= string_literal | integer_literal
@@ -439,13 +607,21 @@ string_literal     ::= "'" [^']* "'" | "\"" [^\"]* "\""
 integer_literal    ::= [0-9]+
 ```
 
+**Note**: The `TIME` clause and `IN groups` clause are **required** for all 
Top-N queries. At least one group must be specified. Parentheses around the 
group list are optional.
+
 ### 6.2. Mapping to `measure.v1.TopNRequest`
 
 *   **`SHOW TOP N`**: Maps to `top_n`.
-*   **`FROM MEASURE name IN (groups)`**: Maps to the `name` and `groups` 
fields.
-*   **`TIME BETWEEN '2023-01-01T00:00:00Z' AND '2023-01-02T00:00:00Z'`**: Maps 
to `time_range` with `begin` and `end` set to the respective timestamps.
-*   **`TIME > '-30m'`**: Maps to `time_range` with `begin` set to 30 minutes 
ago.
-*   **`TIME BETWEEN '-1h' AND 'now'`**: Maps to `time_range` from 1 hour ago 
to current time.
+*   **`FROM MEASURE name IN groups`** or **`FROM MEASURE name IN (groups)`**: 
Maps to the `name` and `groups` fields. Both are required.
+*   **`TIME` clause (required)**: Maps to `time_range`:
+    *   **`TIME = '2023-01-01T00:00:00Z'`**: Sets `begin` and `end` to the 
same timestamp.
+    *   **`TIME > '2023-01-01T00:00:00Z'`**: Sets `begin` to the timestamp.
+    *   **`TIME < '2023-01-01T00:00:00Z'`**: Sets `end` to the timestamp.
+    *   **`TIME >= '2023-01-01T00:00:00Z'`**: Sets `begin` to the timestamp 
(inclusive).
+    *   **`TIME <= '2023-01-01T00:00:00Z'`**: Sets `end` to the timestamp 
(inclusive).
+    *   **`TIME BETWEEN '2023-01-01T00:00:00Z' AND '2023-01-02T00:00:00Z'`**: 
Sets `begin` and `end` to the respective timestamps.
+    *   **`TIME > '-30m'`**: Sets `begin` to 30 minutes ago.
+    *   **`TIME BETWEEN '-1h' AND 'now'`**: Sets `begin` to 1 hour ago and 
`end` to current time.
 *   **`WHERE tag = 'value'`**: Maps to `conditions`. Only simple equality is 
supported.
 *   **`AGGREGATE BY FUNC`**: Maps to `agg`.
 *   **`ORDER BY value DESC`**: Maps to `field_value_sort`.
@@ -456,61 +632,65 @@ integer_literal    ::= [0-9]+
 ```sql
 -- Get the Top 10 services with the highest latency
 SHOW TOP 10
-FROM MEASURE service_latency
+FROM MEASURE service_latency IN production
+TIME > '-30m'
 WHERE http_method = 'GET' AND version = 'v1.2.0'
 ORDER BY value DESC;
 
 -- Get the Bottom 5 services with the fewest errors
-SHOW TOP 5 
-FROM MEASURE service_errors_total
+SHOW TOP 5
+FROM MEASURE service_errors_total IN production
+TIME > '-30m'
 ORDER BY value ASC;
 
 -- Get the Top 3 pods with the most restarts in total over the time range
 SHOW TOP 3
-FROM MEASURE pod_restarts
+FROM MEASURE pod_restarts IN production
+TIME > '-30m'
 WHERE namespace = 'production'
 AGGREGATE BY SUM;
 
 -- Get the Top 5 services with the highest error rate across multiple groups
 SHOW TOP 5
-FROM MEASURE service_errors IN (production, staging)
+FROM MEASURE service_errors IN production, staging
+TIME > '-30m'
 ORDER BY value DESC;
 
 -- Get the Top 10 services with highest latency in the last hour
 SHOW TOP 10
-FROM MEASURE service_latency
+FROM MEASURE service_latency IN production
 TIME > '2023-01-01T13:00:00Z'
 ORDER BY value DESC;
 
 -- Get the Top 5 endpoints with most errors in a specific time range
 SHOW TOP 5
-FROM MEASURE endpoint_errors
+FROM MEASURE endpoint_errors IN production
 TIME BETWEEN '2023-01-01T00:00:00Z' AND '2023-01-02T00:00:00Z'
 WHERE status_code = '500'
 ORDER BY value DESC;
 
 -- Get the Top 10 services with highest latency in the last hour
 SHOW TOP 10
-FROM MEASURE service_latency
+FROM MEASURE service_latency IN production
 TIME > '-1h'
 ORDER BY value DESC;
 
 -- Get the Top 5 endpoints with most errors in the last 24 hours
 SHOW TOP 5
-FROM MEASURE endpoint_errors
+FROM MEASURE endpoint_errors IN production
 TIME BETWEEN '-24h' AND 'now'
 WHERE status_code = '500'
 ORDER BY value DESC;
 
 -- Get the Top 3 services with lowest response time in the last 30 minutes
 SHOW TOP 3
-FROM MEASURE service_response_time
+FROM MEASURE service_response_time IN production
 TIME > '-30m'
 ORDER BY value ASC;
 
 -- Top-N query with distributed tracing enabled
 SHOW TOP 10
-FROM MEASURE service_latency
+FROM MEASURE service_latency IN production
 TIME > '-1h'
 WHERE http_method = 'GET'
 ORDER BY value DESC
@@ -524,11 +704,11 @@ BydbQL for properties is designed for simple key-value 
lookups and metadata filt
 ### 7.1. Grammar
 
 ```
-property_query      ::= SELECT projection [from_property_clause] [WHERE 
criteria] [LIMIT integer] [WITH QUERY_TRACE]
-from_property_clause ::= "FROM PROPERTY" identifier ["IN" "(" group_list ")"]
+property_query      ::= SELECT projection from_property_clause [WHERE 
criteria] [LIMIT integer] [WITH QUERY_TRACE]
+from_property_clause ::= "FROM PROPERTY" identifier "IN" ["("] group_list [")"]
 projection          ::= "*" | column_list
 column_list         ::= identifier ("," identifier)*
-group_list          ::= identifier ("," identifier)*
+group_list          ::= identifier ("," identifier)+
 criteria            ::= condition (("AND" | "OR") condition)*
 condition           ::= identifier binary_op (value | value_list) | "ID" 
binary_op (value | value_list)
 binary_op           ::= "=" | "!=" | ">" | "<" | ">=" | "<=" | "IN" | "NOT IN"
@@ -539,9 +719,11 @@ string_literal      ::= "'" [^']* "'" | "\"" [^\"]* "\""
 integer_literal     ::= [0-9]+
 ```
 
+**Note**: The `IN groups` clause is **required** for all property queries. At 
least one group must be specified. Parentheses around the group list are 
optional. Property queries do **not** require a `TIME` clause.
+
 ### 7.2. Mapping to `property.v1.QueryRequest`
 
-*   **`FROM PROPERTY name IN (groups)`**: Maps to the `name` and `groups` 
fields. If the clause is omitted, these values are taken from the execution 
context.
+*   **`FROM PROPERTY name IN groups`** or **`FROM PROPERTY name IN 
(groups)`**: Maps to the `name` and `groups` fields. Both are required.
 *   **`SELECT tags`**: Maps to `tag_projection`.
 *   **`WHERE ID IN (...)`**: Maps to `ids`.
 *   **`WHERE tag = 'value'`**: Maps to `criteria`.
@@ -553,29 +735,29 @@ integer_literal     ::= [0-9]+
 ```sql
 -- Find properties by filtering on their tags
 SELECT ip, owner
-FROM PROPERTY server_metadata
+FROM PROPERTY server_metadata IN datacenter-1
 WHERE datacenter = 'dc-101' AND in_service = 'true'
 LIMIT 50;
 
 -- Retrieve a specific property by its unique ID
 SELECT *
-FROM PROPERTY server_metadata
+FROM PROPERTY server_metadata IN datacenter-1
 WHERE ID = 'server-1a2b3c';
 
 -- Retrieve a set of properties by their unique IDs
 SELECT ip, region
-FROM PROPERTY server_metadata
+FROM PROPERTY server_metadata IN datacenter-1
 WHERE ID IN ('server-1a2b3c', 'server-4d5e6f');
 
 -- Find properties from multiple groups
 SELECT ip, owner
-FROM PROPERTY server_metadata IN (datacenter-1, datacenter-2, datacenter-3)
+FROM PROPERTY server_metadata IN datacenter-1, datacenter-2, datacenter-3
 WHERE in_service = 'true'
 LIMIT 100;
 
 -- Property query with distributed tracing enabled
 SELECT ip, region, owner
-FROM PROPERTY server_metadata
+FROM PROPERTY server_metadata IN datacenter-1
 WHERE datacenter = 'dc-101' AND in_service = 'true'
 LIMIT 50
 WITH QUERY_TRACE;
@@ -588,11 +770,11 @@ BydbQL for traces is designed for querying and retrieving 
trace data with spans.
 ### 8.1. Grammar
 
 ```
-trace_query           ::= SELECT projection [from_trace_clause] [TIME 
time_condition] [WHERE criteria] [ORDER BY order_expression] [LIMIT integer] 
[OFFSET integer] [WITH QUERY_TRACE]
-from_trace_clause     ::= "FROM TRACE" identifier ["IN" "(" group_list ")"]
+trace_query           ::= SELECT projection from_trace_clause TIME 
time_condition [WHERE criteria] [ORDER BY order_expression] [LIMIT integer] 
[OFFSET integer] [WITH QUERY_TRACE]
+from_trace_clause     ::= "FROM TRACE" identifier "IN" ["("] group_list [")"]
 projection            ::= "*" | column_list | "()"
 column_list           ::= identifier ("," identifier)*
-group_list            ::= identifier ("," identifier)*
+group_list            ::= identifier ("," identifier)+
 criteria              ::= condition (("AND" | "OR") condition)*
 condition             ::= identifier binary_op (value | value_list)
 time_condition        ::= "=" timestamp | ">" timestamp | "<" timestamp | ">=" 
timestamp | "<=" timestamp | "BETWEEN" timestamp "AND" timestamp
@@ -609,6 +791,8 @@ string_literal        ::= "'" [^']* "'" | "\"" [^\"]* "\""
 integer_literal       ::= [0-9]+
 ```
 
+**Note**: The `TIME` clause and `IN groups` clause are **required** for all 
trace queries. At least one group must be specified. Parentheses around the 
group list are optional.
+
 ### 8.2. Trace Model Characteristics
 
 The Trace model in BanyanDB is specifically designed for storing and querying 
trace data with the following key characteristics:
@@ -632,14 +816,18 @@ BydbQL for traces supports an empty projection syntax 
`SELECT ()` that allows qu
 
 ### 8.3. Mapping to `trace.v1.QueryRequest`
 
-*   **`FROM TRACE name IN (groups)`**: Maps to the `name` and `groups` fields. 
If the clause is omitted, these values are taken from the execution context.
+*   **`FROM TRACE name IN groups`** or **`FROM TRACE name IN (groups)`**: Maps 
to the `name` and `groups` fields. Both are required.
 *   **`SELECT tags`**: Maps to `tag_projection`. Requires a trace schema to 
resolve tags to their specifications.
 *   **`SELECT ()`**: Maps to an empty `tag_projection` array, indicating no 
tag data should be returned (only raw span data).
-*   **`TIME = '2023-01-01T00:00:00Z'`**: Maps to `time_range` with `begin` and 
`end` set to the same timestamp.
-*   **`TIME > '2023-01-01T00:00:00Z'`**: Maps to `time_range` with `begin` set 
to the timestamp.
-*   **`TIME BETWEEN '2023-01-01T00:00:00Z' AND '2023-01-02T00:00:00Z'`**: Maps 
to `time_range` with `begin` and `end` set to the respective timestamps.
-*   **`TIME > '-30m'`**: Maps to `time_range` with `begin` set to 30 minutes 
ago.
-*   **`TIME BETWEEN '-1h' AND 'now'`**: Maps to `time_range` from 1 hour ago 
to current time.
+*   **`TIME` clause (required)**: Maps to `time_range`:
+    *   **`TIME = '2023-01-01T00:00:00Z'`**: Sets `begin` and `end` to the 
same timestamp.
+    *   **`TIME > '2023-01-01T00:00:00Z'`**: Sets `begin` to the timestamp.
+    *   **`TIME < '2023-01-01T00:00:00Z'`**: Sets `end` to the timestamp.
+    *   **`TIME >= '2023-01-01T00:00:00Z'`**: Sets `begin` to the timestamp 
(inclusive).
+    *   **`TIME <= '2023-01-01T00:00:00Z'`**: Sets `end` to the timestamp 
(inclusive).
+    *   **`TIME BETWEEN '2023-01-01T00:00:00Z' AND '2023-01-02T00:00:00Z'`**: 
Sets `begin` and `end` to the respective timestamps.
+    *   **`TIME > '-30m'`**: Sets `begin` to 30 minutes ago.
+    *   **`TIME BETWEEN '-1h' AND 'now'`**: Sets `begin` to 1 hour ago and 
`end` to current time.
 *   **`WHERE conditions`**: Maps to `criteria` for filtering spans based on 
tag values.
 *   **`ORDER BY field`**: Maps to `order_by` for sorting results.
 *   **`LIMIT`/`OFFSET`**: Maps to `limit` and `offset` for pagination.
@@ -659,74 +847,81 @@ The `WITH QUERY_TRACE` clause enables distributed tracing 
of the query execution
 ```sql
 -- Basic selection with filtering and ordering
 SELECT trace_id, service_id, operation_name
-FROM TRACE sw_trace IN (group1, group2)
+FROM TRACE sw_trace IN group1, group2
+TIME > '-30m'
 WHERE service_id = 'webapp' AND status = 'success'
 ORDER BY start_time DESC
 LIMIT 100;
 
--- Project all tags from a trace (FROM is explicit)
-SELECT * 
-FROM TRACE sw_trace 
-WHERE status = 'error' 
+-- Project all tags from a trace
+SELECT *
+FROM TRACE sw_trace IN group1
+TIME > '-30m'
+WHERE status = 'error'
 LIMIT 10;
 
 -- Query with no tag projection - returns only raw span data
 SELECT ()
-FROM TRACE sw_trace
+FROM TRACE sw_trace IN group1
+TIME > '-30m'
 WHERE service_id = 'webapp'
 LIMIT 100;
 
 -- Use more complex conditions with IN and OR
 SELECT trace_id, duration, operation_name
+FROM TRACE sw_trace IN group1
+TIME > '-30m'
 WHERE service_id IN ('webapp', 'api-gateway') OR http.method = 'POST';
 
 -- Query with time range using TIME clause
 SELECT trace_id, service_id, start_time, operation_name
-FROM TRACE sw_trace
+FROM TRACE sw_trace IN group1
 TIME BETWEEN '2023-01-01T00:00:00Z' AND '2023-01-02T00:00:00Z'
 WHERE service_id = 'webapp'
 ORDER BY start_time DESC;
 
 -- Query data after a specific timestamp
 SELECT *
-FROM TRACE sw_trace
+FROM TRACE sw_trace IN group1
 TIME > '2023-01-01T12:00:00Z'
 WHERE status = 'success'
 LIMIT 100;
 
 -- Query data at an exact timestamp
 SELECT trace_id, duration, operation_name
-FROM TRACE sw_trace
+FROM TRACE sw_trace IN group1
 TIME = '2023-01-01T15:30:00Z';
 
 -- Query with relative time - last 30 minutes
 SELECT *
-FROM TRACE sw_trace
+FROM TRACE sw_trace IN group1
 TIME > '-30m'
 WHERE status = 'error'
 LIMIT 100;
 
 -- Query with relative time range - last 2 hours
 SELECT trace_id, service_id, start_time, operation_name
-FROM TRACE sw_trace
+FROM TRACE sw_trace IN group1
 TIME BETWEEN '-2h' AND 'now'
 WHERE service_id = 'webapp'
 ORDER BY start_time DESC;
 
 -- Query traces by specific trace ID
 SELECT *
-FROM TRACE sw_trace
+FROM TRACE sw_trace IN group1
+TIME > '-30m'
 WHERE trace_id = '1a2b3c4d5e6f7890';
 
 -- Query traces with specific operation names
 SELECT trace_id, service_id, duration
-FROM TRACE sw_trace
+FROM TRACE sw_trace IN group1
+TIME > '-30m'
 WHERE operation_name IN ('GET /api/users', 'POST /api/orders')
 ORDER BY duration DESC;
 
 -- Query traces with error status in the last hour
 SELECT trace_id, service_id, error_message
-FROM TRACE sw_trace
+FROM TRACE sw_trace IN group1
 TIME > '-1h'
 WHERE status = 'error'
 ORDER BY start_time DESC
@@ -734,26 +929,27 @@ LIMIT 50;
 
 -- Query traces older than 1 day ago
 SELECT trace_id, duration, operation_name
-FROM TRACE sw_trace
+FROM TRACE sw_trace IN group1
 TIME < '-1d'
 WHERE service_id = 'legacy-service';
 
 -- Query with no projection for raw span data only
 SELECT ()
-FROM TRACE sw_trace
+FROM TRACE sw_trace IN group1
 TIME > '-1h'
 WHERE status = 'error'
 LIMIT 50;
 
 -- Query with distributed tracing enabled for observability
 SELECT trace_id, service_id, operation_name
-FROM TRACE sw_trace
+FROM TRACE sw_trace IN group1
+TIME > '-30m'
 WHERE service_id = 'webapp'
 WITH QUERY_TRACE;
 
 -- Query with both empty projection and query tracing
 SELECT ()
-FROM TRACE sw_trace
+FROM TRACE sw_trace IN group1
 TIME > '-30m'
 WHERE status = 'error'
 WITH QUERY_TRACE
@@ -762,12 +958,14 @@ LIMIT 100;
 
 ## 9. Summary of BydbQL Capabilities
 
-| Feature             | Streams                               | Measures       
                       | Top-N                               | Properties       
                   | Traces                              |
-| :------------------ | :------------------------------------ | 
:------------------------------------ | :---------------------------------- | 
:---------------------------------- | :---------------------------------- |
-| **Primary Command** | `SELECT ... [FROM STREAM]`            | `SELECT ... 
[FROM MEASURE]`           | `SHOW TOPN ... FROM MEASURE`        | `SELECT ... 
[FROM PROPERTY]`        | `SELECT ... [FROM TRACE]`           |
-| **Projection**      | Tags by family                        | Tags & Fields  
                       | Implicit (entity, value)            | Flat list of 
tags                   | Tags by specification               |
-| **Aggregation**     | No                                    | Yes (`SUM`, 
`MEAN`, etc.)             | Yes (optional)                      | No            
                      | No                                  |
-| **Grouping**        | No                                    | Yes (`GROUP 
BY`)                      | No                                  | No            
                      | No                                  |
-| **Filtering**       | Full `WHERE` clause + `TIME` clause   | Full `WHERE` 
clause + `TIME` clause   | Simple equality `WHERE` + `TIME` clause | `WHERE` by 
ID or tags               | Full `WHERE` clause + `TIME` clause |
-| **Ordering**        | Yes (`ORDER BY`)                      | Yes (`ORDER 
BY`)                      | Yes (`ORDER BY value`)              | No            
                      | Yes (`ORDER BY`)                    |
-| **Pagination**      | Yes (`LIMIT`/`OFFSET`)                | Yes 
(`LIMIT`/`OFFSET`)                | No                                  | 
`LIMIT` only                        | Yes (`LIMIT`/`OFFSET`)              |
+| Feature             | Streams                                         | 
Measures                                        | Top-N                         
                  | Properties                                      | Traces    
                                      |
+|:--------------------|:------------------------------------------------|:------------------------------------------------|:------------------------------------------------|:------------------------------------------------|:------------------------------------------------|
+| **Primary Command** | `SELECT ... FROM STREAM ... IN ...`             | 
`SELECT ... FROM MEASURE ... IN ...`            | `SHOW TOP ... FROM MEASURE 
... IN ...`          | `SELECT ... FROM PROPERTY ... IN ...`           | 
`SELECT ... FROM TRACE ... IN ...`              |
+| **Groups Clause**   | **Required** `IN groups` (parentheses optional) | 
**Required** `IN groups` (parentheses optional) | **Required** `IN groups` 
(parentheses optional) | **Required** `IN groups` (parentheses optional) | 
**Required** `IN groups` (parentheses optional) |
+| **Time Clause**     | **Required** `TIME ...`                         | 
**Required** `TIME ...`                         | **Required** `TIME ...`       
                  | Not applicable                                  | 
**Required** `TIME ...`                         |
+| **Projection**      | Tags by family                                  | Tags 
& Fields                                   | Implicit (entity, value)           
             | Flat list of tags                               | Tags by 
specification                           |
+| **Aggregation**     | No                                              | Yes 
(`SUM`, `MEAN`, etc.)                       | Yes (optional)                    
              | No                                              | No            
                                  |
+| **Grouping**        | No                                              | Yes 
(`GROUP BY`)                                | No                                
              | No                                              | No            
                                  |
+| **Filtering**       | Full `WHERE` clause                             | Full 
`WHERE` clause                             | Simple equality `WHERE`            
             | `WHERE` by ID or tags                           | Full `WHERE` 
clause                             |
+| **Ordering**        | Yes (`ORDER BY`)                                | Yes 
(`ORDER BY`)                                | Yes (`ORDER BY value`)            
              | No                                              | Yes (`ORDER 
BY`)                                |
+| **Pagination**      | Yes (`LIMIT`/`OFFSET`)                          | Yes 
(`LIMIT`/`OFFSET`)                          | No                                
              | `LIMIT` only                                    | Yes 
(`LIMIT`/`OFFSET`)                          |


Reply via email to