This is an automated email from the ASF dual-hosted git repository.

airborne pushed a commit to branch feature/search-lucene-mode-docs
in repository https://gitbox.apache.org/repos/asf/doris-website.git

commit d1ca8dec2993a43cf84ef187e36bfcd3dd614108
Author: airborne12 <[email protected]>
AuthorDate: Sun Jan 11 17:44:53 2026 +0800

    docs: Add Lucene boolean mode and escape characters documentation for 
search function
    
    Add documentation for two new features in the SEARCH function:
    
    1. Lucene Boolean Mode:
       - JSON-based options parameter (mode, minimum_should_match)
       - Left-to-right modifier parsing (MUST/SHOULD/MUST_NOT)
       - Behavior comparison table with standard mode
    
    2. Escape Characters:
       - Support for escaping special characters in DSL
       - Backslash escapes for space, parentheses, colon, backslash
    
    Updated both English and Chinese versions of search-function.md.
    
    Related PR: apache/doris#59394
    
    Co-Authored-By: Claude Opus 4.5 <[email protected]>
---
 docs/ai/text-search/search-function.md             | 81 ++++++++++++++++++++++
 .../current/ai/text-search/search-function.md      | 81 ++++++++++++++++++++++
 2 files changed, 162 insertions(+)

diff --git a/docs/ai/text-search/search-function.md 
b/docs/ai/text-search/search-function.md
index 51fc680fcf5..468c96fd6c7 100644
--- a/docs/ai/text-search/search-function.md
+++ b/docs/ai/text-search/search-function.md
@@ -35,6 +35,29 @@ Usage
 
 When `default_field` is provided, Doris expands bare terms or functions to 
that field. For example, `SEARCH('foo bar', 'tags', 'and')` behaves like 
`SEARCH('tags:ALL(foo bar)')`, while `SEARCH('foo bark', 'tags')` expands to 
`tags:ANY(foo bark)`. Explicit boolean operators inside the DSL always take 
precedence over the default operator.
 
+### Options Parameter (JSON format)
+
+The second parameter can also be a JSON string for advanced configuration:
+
+```sql
+SEARCH('<search_expression>', '<options_json>')
+```
+
+**Supported options:**
+
+| Option | Type | Description |
+|--------|------|-------------|
+| `default_field` | string | Column name for terms without explicit field |
+| `default_operator` | string | `and` or `or` for multi-term expressions |
+| `mode` | string | `standard` (default) or `lucene` |
+| `minimum_should_match` | integer | Minimum SHOULD clauses to match (Lucene 
mode only) |
+
+**Example:**
+```sql
+SELECT * FROM docs WHERE search('apple banana',
+  '{"default_field":"title","default_operator":"and","mode":"lucene"}');
+```
+
 `SEARCH()` follows SQL three-valued logic. Rows where all referenced fields 
are NULL evaluate to UNKNOWN (filtered out in the `WHERE` clause) unless other 
predicates short-circuit the expression (`TRUE OR NULL = TRUE`, `FALSE OR NULL 
= NULL`, `NOT NULL = NULL`), matching the behavior of dedicated text search 
operators.
 
 ### Current Supported Queries
@@ -100,6 +123,39 @@ SELECT id, title FROM search_test_basic
 WHERE SEARCH('tags:ANY(python javascript) AND (category:Technology OR 
category:Programming)');
 ```
 
+#### Lucene Boolean Mode
+
+Lucene mode mimics Elasticsearch/Lucene query_string behavior where boolean 
operators work as left-to-right modifiers instead of traditional boolean 
algebra.
+
+**Key differences from standard mode:**
+- AND/OR/NOT are modifiers that affect adjacent terms
+- Operator precedence is left-to-right
+- Uses MUST/SHOULD/MUST_NOT internally (like Lucene's Occur enum)
+- Pure NOT queries return empty results (need a positive clause)
+
+**Enable Lucene mode:**
+```sql
+-- Basic Lucene mode
+SELECT * FROM docs WHERE search('apple AND banana',
+  '{"default_field":"title","mode":"lucene"}');
+
+-- With minimum_should_match
+SELECT * FROM docs WHERE search('apple AND banana OR cherry',
+  '{"default_field":"title","mode":"lucene","minimum_should_match":1}');
+```
+
+**Behavior comparison:**
+
+| Query | Standard Mode | Lucene Mode |
+|-------|--------------|-------------|
+| `a AND b` | a ∩ b | +a +b (both MUST) |
+| `a OR b` | a ∪ b | a b (both SHOULD, min=1) |
+| `NOT a` | ¬a | Empty (no positive clause) |
+| `a AND NOT b` | a ∩ ¬b | +a -b (MUST a, MUST_NOT b) |
+| `a AND b OR c` | (a ∩ b) ∪ c | +a b c (only a is MUST) |
+
+**Note:** In Lucene mode, `a AND b OR c` parses left-to-right: the OR operator 
changes `b` from MUST to SHOULD. Use `minimum_should_match` to require SHOULD 
matches.
+
 #### Phrase query
 - Syntax: `column:"quoted phrase"`
 - Semantics: matches contiguous tokens in order using the column's analyzer; 
quotes must wrap the entire phrase.
@@ -253,6 +309,31 @@ WHERE SEARCH('properties.message:hello OR 
properties.category:beta')
 ORDER BY id;
 ```
 
+#### Escape Characters
+
+Use backslash (`\`) to escape special characters in DSL:
+
+| Escape | Description | Example |
+|--------|-------------|---------|
+| `\ ` | Literal space (joins terms) | `title:First\ Value` matches "First 
Value" |
+| `\(` `\)` | Literal parentheses | `title:hello\(world\)` matches 
"hello(world)" |
+| `\:` | Literal colon | `title:key\:value` matches "key:value" |
+| `\\` | Literal backslash | `title:path\\to\\file` matches "path\to\file" |
+
+**Example:**
+```sql
+-- Search for value containing space as single term
+SELECT * FROM docs WHERE search('title:First\\ Value');
+
+-- Search for value with parentheses
+SELECT * FROM docs WHERE search('title:hello\\(world\\)');
+
+-- Search for value with colon
+SELECT * FROM docs WHERE search('title:key\\:value');
+```
+
+**Note:** In SQL strings, backslashes need double escaping. Use `\\` in SQL to 
produce a single `\` in the DSL.
+
 ### Current Limitations
 
 - Range and list clauses (`field:[a TO b]`, `field:IN(...)`) still degrade to 
term lookups; rely on regular SQL predicates for numeric/date ranges or 
explicit `IN` filters.
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/text-search/search-function.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/text-search/search-function.md
index 11a45ea7256..622163252b5 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/text-search/search-function.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/text-search/search-function.md
@@ -35,6 +35,29 @@ SEARCH('<search_expression>', '<default_field>', 
'<default_operator>')
 
 提供 `default_field` 后,Doris 会把裸词项或函数自动扩展到该字段。例如 `SEARCH('foo bar', 'tags', 
'and')` 等价于 `SEARCH('tags:ALL(foo bar)')`,而 `SEARCH('foo bark', 'tags')` 会展开为 
`tags:ANY(foo bark)`。DSL 中显式出现的布尔操作优先级最高,会覆盖默认运算符。
 
+### Options 参数(JSON 格式)
+
+第二个参数也可以是 JSON 字符串,用于高级配置:
+
+```sql
+SEARCH('<search_expression>', '<options_json>')
+```
+
+**支持的选项:**
+
+| 选项 | 类型 | 说明 |
+|------|------|------|
+| `default_field` | string | 未指定字段的词项使用的默认列名 |
+| `default_operator` | string | 多词项表达式的默认运算符(`and` 或 `or`) |
+| `mode` | string | `standard`(默认)或 `lucene` |
+| `minimum_should_match` | integer | SHOULD 子句最小匹配数(仅 Lucene 模式) |
+
+**示例:**
+```sql
+SELECT * FROM docs WHERE search('apple banana',
+  '{"default_field":"title","default_operator":"and","mode":"lucene"}');
+```
+
 `SEARCH()` 遵循 SQL 三值逻辑。当所有参与匹配的列值均为 NULL 时结果为 UNKNOWN(在 `WHERE` 
中被过滤),但若与其他子表达式组合,可按布尔短路原则返回 TRUE 或继续保留 NULL(例如 `TRUE OR NULL = TRUE`、`FALSE OR 
NULL = NULL`、`NOT NULL = NULL`),行为与文本检索算子保持一致。
 
 ### 当前支持语法
@@ -100,6 +123,39 @@ SELECT id, title FROM search_test_basic
 WHERE SEARCH('tags:ANY(python javascript) AND (category:Technology OR 
category:Programming)');
 ```
 
+#### Lucene 布尔模式
+
+Lucene 模式模拟 Elasticsearch/Lucene 的 query_string 行为,布尔操作符作为左到右的修饰符工作,而非传统的布尔代数。
+
+**与标准模式的主要区别:**
+- AND/OR/NOT 是影响相邻词项的修饰符
+- 操作符优先级从左到右
+- 内部使用 MUST/SHOULD/MUST_NOT(类似 Lucene 的 Occur 枚举)
+- 纯 NOT 查询返回空结果(需要正向子句)
+
+**启用 Lucene 模式:**
+```sql
+-- 基本 Lucene 模式
+SELECT * FROM docs WHERE search('apple AND banana',
+  '{"default_field":"title","mode":"lucene"}');
+
+-- 使用 minimum_should_match
+SELECT * FROM docs WHERE search('apple AND banana OR cherry',
+  '{"default_field":"title","mode":"lucene","minimum_should_match":1}');
+```
+
+**行为对比:**
+
+| 查询 | 标准模式 | Lucene 模式 |
+|------|----------|-------------|
+| `a AND b` | a ∩ b | +a +b(都是 MUST) |
+| `a OR b` | a ∪ b | a b(都是 SHOULD,min=1) |
+| `NOT a` | ¬a | 空结果(无正向子句) |
+| `a AND NOT b` | a ∩ ¬b | +a -b(MUST a,MUST_NOT b) |
+| `a AND b OR c` | (a ∩ b) ∪ c | +a b c(只有 a 是 MUST) |
+
+**注意:** 在 Lucene 模式中,`a AND b OR c` 从左到右解析:OR 操作符将 `b` 从 MUST 改为 SHOULD。使用 
`minimum_should_match` 来要求 SHOULD 子句匹配。
+
 #### 词组查询
 - 语法:`column:"quoted phrase"`
 - 语义:根据列的分析器匹配连续且有序的词项,需使用双引号包裹完整短语。
@@ -253,6 +309,31 @@ WHERE SEARCH('properties.message:hello OR 
properties.category:beta')
 ORDER BY id;
 ```
 
+#### 转义字符
+
+使用反斜杠(`\`)转义 DSL 中的特殊字符:
+
+| 转义 | 说明 | 示例 |
+|------|------|------|
+| `\ ` | 字面空格(连接词项) | `title:First\ Value` 匹配 "First Value" |
+| `\(` `\)` | 字面括号 | `title:hello\(world\)` 匹配 "hello(world)" |
+| `\:` | 字面冒号 | `title:key\:value` 匹配 "key:value" |
+| `\\` | 字面反斜杠 | `title:path\\to\\file` 匹配 "path\to\file" |
+
+**示例:**
+```sql
+-- 搜索包含空格的值作为单个词项
+SELECT * FROM docs WHERE search('title:First\\ Value');
+
+-- 搜索包含括号的值
+SELECT * FROM docs WHERE search('title:hello\\(world\\)');
+
+-- 搜索包含冒号的值
+SELECT * FROM docs WHERE search('title:key\\:value');
+```
+
+**注意:** 在 SQL 字符串中,反斜杠需要双重转义。使用 `\\` 在 SQL 中产生 DSL 中的单个 `\`。
+
 ### 当前限制
 
 - 范围与列表子句(如 `field:[a TO b]`、`field:IN(...)`)仍会降级为普通词项匹配,建议使用常规 SQL 范围/`IN` 过滤。


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to