This is an automated email from the ASF dual-hosted git repository.

airborne pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 87a90886734 [update](inverted index) update inverted index docs 
structure and add new content (#2948)
87a90886734 is described below

commit 87a9088673424027924763860d5f60904b2c2b5a
Author: Jack <[email protected]>
AuthorDate: Wed Oct 22 20:17:02 2025 +0800

    [update](inverted index) update inverted index docs structure and add new 
content (#2948)
    
    ## Versions
    
    - [x] dev
    - [ ] 3.0
    - [ ] 2.1
    - [ ] 2.0
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 docs/ai/search/text-search.md                      | 270 +++++++
 docs/ai/{ => search}/vector-search.md              |   0
 .../index/inverted-index/custom-analyzer.md        | 183 +++++
 docs/table-design/index/inverted-index/overview.md | 752 +++++++++++++++++++
 docs/table-design/index/inverted-index/scoring.md  | 107 +++
 .../index/inverted-index/search-function.md        | 220 ++++++
 .../index/inverted-index/search-operators.md       |  79 ++
 .../docusaurus-plugin-content-docs/current.json    |   4 +
 .../current/ai/search/text-search.md               | 236 ++++++
 .../current/ai/{ => search}/vector-search.md       |   0
 .../index/inverted-index/custom-analyzer.md        | 281 +++++++
 .../table-design/index/inverted-index/overview.md  | 808 +++++++++++++++++++++
 .../table-design/index/inverted-index/scoring.md   | 106 +++
 .../index/inverted-index/search-function.md        | 220 ++++++
 .../index/inverted-index/search-operators.md       |  79 ++
 sidebars.json                                      |  21 +-
 static/images/analyzer.png                         | Bin 0 -> 57808 bytes
 17 files changed, 3364 insertions(+), 2 deletions(-)

diff --git a/docs/ai/search/text-search.md b/docs/ai/search/text-search.md
new file mode 100644
index 00000000000..8ff7f43df14
--- /dev/null
+++ b/docs/ai/search/text-search.md
@@ -0,0 +1,270 @@
+---
+{
+    "title": "Text Search",
+    "language": "en"
+}
+---
+
+## Overview
+
+Text search retrieves documents that contain specific terms or phrases and 
ranks results by relevance.
+
+Compared with vector search, which excels at “finding broadly” by expanding 
recall through semantic similarity, text search excels at “finding precisely” 
by providing controllable, explainable exact matches that ensure keyword hits 
and deterministic filters.
+
+In generative AI applications—especially Retrieval‑Augmented Generation 
(RAG)—text and vector search complement each other. Working together, they 
balance semantic breadth and lexical precision, improve recall while ensuring 
accuracy and interpretability, and build a reliable retrieval foundation that 
provides models with more accurate, relevant context.
+
+## Evolution of Doris Text Search
+
+Since version 2.0.0, Doris has introduced and continuously expanded text 
search to meet diverse retrieval scenarios and growing query complexity.
+
+### Foundation (2.0+)
+Column‑level inverted indexes with basic full‑text operators (MATCH_ANY, 
MATCH_ALL) and multi‑language tokenizers enable efficient keyword search on 
large datasets.
+
+### Feature Expansion (2.x → 3.x)
+An enriched operator set adds phrase matching (MATCH_PHRASE), prefix search 
(MATCH_PHRASE_PREFIX), and regex matching (MATCH_REGEXP). Version 3.1 
introduces custom analyzers to address varied text analysis needs.
+
+### Capability Enhancements (4.0+)
+Text search gains relevance scoring and a unified search entry, formally 
introducing BM25 scoring and the SEARCH function.
+
+- BM25 relevance scoring: rank results by text relevance with `score()`, and 
blend with vector similarity for hybrid ranking.
+
+- SEARCH function: unified query DSL supporting cross‑column search and 
boolean logic, simplifying complex query construction while improving 
performance.
+
+## Core Text Search Features
+
+### Rich Text Operators
+
+Doris provides a set of full‑text operators covering multiple retrieval 
patterns—from keyword matching to advanced phrase queries.
+
+Major operators include:
+
+- `MATCH_ANY` / `MATCH_ALL`: OR/AND multi‑term matching for general keyword 
search
+- `MATCH_PHRASE`: Exact phrase with configurable slop and order control
+- `MATCH_PHRASE_PREFIX`: Phrase prefix matching for autocomplete and 
incremental search
+- `MATCH_REGEXP`: Regex on tokenized terms for pattern‑based search
+
+Operators can be used standalone or composed via `SEARCH()` to build complex 
logic. For example:
+
+```sql
+-- Keyword search (any keyword match)
+SELECT * FROM docs WHERE content MATCH_ANY 'apache doris database';
+
+-- Require all keywords
+SELECT * FROM docs WHERE content MATCH_ALL 'real-time analytics OLAP';
+
+-- Exact phrase
+SELECT * FROM docs WHERE content MATCH_PHRASE 'inverted index';
+
+-- Phrase with slop (allow up to 2 words between terms)
+SELECT * FROM docs WHERE content MATCH_PHRASE 'machine learning ~2';
+
+-- Prefix matching
+SELECT * FROM docs WHERE content MATCH_PHRASE_PREFIX 'data ware';  -- matches 
"data warehouse", "data warehousing"
+```
+
+[See all operators 
→](../../table-design/index/inverted-index/search-operators.md)
+
+### Custom Analyzers (3.1+)
+
+Tokenization strategy directly affects both precision and recall. Since 3.1, 
Doris supports custom analyzers so you can define the analysis pipeline by 
combining `char_filter`, `tokenizer`, and `token_filter`.
+
+Typical usage includes:
+
+- Custom character filtering for replacement/normalization before tokenization
+- Choosing tokenizers such as `standard`, `ngram`, `edge_ngram`, `keyword`, 
`icu` for different languages and text shapes
+- Applying token filters like `lowercase`, `word_delimiter`, `ascii_folding` 
to normalize and refine tokens
+
+```sql
+-- Define a custom analyzer
+CREATE INVERTED INDEX ANALYZER IF NOT EXISTS keyword_lowercase
+PROPERTIES (
+  "tokenizer" = "keyword",
+  "token_filter" = "asciifolding, lowercase"
+);
+
+-- Use the analyzer in table creation
+CREATE TABLE docs (
+  id BIGINT,
+  content TEXT,
+  INDEX idx_content (content) USING INVERTED PROPERTIES (
+    "analyzer" = "keyword_lowercase",
+    "support_phrase" = "true"
+  )
+);
+```
+
+[Learn about custom analyzers 
→](../../table-design/index/inverted-index/custom-analyzer.md)
+
+### BM25 Relevance Scoring (4.0+)
+
+Doris implements the **BM25 (Best Matching 25)** algorithm for text relevance 
scoring, enabling Top-N ranking of search results:
+
+**Key Features:**
+- Probabilistic ranking based on term frequency, inverse document frequency, 
and document length
+- Robust handling of both long and short documents
+- Tunable parameters (k1, b) for ranking behavior
+- Seamless integration with Top-N queries
+
+**Usage Pattern:**
+```sql
+SELECT id, title, score() AS relevance
+FROM docs
+WHERE content MATCH_ANY 'real-time OLAP analytics'
+ORDER BY relevance DESC
+LIMIT 10;
+```
+
+**How It Works:**
+- `score()` computes BM25 score for each matched row
+- Higher scores indicate stronger relevance to query terms
+- Combine with `ORDER BY` and `LIMIT` for efficient Top-N retrieval
+- Works with all `MATCH_*` operators and `SEARCH()` function
+
+[Learn more about scoring 
→](../../table-design/index/inverted-index/scoring.md)
+
+### SEARCH Function: Unified Query DSL (4.0+)
+
+The `SEARCH()` function provides a concise, expressive syntax for complex text 
queries:
+
+**Basic Syntax:**
+```sql
+SEARCH('column:term')                          -- Single term
+SEARCH('column:ANY(term1 term2)')              -- Any of the terms (OR)
+SEARCH('column:ALL(term1 term2)')              -- All terms (AND)
+SEARCH('column:EXACT(exact text)')             -- Case-sensitive exact match
+```
+
+**Boolean Composition:**
+```sql
+SEARCH('title:apache AND category:database')
+SEARCH('title:doris OR title:clickhouse')
+SEARCH('tags:ANY(olap analytics) AND NOT status:deprecated')
+```
+
+**Multi-Column Queries:**
+```sql
+SEARCH('title:search AND (content:engine OR tags:ANY(elasticsearch lucene))')
+```
+
+**Semi-Structured Data:**
+```sql
+SEARCH('properties.user.name:alice')           -- Variant subcolumn access
+```
+
+**With Scoring:**
+```sql
+SELECT id, title, score() AS relevance
+FROM docs
+WHERE SEARCH('title:Machine AND tags:ANY(database sql)')
+ORDER BY relevance DESC
+LIMIT 20;
+```
+
+[Complete SEARCH function guide 
→](../../table-design/index/inverted-index/search-function.md)
+
+## Quick Start
+
+### Step 1: Create Table with Inverted Index
+
+```sql
+CREATE TABLE docs (
+  id BIGINT,
+  title STRING,
+  content STRING,
+  category STRING,
+  tags ARRAY<STRING>,
+  created_at DATETIME,
+  -- Text search indexes
+  INDEX idx_title(title) USING INVERTED PROPERTIES ("parser" = "english"),
+  INDEX idx_content(content) USING INVERTED PROPERTIES ("parser" = "english", 
"support_phrase" = "true"),
+  INDEX idx_category(category) USING INVERTED,
+  INDEX idx_tags(tags) USING INVERTED
+)
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 10;
+```
+
+### Step 2: Run Text Queries
+
+```sql
+-- Simple keyword search
+SELECT * FROM docs WHERE content MATCH_ANY 'apache doris';
+
+-- Phrase search
+SELECT * FROM docs WHERE content MATCH_PHRASE 'full text search';
+
+-- Boolean query with SEARCH
+SELECT * FROM docs
+WHERE SEARCH('title:apache AND (category:database OR tags:ANY(sql nosql))');
+
+-- Relevance-based ranking
+SELECT id, title, score() AS relevance
+FROM docs
+WHERE content MATCH_ANY 'real-time analytics OLAP'
+ORDER BY relevance DESC
+LIMIT 10;
+```
+
+## Hybrid Search: Text + Vector
+
+Combine text search with vector similarity for comprehensive retrieval in RAG 
applications:
+
+```sql
+-- Hybrid retrieval: semantic similarity + keyword filtering
+SELECT id, title, score() AS text_relevance
+FROM docs
+WHERE
+  -- Vector filter for semantic similarity
+  cosine_distance(embedding, [0.1, 0.2, ...]) < 0.3
+  -- Text filter for keyword constraints
+  AND SEARCH('title:search AND content:engine AND category:technology')
+ORDER BY text_relevance DESC
+LIMIT 10;
+```
+
+## Managing Inverted Indexes
+
+### Creating Indexes
+
+```sql
+-- At table creation
+CREATE TABLE t (
+  content STRING,
+  INDEX idx(content) USING INVERTED PROPERTIES ("parser" = "english")
+);
+
+-- On existing table
+CREATE INDEX idx_content ON docs(content) USING INVERTED PROPERTIES ("parser" 
= "chinese");
+
+-- Build index for existing data
+BUILD INDEX idx_content ON docs;
+```
+
+### Deleting Indexes
+
+```sql
+DROP INDEX idx_content ON docs;
+```
+
+### Viewing Indexes
+
+```sql
+SHOW CREATE TABLE docs;
+SHOW INDEX FROM docs;
+```
+
+[Index management guide →](../../table-design/index/inverted-index/overview.md)
+
+## Learn More
+
+### Core Documentation
+
+- [Inverted Index 
Overview](../../table-design/index/inverted-index/overview.md) — Architecture, 
indexing principles, and management
+- [Text Search 
Operators](../../table-design/index/inverted-index/search-operators.md) — 
Complete operator reference and query acceleration
+- [SEARCH 
Function](../../table-design/index/inverted-index/search-function.md) — Unified 
query DSL syntax and examples
+- [BM25 Scoring](../../table-design/index/inverted-index/scoring.md) — 
Relevance ranking algorithm and usage
+
+### Advanced Topics
+
+- [Custom 
Analyzers](../../table-design/index/inverted-index/custom-analyzer.md) — Build 
domain-specific tokenizers and filters
+- [Vector Search](./vector-search.md) — Semantic similarity search with 
embeddings
diff --git a/docs/ai/vector-search.md b/docs/ai/search/vector-search.md
similarity index 100%
rename from docs/ai/vector-search.md
rename to docs/ai/search/vector-search.md
diff --git a/docs/table-design/index/inverted-index/custom-analyzer.md 
b/docs/table-design/index/inverted-index/custom-analyzer.md
new file mode 100644
index 00000000000..745ae8b697f
--- /dev/null
+++ b/docs/table-design/index/inverted-index/custom-analyzer.md
@@ -0,0 +1,183 @@
+---
+{
+    "title": "Custom Analyzer",
+    "language": "en"
+}
+---
+
+## Overview
+
+Custom analyzers let you go beyond the limitations of built-in tokenizers by combining character filters, tokenizers, and token filters to fit specific needs. This fine-tunes how text is segmented into searchable terms, which directly determines search relevance and analysis accuracy and is foundational to a good search experience and to getting value from the data.
+
+![Custom Analyzer Overview](/images/analyzer.png)
+
+## Using Custom Analyzers
+
+### Creating Components
+
+#### 1. Creating a char_filter
+
+```sql
+CREATE INVERTED INDEX CHAR_FILTER IF NOT EXISTS x_char_filter
+PROPERTIES (
+  "type" = "char_replace"
+  -- configure pattern/replacement parameters as needed
+);
+```
+
+`char_replace` replaces specified characters before tokenization.
+- Parameters
+  - `char_filter_pattern`: characters to replace
+  - `char_filter_replacement`: replacement characters (default: space)
+
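+For instance, a char_filter that turns dots and underscores into spaces (so they act as word separators) could look like the sketch below; the filter name is illustrative and the property keys follow the parameter names listed above:
+
+```sql
+CREATE INVERTED INDEX CHAR_FILTER IF NOT EXISTS dot_underscore_to_space
+PROPERTIES (
+  "type" = "char_replace",
+  "char_filter_pattern" = "._",       -- characters to replace
+  "char_filter_replacement" = " "     -- replace each of them with a space
+);
+```
+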
+#### 2. Creating a tokenizer
+
+```sql
+CREATE INVERTED INDEX TOKENIZER IF NOT EXISTS x_tokenizer
+PROPERTIES (
+  "type" = "standard"
+);
+```
+
+Available tokenizers:
+- **standard**: Grammar-based tokenization following Unicode text segmentation
+- **ngram**: Generates N-grams of specified length
+- **edge_ngram**: Generates N-grams anchored at word start
+- **keyword**: No-op tokenizer that outputs entire input as single term
+- **char_group**: Tokenizes on specified characters
+- **basic**: Simple English, numbers, Chinese, Unicode tokenizer
+- **icu**: International text segmentation supporting all languages
+
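+As a sketch, an `ngram` tokenizer for substring-style search could be created as follows; the `min_gram` / `max_gram` parameter names mirror those shown for `edge_ngram` in Example 1 below and are assumed to apply to `ngram` as well:
+
+```sql
+CREATE INVERTED INDEX TOKENIZER IF NOT EXISTS trigram_tokenizer
+PROPERTIES
+(
+    "type" = "ngram",
+    "min_gram" = "3",   -- shortest n-gram emitted (assumed parameter name)
+    "max_gram" = "3"    -- longest n-gram emitted (assumed parameter name)
+);
+```
+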
+#### 3. Creating a token_filter
+
+```sql
+CREATE INVERTED INDEX TOKEN_FILTER IF NOT EXISTS x_token_filter
+PROPERTIES (
+  "type" = "word_delimiter"
+);
+```
+
+Available token filters:
+- **word_delimiter**: Splits tokens at non-alphanumeric characters
+- **ascii_folding**: Converts non-ASCII characters to ASCII equivalents
+- **lowercase**: Converts tokens to lowercase
+
+#### 4. Creating an analyzer
+
+```sql
+CREATE INVERTED INDEX ANALYZER IF NOT EXISTS x_analyzer
+PROPERTIES (
+  "tokenizer" = "x_tokenizer",            -- single tokenizer
+  "token_filter" = "x_filter1, x_filter2" -- one or more token_filters, in 
order
+);
+```
+
+### Viewing Components
+
+```sql
+SHOW INVERTED INDEX TOKENIZER;
+SHOW INVERTED INDEX TOKEN_FILTER;
+SHOW INVERTED INDEX ANALYZER;
+```
+
+### Deleting Components
+
+```sql
+DROP INVERTED INDEX TOKENIZER IF EXISTS x_tokenizer;
+DROP INVERTED INDEX TOKEN_FILTER IF EXISTS x_token_filter;
+DROP INVERTED INDEX ANALYZER IF EXISTS x_analyzer;
+```
+
+## Using Custom Analyzers in Table Creation
+
+Custom analyzers are specified using the `analyzer` parameter in index 
properties:
+
+```sql
+CREATE TABLE tbl (
+    `a` bigint NOT NULL AUTO_INCREMENT(1),
+    `ch` text NULL,
+    INDEX idx_ch (`ch`) USING INVERTED PROPERTIES("analyzer" = 
"x_custom_analyzer", "support_phrase" = "true")
+)
+table_properties;
+```
+
+## Usage Limitations
+
+1. The `type` and parameters of a tokenizer or token_filter must come from the supported list; otherwise table creation will fail
+2. An analyzer can only be deleted when no tables are using it
+3. Tokenizers and token_filters can only be deleted when no analyzers are using them
+4. After a custom analyzer is created, it takes about 10 seconds to sync to the BE nodes before data loads that use it work normally
+
+## Notes
+
+1. Chaining many components in a custom analyzer may degrade tokenization performance
+2. The `tokenize` function supports custom analyzers (see the sketch below)
+3. Predefined tokenization is specified with `built_in_analyzer` and custom tokenization with `analyzer`; only one of the two can be set on an index
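+
+For note 2, the following is a hedged sketch of checking a custom analyzer's output with `TOKENIZE`; the analyzer name is illustrative, and passing the analyzer through an `"analyzer"` key is an assumption that mirrors the index property of the same name:
+
+```sql
+-- Assumes an analyzer named keyword_lowercase exists (see Example 3 below)
+SELECT TOKENIZE('Hello.World_TEST', '"analyzer"="keyword_lowercase"');
+```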
+
+## Complete Examples
+
+### Example 1: Phone Number Tokenization
+
+Using edge_ngram for phone number tokenization:
+
+```sql
+CREATE INVERTED INDEX TOKENIZER IF NOT EXISTS edge_ngram_phone_number_tokenizer
+PROPERTIES
+(
+    "type" = "edge_ngram",
+    "min_gram" = "3",
+    "max_gram" = "10",
+    "token_chars" = "digit"
+);
+
+CREATE INVERTED INDEX ANALYZER IF NOT EXISTS edge_ngram_phone_number
+PROPERTIES
+(
+    "tokenizer" = "edge_ngram_phone_number_tokenizer"
+);
+
+CREATE TABLE tbl (
+    `a` bigint NOT NULL AUTO_INCREMENT(1),
+    `ch` text NULL,
+    INDEX idx_ch (`ch`) USING INVERTED PROPERTIES("support_phrase" = "true", 
"analyzer" = "edge_ngram_phone_number")
+) ENGINE=OLAP
+DUPLICATE KEY(`a`)
+DISTRIBUTED BY RANDOM BUCKETS 1
+PROPERTIES (
+"replication_allocation" = "tag.location.default: 1"
+);
+```
+
+### Example 2: Fine-grained Tokenization
+
+Using standard + word_delimiter for detailed tokenization:
+
+```sql
+CREATE INVERTED INDEX TOKEN_FILTER IF NOT EXISTS word_splitter
+PROPERTIES
+(
+    "type" = "word_delimiter",
+    "split_on_numerics" = "false",
+    "split_on_case_change" = "false"
+);
+
+CREATE INVERTED INDEX ANALYZER IF NOT EXISTS lowercase_delimited
+PROPERTIES
+(
+    "tokenizer" = "standard",
+    "token_filter" = "asciifolding, word_splitter, lowercase"
+);
+```
+
+### Example 3: Keyword with Multiple Token Filters
+
+Using keyword to preserve original terms with multiple token filters:
+
+```sql
+CREATE INVERTED INDEX ANALYZER IF NOT EXISTS keyword_lowercase
+PROPERTIES
+(
+"tokenizer" = "keyword",
+"token_filter" = "asciifolding, lowercase"
+);
+```
diff --git a/docs/table-design/index/inverted-index/overview.md 
b/docs/table-design/index/inverted-index/overview.md
new file mode 100644
index 00000000000..c23404f9c3a
--- /dev/null
+++ b/docs/table-design/index/inverted-index/overview.md
@@ -0,0 +1,752 @@
+---
+{
+    "title": "Inverted Index",
+    "sidebar_label": "Overview",
+    "language": "en"
+}
+---
+
+## Indexing Principle
+
+[Inverted Index](https://en.wikipedia.org/wiki/Inverted_index) is a commonly 
used indexing technique in the field of information retrieval. It divides text 
into individual words and constructs a word -> document IDs index, allowing for 
quick searches to determine which documents contain a specific word.
+
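+As a simplified illustration (the rows and terms below are made up), three short text rows produce a term -> row-ID mapping like this:
+
+```
+row 1: "apache doris is fast"        apache   -> [1]
+row 2: "doris supports full text"    doris    -> [1, 2]
+row 3: "fast full text search"       fast     -> [1, 3]
+                                     full     -> [2, 3]
+                                     text     -> [2, 3]
+```
+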
+Starting from version 2.0.0, Doris supports inverted indexes, which can be 
used for full-text searches on text types, equality, and range queries on 
normal numerical and date types, and quickly filtering rows that meet the 
criteria from massive amounts of data.
+
+In Doris's implementation of the inverted index, each row in the table 
corresponds to a document, and each column corresponds to a field in the 
document. Thus, using an inverted index, you can quickly locate rows containing 
specific keywords, thereby accelerating the WHERE clause.
+
+Unlike other indexes in Doris, the inverted index uses independent files at 
the storage layer, corresponding one-to-one with data files but physically 
stored independently. This approach allows for creating and deleting indexes 
without rewriting data files, significantly reducing processing overhead.
+
+## Usage Scenarios
+
+Inverted indexes have a wide range of applications and can accelerate 
equality, range, and full-text searches (keyword matching, phrase matching, 
etc.). A table can have multiple inverted indexes, and the conditions of 
multiple inverted indexes can be combined arbitrarily during queries.
+
+The functionality of inverted indexes is briefly introduced as follows:
+
+**1. Accelerate full-text searches for string types**
+
+- Support for keyword search, including matching multiple keywords 
simultaneously `MATCH_ALL` and matching any one keyword `MATCH_ANY`.
+
+- Support for phrase queries `MATCH_PHRASE`
+  - Support for specifying slop to control word distance
+  - Support for phrase + prefix `MATCH_PHRASE_PREFIX`
+
+- Support for tokenized regular expression queries `MATCH_REGEXP`
+
+- Support for English, Chinese, and Unicode tokenizers
+
+**2. Accelerate normal equality and range queries, covering and replacing the 
functionality of BITMAP index**
+
+- Support for fast filtering of string, numerical, and datetime types for =, 
!=, >, >=, <, <=
+
+- Support for fast filtering of string, numerical, and datetime array types 
for `array_contains`
+
+**3. Support for comprehensive logical combinations**
+
+- Not only supports acceleration for AND conditions but also for OR and NOT 
conditions
+
+- Supports arbitrary logical combinations of multiple conditions with AND, OR, 
NOT
+
+**4. Flexible and efficient index management**
+
+- Support for defining inverted indexes when creating a table
+
+- Support for adding inverted indexes to existing tables, with incremental 
index construction without rewriting existing data in the table
+
+- Support for deleting inverted indexes from existing tables without rewriting 
existing data in the table
+
+:::tip
+
+There are some limitations to using inverted indexes:
+
+1. Floating-point types FLOAT and DOUBLE do not support inverted indexes because of their inexact precision. Use the exact DECIMAL type instead, which does support inverted indexes.
+
+2. Some complex data types do not yet support inverted indexes, including MAP, 
STRUCT, JSON, HLL, BITMAP, QUANTILE_STATE, AGG_STATE.
+
+3. The DUPLICATE model, and the UNIQUE model with Merge-on-Write enabled, support building inverted indexes on any column. The AGGREGATE model, and the UNIQUE model without Merge-on-Write, only support inverted indexes on Key columns; non-Key columns cannot have inverted indexes, because these two models must read all data for merging, so indexes cannot be used for pre-filtering.
+
+:::
+
+## Managing Indexes
+
+### Defining Inverted Indexes When Creating a Table
+
+In the table creation statement, after the COLUMN definition, is the index 
definition:
+
+```sql
+CREATE TABLE table_name
+(
+  column_name1 TYPE1,
+  column_name2 TYPE2,
+  column_name3 TYPE3,
+  INDEX idx_name1(column_name1) USING INVERTED [PROPERTIES(...)] [COMMENT 
'your comment'],
+  INDEX idx_name2(column_name2) USING INVERTED [PROPERTIES(...)] [COMMENT 
'your comment']
+)
+table_properties;
+```
+
+Syntax explanation:
+
+**1. `idx_column_name(column_name)` is mandatory. `column_name` is the column to index and must be a column defined earlier; `idx_column_name` is the index name and must be unique within the table. The recommended naming convention is the prefix `idx_` followed by the column name.**
+
+**2. `USING INVERTED` is mandatory to specify that the index type is an 
inverted index**
+
+**3. `PROPERTIES` is optional to specify additional properties of the inverted 
index, currently supported properties are:**
+
+<details>
+  <summary>parser: specifies the tokenizer</summary>
+  <p>- By default, it is unspecified, meaning no tokenization</p>
+  <p>- `english`: English tokenization, suitable for columns with English 
text, uses spaces and punctuation for tokenization, high performance</p>
+  <p>- `chinese`: Chinese tokenization, suitable for columns with mainly 
Chinese text, lower performance than English tokenization</p>
+  <p>- `unicode`: Unicode tokenization, suitable for mixed Chinese and 
English, and mixed multilingual texts. It can tokenize email prefixes and 
suffixes, IP addresses, and mixed character and number strings, and can 
tokenize Chinese by characters.</p>
+  <p>- `icu` (Supported since 3.1.0): ICU (International Components for 
Unicode) tokenization, based on the ICU library. Ideal for internationalized 
text with complex writing systems and multilingual documents. Supports 
languages like Arabic, Thai, and other Unicode-based scripts.</p>
+  <p>- `basic` (Supported since 3.1.0): Basic rule-based tokenization using 
simple character type recognition. Suitable for scenarios with extremely high 
performance requirements or simple text processing needs. Rules: continuous 
alphanumeric characters are treated as one token, each Chinese character is a 
separate token, and punctuation/spaces/special characters are ignored. This 
tokenizer provides the best performance among all tokenizers but with simpler 
tokenization logic compared to [...]
+  <p>- `ik` (Supported since 3.1.0): IK Chinese tokenization, specifically 
designed for Chinese text analysis.</p>
+
+  Tokenization results can be verified using the `TOKENIZE` SQL function, see 
the following sections for details.
+</details>
+
+<details>
+  <summary>parser_mode</summary>
+
+  **Specifies the tokenization mode; currently supported modes for `parser = chinese` are:**
+  <p>- fine_grained: fine-grained mode, which tends to produce more, shorter tokens, e.g., '武汉市长江大桥' is tokenized into '武汉', '武汉市', '市长', '长江', '长江大桥', '大桥'</p>
+  <p>- coarse_grained: coarse-grained mode, which tends to produce fewer, longer tokens, e.g., '武汉市长江大桥' is tokenized into '武汉市', '长江大桥'</p>
+  <p>- Default: coarse_grained</p>
+</details>
+
+<details>
+  <summary>support_phrase</summary>
+
+  **Specifies whether the index supports MATCH_PHRASE phrase query 
acceleration**
+  <p>- true: supported, but the index requires more storage space</p>
+  <p>- false: not supported, more storage efficient, can use MATCH_ALL to 
query multiple keywords</p>
+  <p>- Since versions 2.0.14, 2.1.5, and 3.0.1, the default is true if a parser is set; otherwise it defaults to false.</p>
+
+  For example, the following example specifies Chinese tokenization, 
coarse-grained mode, and supports phrase query acceleration.
+```sql
+   INDEX idx_name(column_name) USING INVERTED PROPERTIES("parser" = "chinese", 
"parser_mode" = "coarse_grained", "support_phrase" = "true")
+```
+</details>
+
+<details>
+  <summary>char_filter</summary>
+
+  **Specifies preprocessing the text before tokenization, usually to affect 
tokenization behavior**
+
+  <p>char_filter_type: specifies different functional char_filters (currently 
only supports char_replace)</p>
+
+  <p>char_replace replaces each char in the pattern with a char in the 
replacement</p>
+  <p>- char_filter_pattern: characters to be replaced</p>
+  <p>- char_filter_replacement: replacement character array, optional, 
defaults to a space character</p>
+
+  For example, the following example replaces dots and underscores with 
spaces, thus treating them as word separators, affecting tokenization behavior.
+```sql
+   INDEX idx_name(column_name) USING INVERTED PROPERTIES("parser" = "unicode", 
"char_filter_type" = "char_replace", "char_filter_pattern" = "._", 
"char_filter_replacement" = " ")
+```
+</details>
+
+<details>
+  <summary>ignore_above</summary>
+
+  **Specifies the length limit for non-tokenized string indexes (parser not 
specified)**
+  <p>- Strings longer than the length set by ignore_above will not be indexed. 
For string arrays, ignore_above applies to each array element separately, and 
elements longer than ignore_above will not be indexed.</p>
+  <p>- Default is 256, unit is bytes</p>
+
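+  For example, the following raises the limit so that longer untokenized strings are still indexed (the value 1024 is just illustrative):
+```sql
+   INDEX idx_name(column_name) USING INVERTED PROPERTIES("ignore_above" = "1024")
+```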
+</details>
+
+<details>
+  <summary>lower_case</summary>
+
+  **Whether to convert tokens to lowercase for case-insensitive matching**
+  <p>- true: convert to lowercase</p>
+  <p>- false: do not convert to lowercase</p>
+  <p>- From versions 2.0.7 and 2.1.2, the default is true, automatically 
converting to lowercase. Earlier versions default to false.</p>
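+
+  For example, to make matching case-insensitive explicitly (a sketch assuming an English-tokenized column):
+```sql
+   INDEX idx_name(column_name) USING INVERTED PROPERTIES("parser" = "english", "lower_case" = "true")
+```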
+</details>
+
+<details>
+  <summary>stopwords</summary>
+
+  **Specifies the stopword list to use, which affects the behavior of the tokenizer**
+  <p>- The default built-in stopword list includes meaningless words such as 
'is,' 'the,' 'a,' etc. When writing or querying, the tokenizer will ignore 
words that are in the stopword list.</p>
+  <p>- none: Use an empty stopword list</p>
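+
+  For example, to disable the built-in stopword list so that words such as 'is' and 'the' remain searchable (a sketch using the documented properties):
+```sql
+   INDEX idx_name(column_name) USING INVERTED PROPERTIES("parser" = "english", "stopwords" = "none")
+```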
+</details>
+
+<details>
+  <summary>dict_compression (Supported since 3.1.0)</summary>
+
+  **Specifies whether to enable ZSTD dictionary compression for the inverted 
index term dictionary**
+  <p>- true: Enable dictionary compression, which can reduce index storage 
size by up to 20%, especially effective for large-scale text data and log 
analysis scenarios</p>
+  <p>- false: Disable dictionary compression (default)</p>
+  <p>- Recommendation: Enable for scenarios with large text datasets, log 
analytics, or when storage cost is a concern. Works best with 
inverted_index_storage_format = "V3"</p>
+
+  For example:
+```sql
+   INDEX idx_name(column_name) USING INVERTED PROPERTIES("parser" = "english", 
"dict_compression" = "true")
+```
+</details>
+
+**4. `COMMENT` is optional for specifying index comments**
+
+**5. Table-level property `inverted_index_storage_format` (Supported since 
3.1.0)**
+
+  To use the new V3 storage format for inverted indexes, specify this property 
when creating the table:
+
+```sql
+CREATE TABLE table_name (
+    column_name TEXT,
+    INDEX idx_name(column_name) USING INVERTED PROPERTIES("parser" = 
"english", "dict_compression" = "true")
+) PROPERTIES (
+    "inverted_index_storage_format" = "V3"
+);
+```
+
+  **inverted_index_storage_format values:**
+  <p>- "V2": Default storage format</p>
+  <p>- "V3": New storage format with optimized compression. Compared to V2, V3 
provides:</p>
+  <p>  - Smaller index files, reducing disk usage and I/O overhead</p>
+  <p>  - Up to 20% storage space savings for large-scale text data and log 
analysis scenarios</p>
+  <p>  - ZSTD dictionary compression for term dictionaries (when 
dict_compression is enabled)</p>
+  <p>  - Compression for positional information associated with each term</p>
+  <p>- Recommendation: Use V3 for new tables with large text datasets, log 
analytics workloads, or when storage optimization is important</p>
+
+### Adding Inverted Indexes to Existing Tables
+
+**1. ADD INDEX**
+
+Supports both `CREATE INDEX` and `ALTER TABLE ADD INDEX` syntax. The 
parameters are the same as those used when defining indexes during table 
creation.
+
+```sql
+-- Syntax 1
+CREATE INDEX idx_name ON table_name(column_name) USING INVERTED 
[PROPERTIES(...)] [COMMENT 'your comment'];
+-- Syntax 2
+ALTER TABLE table_name ADD INDEX idx_name(column_name) USING INVERTED 
[PROPERTIES(...)] [COMMENT 'your comment'];
+```
+
+**2. BUILD INDEX**
+
+The `CREATE / ADD INDEX` operation only adds the index definition. New data 
written after this operation will generate inverted indexes, but existing data 
requires using `BUILD INDEX` to trigger indexing:
+
+```sql
+-- Syntax 1, by default, builds the index for all partitions in the table
+BUILD INDEX index_name ON table_name;
+-- Syntax 2, you can specify partitions, one or more
+BUILD INDEX index_name ON table_name PARTITIONS(partition_name1, 
partition_name2);
+```
+
+To check the progress of `BUILD INDEX`, use `SHOW BUILD INDEX`:
+
+```sql
+SHOW BUILD INDEX [FROM db_name];
+-- Example 1, view the progress of all BUILD INDEX tasks
+SHOW BUILD INDEX;
+-- Example 2, view the progress of BUILD INDEX tasks for a specific table
+SHOW BUILD INDEX where TableName = "table1";
+```
+
+To cancel `BUILD INDEX`, use `CANCEL BUILD INDEX`:
+
+```sql
+CANCEL BUILD INDEX ON table_name;
+CANCEL BUILD INDEX ON table_name (job_id1, job_id2, ...);
+```
+
+:::tip
+
+`BUILD INDEX` creates an asynchronous task executed by multiple threads on 
each BE. The number of threads can be set using the BE config 
`alter_index_worker_count`, with a default value of 3.
+
+In versions before 2.0.12 and 2.1.4, `BUILD INDEX` would keep retrying until it succeeded. Starting from these versions, failure and timeout mechanisms prevent endless retries. 3.0 (Cloud Mode) does not support this command at the moment.
+
+1. If the majority of replicas for a tablet fail to `BUILD INDEX`, the entire 
`BUILD INDEX` operation fails.
+2. If the time exceeds `alter_table_timeout_second`, the `BUILD INDEX` 
operation times out.
+3. Users can trigger `BUILD INDEX` multiple times; indexes that have already 
been built successfully will not be rebuilt.
+
+:::
+
+### Deleting Inverted Indexes from Existing Tables
+
+```sql
+-- Syntax 1
+DROP INDEX idx_name ON table_name;
+-- Syntax 2
+ALTER TABLE table_name DROP INDEX idx_name;
+```
+
+:::tip
+
+`DROP INDEX` deletes the index definition, so new data will no longer write to 
the index. This creates an asynchronous task to perform the index deletion, 
executed by multiple threads on each BE. The number of threads can be set using 
the BE parameter `alter_index_worker_count`, with a default value of 3.
+
+:::
+
+### Viewing Inverted Index
+
+```sql
+-- Syntax 1: an INDEX entry with USING INVERTED in the table schema indicates an inverted index
+SHOW CREATE TABLE table_name;
+
+-- Syntax 2: IndexType INVERTED indicates an inverted index
+SHOW INDEX FROM table_name;
+```
+
+## Using Indexes
+
+### Accelerating Queries with Inverted Indexes
+
+```sql
+-- 1. Full-text search keyword matching using MATCH_ANY and MATCH_ALL
+SELECT * FROM table_name WHERE column_name MATCH_ANY | MATCH_ALL 'keyword1 
...';
+
+-- 1.1 Rows in the content column containing keyword1
+SELECT * FROM table_name WHERE content MATCH_ANY 'keyword1';
+
+-- 1.2 Rows in the content column containing keyword1 or keyword2; you can add 
more keywords
+SELECT * FROM table_name WHERE content MATCH_ANY 'keyword1 keyword2';
+
+-- 1.3 Rows in the content column containing both keyword1 and keyword2; you 
can add more keywords
+SELECT * FROM table_name WHERE content MATCH_ALL 'keyword1 keyword2';
+```
+
+```sql
+-- 2. Full-text search phrase matching using MATCH_PHRASE
+
+-- 2.1 Rows in the content column containing both keyword1 and keyword2, where 
keyword2 must immediately follow keyword1
+-- 'keyword1 keyword2', 'wordx keyword1 keyword2', 'wordx keyword1 keyword2 
wordy' all match because they contain 'keyword1 keyword2' with keyword2 
immediately following keyword1
+-- 'keyword1 wordx keyword2' does not match because there is a word between 
keyword1 and keyword2
+-- 'keyword2 keyword1' does not match because the order is reversed
+-- To use MATCH_PHRASE, you need to enable "support_phrase" = "true" in 
PROPERTIES.
+SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2';
+
+-- 2.2 Rows in the content column containing both keyword1 and keyword2, with 
a slop (maximum word distance) of 3
+-- 'keyword1 keyword2', 'keyword1 a keyword2', 'keyword1 a b c keyword2' all 
match because the slop is 0, 1, and 3 respectively, all within 3
+-- 'keyword1 a b c d keyword2' does not match because the slop is 4, exceeding 
3
+-- 'keyword2 keyword1', 'keyword2 a keyword1', 'keyword2 a b c keyword1' also 
match because when slop > 0, the order of keyword1 and keyword2 is not 
required. To enforce the order, Doris provides a + sign after slop
+SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2 ~3';
+-- To enforce order, use a positive sign with slop; 'keyword1 a b c keyword2' 
matches, while 'keyword2 a b c keyword1' does not
+SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2 ~3+';
+
+-- 2.3 Prefix matching the last word keyword2, with a default limit of 50 
prefixes (controlled by session variable inverted_index_max_expansions)
+-- It is necessary to ensure that keyword1 and keyword2 remain adjacent in the 
original text after tokenization, with no other words in between.
+-- 'keyword1 keyword2abc' matches because keyword1 is identical and keyword2 is a prefix of keyword2abc
+-- 'keyword1 keyword2' also matches because keyword2 is a prefix of itself
+-- 'keyword1 keyword3' does not match because keyword2 is not a prefix of keyword3
+-- 'keyword1 keyword3abc' does not match because keyword2 is not a prefix of keyword3abc
+SELECT * FROM table_name WHERE content MATCH_PHRASE_PREFIX 'keyword1 keyword2';
+
+-- 2.4 If only one word is provided, it defaults to a prefix query with a 
limit of 50 prefixes (controlled by session variable 
inverted_index_max_expansions)
+SELECT * FROM table_name WHERE content MATCH_PHRASE_PREFIX 'keyword1';
+
+-- 2.5 Regular expression matching on tokenized words, with a default limit of 
50 matches (controlled by session variable inverted_index_max_expansions)
+-- Similar to MATCH_PHRASE_PREFIX but with regex instead of prefix
+SELECT * FROM table_name WHERE content MATCH_REGEXP 'key.*';
+
+-- 3. Normal equality, range, IN, and NOT IN queries using standard SQL 
syntax, for example:
+SELECT * FROM table_name WHERE id = 123;
+SELECT * FROM table_name WHERE ts > '2023-01-01 00:00:00';
+SELECT * FROM table_name WHERE op_type IN ('add', 'delete');
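+
+-- (Illustrative sketch) array element lookups can also be accelerated by an inverted
+-- index on an ARRAY column; `tags` is an assumed ARRAY<STRING> column here
+SELECT * FROM table_name WHERE array_contains(tags, 'keyword1');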
+
+-- 4. Full-text search across multiple columns using the multi_match function
+-- Parameters:
+--   First N parameters are column names to search
+--   Second-to-last parameter specifies match mode: 
'any'/'all'/'phrase'/'phrase_prefix'
+--   Last parameter is the keyword or phrase to search for
+
+-- 4.1 Rows where 'keyword1' appears in ANY of col1,col2,col3 (OR logic)
+select * FROM table_name WHERE multi_match(col1, col2, col3, 'any', 
'keyword1');
+
+-- 4.2 Rows where 'keyword1' appears in ALL of col1,col2,col3 (AND logic)
+select * FROM table_name WHERE multi_match(col1, col2, col3, 'all', 
'keyword1');
+
+-- 4.3 Rows where the exact phrase 'keyword1' appears in ANY of col1,col2,col3 
(exact phrase match)
+select * FROM table_name WHERE multi_match(col1, col2, col3, 'phrase', 
'keyword1');
+
+-- 4.4 Rows where a phrase starting with 'keyword1' appears in ANY of 
col1,col2,col3 (phrase prefix match)
+-- For example, will match content like "keyword123"
+select * FROM table_name WHERE multi_match(col1, col2, col3, 'phrase_prefix', 
'keyword1');
+```
+
+### Analyzing Index Acceleration Effects Through Profiles
+
+Inverted index query acceleration can be toggled with the session variable `enable_inverted_index_query`, which is true by default. To verify the acceleration effect of the index, set it to false to turn acceleration off.
+
+The acceleration effect of the inverted index can be analyzed using the 
following metrics in the Query Profile:
+- RowsInvertedIndexFiltered: The number of rows filtered by the inverted 
index, which can be compared with other Rows values to analyze the filtering 
effect of the index.
+- InvertedIndexFilterTime: The time consumed by the inverted index.
+  - InvertedIndexSearcherOpenTime: The time taken to open the inverted index.
+  - InvertedIndexSearcherSearchTime: The time taken for internal queries 
within the inverted index.
+
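+For example, a simple way to see the effect is to run the same query with acceleration turned off and then on and compare the profile metrics above; the statements below are only a sketch using the session variable named earlier:
+
+```sql
+-- Turn inverted index acceleration off, run the query, then turn it back on and compare
+SET enable_inverted_index_query = false;
+SELECT count() FROM table_name WHERE content MATCH_ANY 'keyword1';
+SET enable_inverted_index_query = true;
+SELECT count() FROM table_name WHERE content MATCH_ANY 'keyword1';
+```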
+
+### Verifying Tokenization Effects Using Tokenization Functions
+
+To check the actual effect of tokenization or to tokenize a piece of text, you 
can use the `TOKENIZE` function for verification.
+
+The first parameter of the `TOKENIZE` function is the text to be tokenized, 
and the second parameter specifies the tokenization parameters used when 
creating the index.
+
+```sql
+-- English tokenization
+SELECT TOKENIZE('I love Doris','"parser"="english"');
++------------------------------------------------+
+| tokenize('I love Doris', '"parser"="english"') |
++------------------------------------------------+
+| ["i", "love", "doris"]                         |
++------------------------------------------------+
+
+-- ICU tokenization for multilingual text (Supported since 3.1.0)
+SELECT TOKENIZE('مرحبا بالعالم Hello 世界', '"parser"="icu"');
++--------------------------------------------------------+
+| tokenize('مرحبا بالعالم Hello 世界', '"parser"="icu"') |
++--------------------------------------------------------+
+| ["مرحبا", "بالعالم", "Hello", "世界"]                   |
++--------------------------------------------------------+
+
+SELECT TOKENIZE('มนไมเปนไปตามความตองการ', '"parser"="icu"');
++-------------------------------------------------------------------+
+| tokenize('มนไมเปนไปตามความตองการ', '"parser"="icu"')            |
++-------------------------------------------------------------------+
+| ["มน", "ไมเปน", "ไป", "ตาม", "ความ", "ตองการ"]                  |
++-------------------------------------------------------------------+
+
+-- Basic tokenization for high performance (Supported since 3.1.0)
+SELECT TOKENIZE('Hello World! This is a test.', '"parser"="basic"');
++-----------------------------------------------------------+
+| tokenize('Hello World! This is a test.', '"parser"="basic"') |
++-----------------------------------------------------------+
+| ["hello", "world", "this", "is", "a", "test"]              |
++-----------------------------------------------------------+
+
+SELECT TOKENIZE('你好世界', '"parser"="basic"');
++-------------------------------------------+
+| tokenize('你好世界', '"parser"="basic"')   |
++-------------------------------------------+
+| ["你", "好", "世", "界"]                    |
++-------------------------------------------+
+
+SELECT TOKENIZE('Hello你好World世界', '"parser"="basic"');
++------------------------------------------------------+
+| tokenize('Hello你好World世界', '"parser"="basic"')    |
++------------------------------------------------------+
+| ["hello", "你", "好", "world", "世", "界"]             |
++------------------------------------------------------+
+
+SELECT TOKENIZE('GET /images/hm_bg.jpg HTTP/1.0', '"parser"="basic"');
++---------------------------------------------------------------------+
+| tokenize('GET /images/hm_bg.jpg HTTP/1.0', '"parser"="basic"')      |
++---------------------------------------------------------------------+
+| ["get", "images", "hm", "bg", "jpg", "http", "1", "0"]              |
++---------------------------------------------------------------------+
+```
+
+## Usage Example
+
+This example demonstrates creating an inverted index and running full-text searches and regular queries on 1 million HackerNews records, including a simple performance comparison against queries without an index.
+
+### Table Creation
+
+```sql
+CREATE DATABASE test_inverted_index;
+
+USE test_inverted_index;
+
+-- Create a table with an inverted index on the comment field
+--   USING INVERTED specifies the index type as an inverted index
+--   PROPERTIES("parser" = "english") specifies using the "english" tokenizer; 
other options include "chinese" for Chinese tokenization and "unicode" for 
mixed-language tokenization. If the "parser" parameter is not specified, no 
tokenization is applied.
+
+CREATE TABLE hackernews_1m
+(
+    `id` BIGINT,
+    `deleted` TINYINT,
+    `type` String,
+    `author` String,
+    `timestamp` DateTimeV2,
+    `comment` String,
+    `dead` TINYINT,
+    `parent` BIGINT,
+    `poll` BIGINT,
+    `children` Array<BIGINT>,
+    `url` String,
+    `score` INT,
+    `title` String,
+    `parts` Array<INT>,
+    `descendants` INT,
+    INDEX idx_comment (`comment`) USING INVERTED PROPERTIES("parser" = 
"english") COMMENT 'inverted index for comment'
+)
+DUPLICATE KEY(`id`)
+DISTRIBUTED BY HASH(`id`) BUCKETS 10
+PROPERTIES ("replication_num" = "1");
+```
+
+### Data Import
+
+**Importing Data via Stream Load**
+
+```
+wget 
https://qa-build.oss-cn-beijing.aliyuncs.com/regression/index/hacknernews_1m.csv.gz
+
+curl --location-trusted -u root: -H "compress_type:gz" -T 
hacknernews_1m.csv.gz 
http://127.0.0.1:8030/api/test_inverted_index/hackernews_1m/_stream_load
+{
+    "TxnId": 2,
+    "Label": "a8a3e802-2329-49e8-912b-04c800a461a6",
+    "TwoPhaseCommit": "false",
+    "Status": "Success",
+    "Message": "OK",
+    "NumberTotalRows": 1000000,
+    "NumberLoadedRows": 1000000,
+    "NumberFilteredRows": 0,
+    "NumberUnselectedRows": 0,
+    "LoadBytes": 130618406,
+    "LoadTimeMs": 8988,
+    "BeginTxnTimeMs": 23,
+    "StreamLoadPutTimeMs": 113,
+    "ReadDataTimeMs": 4788,
+    "WriteDataTimeMs": 8811,
+    "CommitAndPublishTimeMs": 38
+}
+```
+
+**Confirm Data Import Success with SQL count()**
+
+```sql
+SELECT count() FROM hackernews_1m;
++---------+
+| count() |
++---------+
+| 1000000 |
++---------+
+```
+
+### Queries
+
+**01 Full-Text Search**
+
+- Using `LIKE` to match and count rows containing 'OLAP' in the `comment` 
column took 0.18s.
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE comment LIKE '%OLAP%';
+  +---------+
+  | count() |
+  +---------+
+  |      34 |
+  +---------+
+  ```
+
+- Using full-text search with `MATCH_ANY` based on the inverted index to count 
rows containing 'OLAP' in the `comment` column took 0.02s, resulting in a 9x 
speedup. The performance improvement would be even more significant on larger 
datasets.
+  
+  The difference in the number of results is due to the inverted index 
normalizing the terms by converting them to lowercase, among other processes, 
hence `MATCH_ANY` yields more results than `LIKE`.
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE comment MATCH_ANY 'OLAP';
+  +---------+
+  | count() |
+  +---------+
+  |      35 |
+  +---------+
+  ```
+
+- Similarly, comparing the performance for counting occurrences of 'OLTP', 
0.07s vs 0.01s. Due to caching, both `LIKE` and `MATCH_ANY` improved, but the 
inverted index still provided a 7x speedup.
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE comment LIKE '%OLTP%';
+  +---------+
+  | count() |
+  +---------+
+  |      48 |
+  +---------+
+
+
+  SELECT count() FROM hackernews_1m WHERE comment MATCH_ANY 'OLTP';
+  +---------+
+  | count() |
+  +---------+
+  |      51 |
+  +---------+
+  ```
+
+- Counting rows where both 'OLAP' and 'OLTP' appear took 0.13s vs 0.01s, a 13x 
speedup.
+
+  To require multiple terms to appear simultaneously (AND relationship), use 
`MATCH_ALL 'keyword1 keyword2 ...'`.
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE comment LIKE '%OLAP%' AND comment 
LIKE '%OLTP%';
+  +---------+
+  | count() |
+  +---------+
+  |      14 |
+  +---------+
+
+
+  SELECT count() FROM hackernews_1m WHERE comment MATCH_ALL 'OLAP OLTP';
+  +---------+
+  | count() |
+  +---------+
+  |      15 |
+  +---------+
+  ```
+
+- Counting rows where either 'OLAP' or 'OLTP' appears took 0.12s vs 0.01s, a 
12x speedup.
+  
+  To require any one or more of multiple terms to appear (OR relationship), 
use `MATCH_ANY 'keyword1 keyword2 ...'`.
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE comment LIKE '%OLAP%' OR comment 
LIKE '%OLTP%';
+  +---------+
+  | count() |
+  +---------+
+  |      68 |
+  +---------+
+  
+  SELECT count() FROM hackernews_1m WHERE comment MATCH_ANY 'OLAP OLTP';
+  +---------+
+  | count() |
+  +---------+
+  |      71 |
+  +---------+
+  ```
+
+**02 Standard Equality and Range Queries**
+
+- Range query on a `DateTime` type column
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE timestamp > '2007-08-23 04:17:00';
+  +---------+
+  | count() |
+  +---------+
+  |  999081 |
+  +---------+
+  ```
+
+- Adding an inverted index for the `timestamp` column
+
+  ```sql
+  -- For date-time types, USING INVERTED does not require specifying a parser
+  -- CREATE INDEX is one syntax for creating an index, another method will be 
shown later
+  CREATE INDEX idx_timestamp ON hackernews_1m(timestamp) USING INVERTED;
+  ```
+
+  ```sql
+  BUILD INDEX idx_timestamp ON hackernews_1m;
+  ```
+
+- Checking the index creation progress. From the difference between 
`FinishTime` and `CreateTime`, we can see that building the inverted index for 
1 million rows on the `timestamp` column took only 1 second.
+
+  ```sql
+  SHOW ALTER TABLE COLUMN;
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+  | JobId | TableName     | CreateTime              | FinishTime              
| IndexName     | IndexId | OriginIndexId | SchemaVersion | TransactionId | 
State    | Msg  | Progress | Timeout |
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+  | 10030 | hackernews_1m | 2023-02-10 19:44:12.929 | 2023-02-10 19:44:13.938 
| hackernews_1m | 10031   | 10008         | 1:1994690496  | 3             | 
FINISHED |      | NULL     | 2592000 |
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+  ```
+
+  ```sql
+  -- If the table has no partitions, PartitionName defaults to TableName
+  SHOW BUILD INDEX;
+  
+-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+  | JobId | TableName     | PartitionName | AlterInvertedIndexes               
                      | CreateTime              | FinishTime              | 
TransactionId | State    | Msg  | Progress |
+  
+-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+  | 10191 | hackernews_1m | hackernews_1m | [ADD INDEX idx_timestamp 
(`timestamp`) USING INVERTED],  | 2023-06-26 15:32:33.894 | 2023-06-26 
15:32:34.847 | 3             | FINISHED |      | NULL     |
+  
+-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+  ```
+
+- After the index is created, range queries use the same query syntax. Doris 
will automatically recognize the index for optimization. However, due to the 
small dataset, the performance difference is not significant.
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE timestamp > '2007-08-23 04:17:00';
+  +---------+
+  | count() |
+  +---------+
+  |  999081 |
+  +---------+
+  ```
+
+- Performing similar operations on a numeric column `parent` with an equality 
match query.
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE parent = 11189;
+  +---------+
+  | count() |
+  +---------+
+  |       2 |
+  +---------+
+
+  -- For numeric types, USING INVERTED does not require specifying a parser
+  -- ALTER TABLE t ADD INDEX is the second syntax for creating an index
+  ALTER TABLE hackernews_1m ADD INDEX idx_parent(parent) USING INVERTED;
+
+
+  -- Execute BUILD INDEX to create the inverted index for existing data
+  BUILD INDEX idx_parent ON hackernews_1m;
+
+
+  SHOW ALTER TABLE COLUMN;
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+  | JobId | TableName     | CreateTime              | FinishTime              
| IndexName     | IndexId | OriginIndexId | SchemaVersion | TransactionId | 
State    | Msg  | Progress | Timeout |
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+  | 10030 | hackernews_1m | 2023-02-10 19:44:12.929 | 2023-02-10 19:44:13.938 
| hackernews_1m | 10031   | 10008         | 1:1994690496  | 3             | 
FINISHED |      | NULL     | 2592000 |
+  | 10053 | hackernews_1m | 2023-02-10 19:49:32.893 | 2023-02-10 19:49:33.982 
| hackernews_1m | 10054   | 10008         | 1:378856428   | 4             | 
FINISHED |      | NULL     | 2592000 |
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+
+  SHOW BUILD INDEX;
+  
+-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+  | JobId | TableName     | PartitionName | AlterInvertedIndexes               
                | CreateTime              | FinishTime              | 
TransactionId | State    | Msg  | Progress |
+  
+-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+  | 11005 | hackernews_1m | hackernews_1m | [ADD INDEX idx_parent (`parent`) 
USING INVERTED],  | 2023-06-26 16:25:10.167 | 2023-06-26 16:25:10.838 | 1002    
      | FINISHED |      | NULL     |
+  
+-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+
+
+  SELECT count() FROM hackernews_1m WHERE parent = 11189;
+  +---------+
+  | count() |
+  +---------+
+  |       2 |
+  +---------+
+  ```
+
+- Creating an inverted index for the string column `author` without 
tokenization. Equality queries can also leverage the index for speedup.
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE author = 'faster';
+  +---------+
+  | count() |
+  +---------+
+  |      20 |
+  +---------+
+
+  
+  -- Here, USING INVERTED is used without tokenizing the `author` column, 
treating it as a single term
+  ALTER TABLE hackernews_1m ADD INDEX idx_author(author) USING INVERTED;
+
+  
+  -- Execute BUILD INDEX to add the inverted index for existing data
+  BUILD INDEX idx_author ON hackernews_1m;
+  ```
+
+- Creating an incremental index for 1 million author records took only 1.5 seconds.
+
+```sql
+SHOW ALTER TABLE COLUMN;
++-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+| JobId | TableName     | CreateTime              | FinishTime              | 
IndexName     | IndexId | OriginIndexId | SchemaVersion | TransactionId | State 
   | Msg  | Progress | Timeout |
++-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+| 10030 | hackernews_1m | 2023-02-10 19:44:12.929 | 2023-02-10 19:44:13.938 | 
hackernews_1m | 10031   | 10008         | 1:1994690496  | 3             | 
FINISHED |      | NULL     | 2592000 |
+| 10053 | hackernews_1m | 2023-02-10 19:49:32.893 | 2023-02-10 19:49:33.982 | 
hackernews_1m | 10054   | 10008         | 1:378856428   | 4             | 
FINISHED |      | NULL     | 2592000 |
+| 10076 | hackernews_1m | 2023-02-10 19:54:20.046 | 2023-02-10 19:54:21.521 | 
hackernews_1m | 10077   | 10008         | 1:1335127701  | 5             | 
FINISHED |      | NULL     | 2592000 |
++-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+```
+
+```sql
+SHOW BUILD INDEX ORDER BY CreateTime DESC LIMIT 1;
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+| JobId | TableName     | PartitionName | AlterInvertedIndexes                 
              | CreateTime              | FinishTime              | 
TransactionId | State    | Msg  | Progress |
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+| 13006 | hackernews_1m | hackernews_1m | [ADD INDEX idx_author (`author`) 
USING INVERTED],  | 2023-06-26 17:23:02.610 | 2023-06-26 17:23:03.755 | 3004    
      | FINISHED |      | NULL     |
++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+```
+
+- After creating the index, string equality matches also showed significant acceleration.
+
+```sql
+SELECT count() FROM hackernews_1m WHERE author = 'faster';
++---------+
+| count() |
++---------+
+|      20 |
++---------+
+```
diff --git a/docs/table-design/index/inverted-index/scoring.md 
b/docs/table-design/index/inverted-index/scoring.md
new file mode 100644
index 00000000000..fd25c4b1637
--- /dev/null
+++ b/docs/table-design/index/inverted-index/scoring.md
@@ -0,0 +1,107 @@
+---
+
+{
+"title": "Text Search Scoring",
+"language": "en"
+}
+---
+
+## Overview
+
+Text search scoring measures how relevant each row in a table is to a given 
query text.
+When executing a query that includes full-text search predicates (such as 
`MATCH_ANY` or `MATCH_ALL`), Doris computes a numeric score for each row, 
representing its degree of match with the query.
+This score can be used for ranking query results, so that rows most relevant 
to the query appear first.
+
+Doris currently uses the **BM25 (Best Matching 25)** algorithm for text 
relevance scoring.
+
+## BM25 Algorithm
+
+BM25 is a probabilistic relevance algorithm that evaluates how well a record 
matches the query terms by considering term frequency, inverse document 
frequency, and record length.
+Compared with the traditional TF-IDF model, BM25 provides greater robustness 
and tunability, effectively balancing score differences between long and short 
text.
+
+### Formula
+
+The core BM25 scoring formula is:
+
+```
+score = IDF × (tf × (k1 + 1)) / (tf + k1 × (1 - b + b × |d| / avgdl))
+```
+
+Where:
+
+* **tf** – the frequency of a query term in the current row
+* **IDF** – inverse document frequency, indicating how rare the term is across 
all rows
+* **|d|** – the length of the current row (number of tokens after analysis)
+* **avgdl** – the average row length in the table
+* **k1**, **b** – algorithm tuning parameters
+
+**Default parameters:**
+
+| Parameter | Default | Description                                            
 |
+| --------- | ------- | 
------------------------------------------------------- |
+| `k1`      | 1.2     | Controls how strongly term frequency affects the 
score. |
+| `b`       | 0.75    | Controls the strength of length normalization.         
 |
+| `boost`   | 1.0     | Optional query-level weighting factor.                 
 |
+
+**Supporting statistics:**
+
+```
+IDF = log(1 + (N - n + 0.5) / (n + 0.5))
+avgdl = total_terms / total_rows
+```
+
+Where:
+
+* `N` – total number of rows in the table
+* `n` – number of rows that contain the query term
+
+The final score of a row is the sum of the BM25 scores for all query terms.
+
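+To make the formula concrete, here is a worked example with hypothetical statistics (the numbers are illustrative only, not produced by Doris):
+
+```
+# Assume: N = 1000 rows, the term appears in n = 10 rows,
+# tf = 3 in the current row, |d| = 100 tokens, avgdl = 120, k1 = 1.2, b = 0.75
+IDF   = log(1 + (1000 - 10 + 0.5) / (10 + 0.5))            ≈ 4.56
+norm  = 1 - b + b × |d| / avgdl = 0.25 + 0.75 × 100 / 120  = 0.875
+score = IDF × (tf × (k1 + 1)) / (tf + k1 × norm)
+      = 4.56 × 6.6 / (3 + 1.05)                            ≈ 7.4
+```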
+
+## Using Scoring in Doris
+
+### Supported Index Types
+
+* **Tokenized inverted index** – supports BM25 scoring.
+* **Non-tokenized inverted index** – supports only exact matching; scoring is 
not calculated.
+
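+For example, a minimal sketch of a table whose tokenized index can feed BM25 scoring (the table and column names are chosen only to match the example later on this page):
+
+```sql
+CREATE TABLE search_demo (
+  id BIGINT,
+  content STRING,
+  -- tokenized index: enables BM25 scoring for MATCH_* queries on `content`
+  INDEX idx_content (content) USING INVERTED PROPERTIES ("parser" = "english")
+)
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+```
+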
+### Supported Query Types
+
+* `MATCH_ANY`
+* `MATCH_ALL`
+* `MATCH_PHRASE`
+* `MATCH_PHRASE_PREFIX`
+* `SEARCH`
+
+### Query Pushdown Rules
+
+To enable scoring pushdown into the inverted index engine, the following 
conditions must be met:
+
+1. The `SELECT` clause includes the `score()` function.
+2. The `WHERE` clause contains at least one `MATCH_*` predicate.
+3. The query is a Top-N query with an `ORDER BY` clause based on the score 
result.
+
+---
+
+## Example
+
+```sql
+SELECT *,
+       score() AS relevance
+FROM search_demo
+WHERE content MATCH_ANY 'text search test'
+ORDER BY relevance DESC
+LIMIT 10;
+```
+
+This query returns the top 10 rows most relevant to the search terms, ranked 
by BM25 score.
+
+---
+
+## Result Interpretation
+
+* **Score range** – BM25 scores are positive and unbounded. Only relative 
magnitude matters.
+* **Multiple terms** – For multi-term queries, the total score is the sum of 
all term scores.
+* **Length effect** – Shorter rows generally receive higher scores when 
containing the same terms.
+* **No matching terms** – If none of the query terms appear in the table, the 
score is `0`.
+
diff --git a/docs/table-design/index/inverted-index/search-function.md 
b/docs/table-design/index/inverted-index/search-function.md
new file mode 100644
index 00000000000..f9952a093a0
--- /dev/null
+++ b/docs/table-design/index/inverted-index/search-function.md
@@ -0,0 +1,220 @@
+---
+{
+    "title": "SEARCH Function",
+    "language": "en"
+}
+---
+
+## Introduction
+
+The `SEARCH` function, available since Apache Doris 4.0, provides a concise DSL (domain-specific language) for full-text queries. It simplifies common text search patterns into a unified query entry that runs on inverted indexes for high performance.
+
+
+SEARCH is a boolean predicate function evaluated in the WHERE clause. It takes 
a SEARCH DSL string that describes text‑matching rules and pushes matchable 
predicates to inverted indexes.
+
+
+## Syntax and Semantics
+
+Syntax
+
+```sql
+SEARCH('<search_expression>')
+```
+
+- Argument: `<search_expression>` — string literal containing the SEARCH DSL 
expression
+
+Usage
+
+- Placement: use in the `WHERE` clause as a predicate
+- Return type: BOOLEAN (TRUE for matching rows)
+
+### Currently Supported Queries
+
+#### Term query
+- Syntax: `column:term`
+- Semantics: match the term in the column's token stream; case sensitivity depends on the index's `lower_case` property
+- Indexing tip: add an inverted index with an appropriate `parser`/analyzer on 
the column
+```sql
+SELECT id, title FROM search_test_basic WHERE SEARCH('title:Machine');
+SELECT id, title FROM search_test_basic WHERE SEARCH('title:Python');
+SELECT id, title FROM search_test_basic WHERE SEARCH('category:Technology');
+```
+
+#### ANY
+- Syntax: `column:ANY(term1 term2 ...)`
+- Semantics: matches if any listed term is present in the column (OR); 
order-insensitive; duplicates ignored
+- Indexing tip: use a tokenized inverted index (e.g., 
`english`/`chinese`/`unicode` parser)
+```sql
+SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ANY(python 
javascript)');
+SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ANY(machine 
learning tutorial)');
+
+-- Edge case: single value behaves like a term query
+SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ANY(python)');
+```
+
+#### ALL
+- Syntax: `column:ALL(term1 term2 ...)`
+- Semantics: requires all listed terms to be present (AND); order-insensitive; duplicates ignored
+- Indexing tip: use a tokenized inverted index (e.g., 
`english`/`chinese`/`unicode` parser)
+```sql
+SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ALL(machine 
learning)');
+SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ALL(programming 
tutorial)');
+
+-- Edge case: single value behaves like a term query
+SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ALL(python)');
+```
+
+#### Boolean operators
+- Syntax: `(expr) AND/OR/NOT (expr)`
+- Semantics: combine sub-expressions inside SEARCH using boolean operators
+- Indexing tip: keep matchable conditions inside SEARCH for pushdown; other 
WHERE predicates act as filters
+```sql
+SELECT id, title FROM search_test_basic
+WHERE SEARCH('title:Machine AND category:Technology');
+
+SELECT id, title FROM search_test_basic
+WHERE SEARCH('title:Python OR title:Data');
+
+SELECT id, title FROM search_test_basic
+WHERE SEARCH('category:Technology AND NOT title:Machine');
+```
+
+#### Grouping and nesting
+- Syntax: parenthesized sub-expressions
+- Semantics: control precedence with parentheses; multi-level nesting is 
supported
+- Indexing tip: same as above
+```sql
+SELECT id, title FROM search_test_basic
+WHERE SEARCH('(title:Machine OR title:Python) AND category:Technology');
+
+SELECT id, title FROM search_test_basic
+WHERE SEARCH('tags:ANY(python javascript) AND (category:Technology OR 
category:Programming)');
+```
+
+#### Multi‑column search
+- Syntax: `column1:term OR column2:ANY(...) OR ...`
+- Semantics: search across multiple columns; each column follows its own 
index/analyzer configuration
+- Indexing tip: add inverted indexes for each involved column
+```sql
+SELECT id, title FROM search_test_basic
+WHERE SEARCH('title:Python OR tags:ANY(database mysql) OR author:Alice');
+
+SELECT id, title FROM search_test_basic
+WHERE SEARCH('tags:ALL(tutorial) AND category:Technology');
+```
+
+#### EXACT query
+
+- Pattern: `column:EXACT(<text>)`
+- Semantics: exact match on the entire column value; case‑sensitive; does not 
match partial tokens
+- Indexing tip: use an untokenized inverted index on the column (no `parser`) 
for best performance
+
+Example:
+
+```sql
+SELECT id
+FROM t
+WHERE SEARCH('content:EXACT(machine learning)');
+```
+
+#### Variant subcolumn query
+
+- Pattern: `variant_col.sub.path:term`
+- Semantics: query a VARIANT subcolumn using dot notation; matching follows 
the index/analyzer configured on the VARIANT column
+- Supports boolean combinations, `ANY`/`ALL`, nested paths; nonexistent 
subcolumns simply produce no matches
+
+Example:
+
+```sql
+SELECT id
+FROM test_variant_search_subcolumn
+WHERE SEARCH('properties.message:alpha');
+```
+
+### Examples
+
+```sql
+-- Table with both tokenized and untokenized indexes
+CREATE TABLE t (
+  id INT,
+  content STRING,
+  INDEX idx_untokenized(content) USING INVERTED,
+  INDEX idx_tokenized(content)  USING INVERTED PROPERTIES("parser" = 
"standard")
+);
+
+-- Exact string match (uses untokenized index)
+SELECT id, content
+FROM t
+WHERE SEARCH('content:EXACT(machine learning)')
+ORDER BY id;
+
+-- No match for partial token with EXACT
+SELECT id, content
+FROM t
+WHERE SEARCH('content:EXACT(machine)')
+ORDER BY id;
+
+-- ANY/ALL use tokenized index
+SELECT id, content FROM t WHERE SEARCH('content:ANY(machine learning)') ORDER 
BY id;
+SELECT id, content FROM t WHERE SEARCH('content:ALL(machine learning)') ORDER 
BY id;
+
+-- Compare EXACT vs ANY
+SELECT id, content FROM t WHERE SEARCH('content:EXACT(deep learning)') ORDER 
BY id;
+SELECT id, content FROM t WHERE SEARCH('content:ANY(deep learning)') ORDER BY 
id;
+
+-- Mixed conditions
+SELECT id, content
+FROM t
+WHERE SEARCH('content:EXACT(machine learning) OR content:ANY(intelligence)')
+ORDER BY id;
+
+-- VARIANT column with inverted index
+CREATE TABLE test_variant_search_subcolumn (
+  id BIGINT,
+  properties VARIANT<PROPERTIES("variant_max_subcolumns_count"="0")>,
+  INDEX idx_properties (properties) USING INVERTED PROPERTIES (
+    "parser" = "unicode",
+    "lower_case" = "true",
+    "support_phrase" = "true"
+  )
+);
+
+-- Single term
+SELECT id
+FROM test_variant_search_subcolumn
+WHERE SEARCH('properties.message:alpha')
+ORDER BY id;
+
+-- AND / ALL
+SELECT id
+FROM test_variant_search_subcolumn
+WHERE SEARCH('properties.message:alpha AND properties.message:beta')
+ORDER BY id;
+
+SELECT id
+FROM test_variant_search_subcolumn
+WHERE SEARCH('properties.message:ALL(alpha beta)')
+ORDER BY id;
+
+-- OR across different subcolumns
+SELECT id
+FROM test_variant_search_subcolumn
+WHERE SEARCH('properties.message:hello OR properties.category:beta')
+ORDER BY id;
+```
+
+### Current Limitations
+
+- Phrase, prefix, wildcard, and regular expression queries are not yet 
supported in `SEARCH()`
+- Range/list queries are not yet supported in `SEARCH()`
+- Unsupported patterns may fall back to term queries
+
+Use standard operators or text search operators as alternatives when needed, 
for example:
+
+```sql
+-- Phrase search via operator
+SELECT * FROM t WHERE content MATCH_PHRASE 'full text search';
+
+-- Range filters via SQL
+SELECT * FROM t WHERE created_at >= '2024-01-01';
+```
diff --git a/docs/table-design/index/inverted-index/search-operators.md 
b/docs/table-design/index/inverted-index/search-operators.md
new file mode 100644
index 00000000000..be1142b6406
--- /dev/null
+++ b/docs/table-design/index/inverted-index/search-operators.md
@@ -0,0 +1,79 @@
+---
+{
+    "title": "Full-Text Search and Query Acceleration Support",
+    "language": "en"
+}
+---
+
+## Full-Text Search Operators
+
+### MATCH_ANY
+- Matches rows containing any of the specified keywords in a field.
+```sql
+SELECT * FROM table_name WHERE content MATCH_ANY 'keyword1 keyword2';
+```
+
+### MATCH_ALL
+- Matches rows containing all specified keywords in a field.
+```sql
+SELECT * FROM table_name WHERE content MATCH_ALL 'keyword1 keyword2';
+```
+
+### MATCH_PHRASE
+- Phrase match where terms appear adjacent and in order.
+- Requires index property `"support_phrase" = "true"` for acceleration.
+```sql
+SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2';
+```
+
+### MATCH_PHRASE with slop
+- Loose phrase matching that allows gaps between terms up to a maximum 
distance.
+```sql
+-- Allow up to 3 terms between keyword1 and keyword2
+SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2 ~3';
+```
+
+### MATCH_PHRASE with strict order
+- Combine slop with strict order using `+` to enforce term order.
+```sql
+SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2 ~3+';
+```
+
+### MATCH_PHRASE_PREFIX
+- Phrase match where the last term uses prefix matching.
+- With a single term, it degrades to prefix matching for that term.
+```sql
+-- Last term as prefix
+SELECT * FROM table_name WHERE content MATCH_PHRASE_PREFIX 'keyword1 key';
+
+-- Single-term prefix match
+SELECT * FROM table_name WHERE content MATCH_PHRASE_PREFIX 'keyword1';
+```
+
+### MATCH_REGEXP
+- Regular expression match on the tokenized terms of a field.
+```sql
+SELECT * FROM table_name WHERE content MATCH_REGEXP '^key_word.*';
+```
+
+### MATCH_PHRASE_EDGE
+- Treats the first term as a suffix match, middle terms as exact matches, and the last term as a prefix match; terms must be adjacent.
+```sql
+SELECT * FROM table_name WHERE content MATCH_PHRASE_EDGE 'search engine optim';
+```
+
+## Inverted Index Query Acceleration
+
+### Supported Operators and Functions
+
+- Equality and set: `=`, `!=`, `IN`, `NOT IN`
+- Range: `>`, `>=`, `<`, `<=`, `BETWEEN`
+- Null checks: `IS NULL`, `IS NOT NULL`
+- Arrays: `array_contains`, `array_overlaps`
+
+```sql
+-- Examples
+SELECT * FROM t WHERE price >= 100 AND price < 200;          -- range
+SELECT * FROM t WHERE tags IN ('a','b','c');                  -- set
+SELECT * FROM t WHERE array_contains(attributes, 'color');    -- arrays
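+-- Null check and array_overlaps (illustrative, same hypothetical table t)
+SELECT * FROM t WHERE attributes IS NOT NULL;                 -- null check
+SELECT * FROM t WHERE array_overlaps(attributes, ['color', 'size']); -- arrays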
+```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current.json 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current.json
index 9a3168b67d8..2e76d3c6a4c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current.json
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current.json
@@ -91,6 +91,10 @@
     "message": "表索引",
     "description": "The label for category Table Index in sidebar docs"
   },
+  "sidebar.docs.category.Inverted Index": {
+    "message": "倒排索引",
+    "description": "The label for category Inverted Index in sidebar docs"
+  },
   "sidebar.docs.category.Data Manipulation": {
     "message": "数据操作",
     "description": "The label for category Data Manipulation in sidebar docs"
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/search/text-search.md 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/search/text-search.md
new file mode 100644
index 00000000000..65e119027e2
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/search/text-search.md
@@ -0,0 +1,236 @@
+---
+{
+    "title": "文本搜索",
+    "language": "zh-CN"
+}
+---
+
+## 概述
+
+文本搜索用于在数据集中检索包含特定词项或短语的文档,并根据相关性对结果进行排序。
+
+相比向量搜索擅长“找全”——利用语义相似性扩展召回范围,文本搜索更擅长“找准”——提供可控、可解释的精确匹配,确保关键词命中与过滤条件的确定性。
+
+在生成式 AI 
应用中,尤其是检索增强生成(RAG)场景下,文本搜索与向量搜索相辅相成,两者协同,兼顾语义广度与词法精度,既提升召回率,又保证结果的准确性与可解释性,共同构建可靠的检索基础,为大模型提供更准确、更相关的上下文。
+
+## Doris 文本搜索的演进
+
+从 2.0.0 版本开始,Doris 引入了 倒排索引(Inverted 
Index),以支持高性能的全文搜索。随着检索场景的多样化与查询复杂度的提升,Doris 在后续版本中持续扩展文本搜索能力,使其能够在更广泛的场景中发挥作用。
+
+### 基础阶段(2.0+)
+引入列级倒排索引,提供基础全文检索算子(MATCH_ANY、MATCH_ALL)和多语言分词器,支持在大规模数据集中进行高效的关键词检索。
+
+### 功能扩展(2.x → 3.x)
+完善算子体系,新增短语匹配(MATCH_PHRASE)、前缀搜索(MATCH_PHRASE_PREFIX)、正则匹配(MATCH_REGEXP)等高级文本搜索算子,并在
 3.1 版本引入自定义分词能力,进一步满足不同应用场景下的文本搜索需求。
+
+### 能力增强(4.0+)
+新增文本搜索相关性打分能力与统一的搜索入口,正式引入 BM25 打分算法与 SEARCH 函数。
+
+- BM25 相关性打分:通过 score() 函数根据文本相关性对结果进行排序,可与向量相似度分数结合,实现混合排序。
+
+- SEARCH 函数:提供统一的查询 DSL,支持跨列查询与布尔逻辑组合,简化复杂查询构建,同时进一步提升查询性能。
+
+---
+
+## Doris 核心文本搜索特性
+
+### 丰富的文本算子
+
+Doris 提供了一套覆盖多种检索模式的全文搜索算子,可满足从基础关键词匹配到复杂短语查询的不同需求。
+
+当前版本支持的主要算子包括:
+
+* **`MATCH_ANY` / `MATCH_ALL`**:支持任意词匹配(OR)与全词匹配(AND),适用于通用关键词检索。
+* **`MATCH_PHRASE`**:精确短语匹配,支持自定义词距(slop)与顺序控制,常用于邻近词语查询。
+* **`MATCH_PHRASE_PREFIX`**:短语前缀匹配,用于自动补全和增量搜索。
+* **`MATCH_REGEXP`**:基于正则表达式的匹配,适合模式化文本检索。
+
+这些算子可独立使用,也可通过 `SEARCH()` 函数组合构建复杂逻辑查询。
+例如:
+
+```sql
+-- 精确短语搜索
+SELECT * FROM docs WHERE content MATCH_PHRASE '倒排 索引';
+
+-- 前缀搜索
+SELECT * FROM docs WHERE content MATCH_PHRASE_PREFIX '数据 仓';
+```
+
+[查看所有算子 →](../../table-design/index/inverted-index/search-operators.md)
+
+---
+
+### 自定义分词(3.1+)
+
+在文本搜索中,分词方式直接决定了检索精度与召回效果。
+从 3.1 版本起,Doris 支持 **自定义分词器(Custom 
Analyzer)**,允许用户根据业务需求灵活定义分词流程,通过组合字符过滤器(char_filter)、分词器(tokenizer)和词元过滤器(token_filter)实现更细粒度的文本控制。
+
+典型使用方式包括:
+
+* **自定义字符过滤**:在分词前进行符号替换、去除或标准化。
+* **选择分词算法**:支持 `standard`、`ngram`、`edge_ngram`、`keyword`、`icu` 
等多种类型,用于处理不同语言和结构的文本。
+* **应用词元过滤**:如 `lowercase`、`word_delimiter`、`ascii_folding` 等,用于规范化和精炼分词结果。
+
+```sql
+-- 示例:定义自定义分词器
+CREATE INVERTED INDEX ANALYZER IF NOT EXISTS keyword_lowercase
+PROPERTIES (
+  "tokenizer" = "keyword",
+  "token_filter" = "asciifolding, lowercase"
+);
+
+-- 在建表时使用自定义分词器
+CREATE TABLE docs (
+    id BIGINT,
+    content TEXT,
+    INDEX idx_content (content) USING INVERTED PROPERTIES(
+        "analyzer" = "keyword_lowercase",
+        "support_phrase" = "true"
+    )
+);
+```
+
+[了解自定义分词 →](../../table-design/index/inverted-index/custom-analyzer.md)
+
+---
+
+### BM25 相关性打分(4.0+)
+
+Doris 实现了 **BM25(Best Matching 25)** 算法用于文本相关性计算,为全文搜索提供排序与打分能力。
+
+* 基于词频(TF)、逆文档频率(IDF)和文档长度的概率模型
+* 对长短文本均具良好鲁棒性
+* 可通过参数 `k1`、`b` 调整加权策略
+
+```sql
+SELECT id, title, score() AS relevance
+FROM docs
+WHERE content MATCH_ANY '实时 OLAP 分析'
+ORDER BY relevance DESC
+LIMIT 10;
+```
+
+[了解更多打分机制 →](../../table-design/index/inverted-index/scoring.md)
+
+---
+
+### SEARCH 函数:统一查询入口(4.0+)
+
+`SEARCH()` 函数提供统一的文本检索语法入口,支持多列搜索与布尔逻辑组合,使复杂查询表达更简洁:
+
+```sql
+SELECT id, title, score() AS relevance
+FROM docs
+WHERE SEARCH('title:Machine AND tags:ANY(database sql)')
+ORDER BY relevance DESC
+LIMIT 20;
+```
+
+[完整 SEARCH 函数指南 →](../../table-design/index/inverted-index/search-function.md)
+
+## 快速开始
+
+### 步骤 1:创建带倒排索引的表
+
+```sql
+CREATE TABLE docs (
+  id BIGINT,
+  title STRING,
+  content STRING,
+  category STRING,
+  tags ARRAY<STRING>,
+  created_at DATETIME,
+  -- 文本搜索索引
+  INDEX idx_title(title) USING INVERTED PROPERTIES ("parser" = "chinese"),
+  INDEX idx_content(content) USING INVERTED PROPERTIES ("parser" = "chinese", 
"support_phrase" = "true"),
+  INDEX idx_category(category) USING INVERTED,
+  INDEX idx_tags(tags) USING INVERTED
+)
+DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 10;
+```
+
+### 步骤 2:运行文本查询
+
+```sql
+-- 简单关键词搜索
+SELECT * FROM docs WHERE content MATCH_ANY 'apache doris';
+
+-- 短语搜索
+SELECT * FROM docs WHERE content MATCH_PHRASE '全文检索';
+
+-- 使用 SEARCH 进行布尔查询
+SELECT * FROM docs
+WHERE SEARCH('title:apache AND (category:数据库 OR tags:ANY(sql nosql))');
+
+-- 基于相关性的排序
+SELECT id, title, score() AS relevance
+FROM docs
+WHERE content MATCH_ANY '实时 分析 OLAP'
+ORDER BY relevance DESC
+LIMIT 10;
+```
+
+## 混合搜索:文本 + 向量
+
+在 RAG 应用中结合文本搜索和向量相似度实现全面检索:
+
+```sql
+-- 混合检索:语义相似度 + 关键词过滤
+SELECT id, title, score() AS text_relevance
+FROM docs
+WHERE
+  -- 向量过滤实现语义相似度
+  cosine_distance(embedding, [0.1, 0.2, ...]) < 0.3
+  -- 文本过滤实现关键词约束
+  AND SEARCH('title:搜索 AND content:引擎 AND category:技术')
+ORDER BY text_relevance DESC
+LIMIT 10;
+```
+
+## 管理倒排索引
+
+### 创建索引
+
+```sql
+-- 在建表时创建
+CREATE TABLE t (
+  content STRING,
+  INDEX idx(content) USING INVERTED PROPERTIES ("parser" = "chinese")
+);
+
+-- 在现有表上创建
+CREATE INDEX idx_content ON docs(content) USING INVERTED PROPERTIES ("parser" 
= "chinese");
+
+-- 为现有数据构建索引
+BUILD INDEX idx_content ON docs;
+```
+
+### 删除索引
+
+```sql
+DROP INDEX idx_content ON docs;
+```
+
+### 查看索引
+
+```sql
+SHOW CREATE TABLE docs;
+SHOW INDEX FROM docs;
+```
+
+[索引管理指南 →](../../table-design/index/inverted-index/overview.md)
+
+## 延伸阅读
+
+### 核心文档
+
+- [倒排索引概述](../../table-design/index/inverted-index/overview.md) — 架构、索引原理和管理
+- [文本搜索算子](../../table-design/index/inverted-index/search-operators.md) — 
完整算子参考和查询加速
+- [SEARCH 函数](../../table-design/index/inverted-index/search-function.md) — 
统一查询 DSL 语法和示例
+- [BM25 打分](../../table-design/index/inverted-index/scoring.md) — 相关性排序算法和用法
+
+### 高级主题
+
+- [自定义分析器](../../table-design/index/inverted-index/custom-analyzer.md) — 
构建特定领域的分词器和过滤器
+- [向量搜索](./vector-search.md) — 使用嵌入向量进行语义相似度搜索
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search.md 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/search/vector-search.md
similarity index 100%
rename from 
i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/vector-search.md
rename to 
i18n/zh-CN/docusaurus-plugin-content-docs/current/ai/search/vector-search.md
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/custom-analyzer.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/custom-analyzer.md
new file mode 100644
index 00000000000..da54a2ebd5a
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/custom-analyzer.md
@@ -0,0 +1,281 @@
+---
+{
+    "title": "自定义分词",
+    "language": "zh-CN"
+}
+---
+
+## 概述
+
+自定义分词可以突破内置分词的局限,根据特定需求组合字符过滤器、分词器和词元过滤器,精细定义文本如何被切分成可搜索的词项,这直接决定了搜索结果的相关性与数据分析的准确性,是提升搜索体验与数据价值的底层关键。
+
+![自定义分词示意图](/images/analyzer.png)
+
+## 使用自定义分词
+
+### 创建
+
+#### 1. char_filter(字符过滤器)
+
+```sql
+CREATE INVERTED INDEX CHAR_FILTER IF NOT EXISTS x_char_filter
+PROPERTIES (
+  "type" = "char_replace"
+  -- 其他参数见下文
+);
+```
+
+`char_replace`:在分词前将指定字符替换为目标字符。
+- 参数
+  - `char_filter_pattern`:需要替换的字符列表
+  - `char_filter_replacement`:替换后的字符(默认空格)
+
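+例如,下面的示例(字符过滤器名称为示意)在分词前把点和下划线替换为空格,使其成为词的分隔符:
+
+```sql
+CREATE INVERTED INDEX CHAR_FILTER IF NOT EXISTS dot_underscore_to_space
+PROPERTIES (
+  "type" = "char_replace",
+  "char_filter_pattern" = "._",
+  "char_filter_replacement" = " "
+);
+```
+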
+#### 2. tokenizer(分词器)
+
+```sql
+CREATE INVERTED INDEX TOKENIZER IF NOT EXISTS x_tokenizer
+PROPERTIES (
+  "type" = "standard"
+);
+```
+
+- `standard`:标准分词(遵循 Unicode 文本分割),适用于多数语言
+- `ngram`:按 N 元组切分
+  - `min_ngram`:最小长度(默认 1)
+  - `max_ngram`:最大长度(默认 2)
+  - 
`token_chars`:保留字符类别(默认保留全部)。可选:`letter`、`digit`、`whitespace`、`punctuation`、`symbol`
+- `edge_ngram`:从词首起始位置生成 N 元组
+  - `min_ngram`:最小长度(默认 1)
+  - `max_ngram`:最大长度(默认 2)
+  - `token_chars`:同上
+- `keyword`:整段文本作为一个词项输出,常与 token_filter 组合使用
+- `char_group`:按给定字符切分
+  - `tokenize_on_chars`:字符列表或类别,类别支持 
`whitespace`、`letter`、`digit`、`punctuation`、`symbol`、`cjk`
+- `basic`:简单英文/数字/中文/Unicode 分词
+  - `extra_chars`:额外分割的 ASCII 字符(如 `[]().`)
+- `icu`:ICU 国际化分词,支持多语言复杂脚本
+
+#### 3. token_filter(词元过滤器)
+
+```sql
+CREATE INVERTED INDEX TOKEN_FILTER IF NOT EXISTS x_token_filter
+PROPERTIES (
+  "type" = "word_delimiter"
+);
+```
+
+- `word_delimiter`:在非字母数字字符处切分,并可执行标准化
+  - 默认规则:
+    - 使用非字母数字字符作为分隔符(例:Super-Duper → Super, Duper)
+    - 清除 token 首尾分隔符(例:XL---42+'Autocoder' → XL, 42, Autocoder)
+    - 在大小写转换处切分(例:PowerShot → Power, Shot)
+    - 在字母与数字交界处切分(例:XL500 → XL, 500)
+    - 移除英文所有格 's(例:Neil's → Neil)
+  - 可选参数:
+    - `generate_number_parts`(默认 true)
+    - `generate_word_parts`(默认 true)
+    - `protected_words`
+    - `split_on_case_change`(默认 true)
+    - `split_on_numerics`(默认 true)
+    - `stem_english_possessive`(默认 true)
+    - `type_table`:自定义字符类型映射(如 `[+ => ALPHA, - => ALPHA]`),类型含 
`ALPHA`、`ALPHANUM`、`DIGIT`、`LOWER`、`SUBWORD_DELIM`、`UPPER`
+- `ascii_folding`:将非 ASCII 字符映射为等效 ASCII
+- `lowercase`:将 token 文本转为小写
+
+#### 4. analyzer(分析器)
+
+```sql
+CREATE INVERTED INDEX ANALYZER IF NOT EXISTS x_analyzer
+PROPERTIES (
+  "tokenizer" = "x_tokenizer",            -- 单个分词器
+  "token_filter" = "x_filter1, x_filter2" -- 一个或多个 token_filter,按顺序执行
+);
+```
+
+### 查看
+
+```sql
+SHOW INVERTED INDEX TOKENIZER;
+SHOW INVERTED INDEX TOKEN_FILTER;
+SHOW INVERTED INDEX ANALYZER;
+```
+
+### 删除
+
+```sql
+DROP INVERTED INDEX TOKENIZER IF EXISTS x_tokenizer;
+DROP INVERTED INDEX TOKEN_FILTER IF EXISTS x_token_filter;
+DROP INVERTED INDEX ANALYZER IF EXISTS x_analyzer;
+```
+
+## 建表中使用自定义分词
+
+1. 自定义分词通过在索引 PROPERTIES 中设置 analyzer 来指定自定义分词器
+2. PROPERTIES 中 analyzer 只能与 support_phrase 配合使用
+
+```sql
+CREATE TABLE tbl (
+    `a` bigint NOT NULL AUTO_INCREMENT(1),
+    `ch` text NULL,
+    INDEX idx_ch (`ch`) USING INVERTED PROPERTIES("analyzer" = 
"x_custom_analyzer", "support_phrase" = "true")
+)
+table_properties;
+```
+
+## 使用限制
+
+1. tokenizer 和 token_filter 的 type 及参数只能使用当前支持的分词器和词元过滤器,否则建表失败
+2. 只有在没有任何表使用某个 analyzer 时才能删除它
+3. 只有在没有任何 analyzer 使用某个 tokenizer 或 token_filter 时才能删除它
+4. 自定义分词相关语句执行约 10s 后会同步到 BE,之后导入即可正常使用,不会报错
+
+## 注意事项
+
+1. 自定义分词 analyzer 嵌套过多可能会导致分词性能降低
+2. `SELECT TOKENIZE` 分词函数支持自定义分词
+3. 预定义分词使用 built_in_analyzer,自定义分词使用 analyzer,两者只能配置其一
+
+## 完整示例
+
+### 示例1
+
+使用edge_ngram对电话号码进行分词
+
+```sql
+CREATE INVERTED INDEX TOKENIZER IF NOT EXISTS edge_ngram_phone_number_tokenizer
+PROPERTIES
+(
+    "type" = "edge_ngram",
+    "min_gram" = "3",
+    "max_gram" = "10",
+    "token_chars" = "digit"
+);
+
+CREATE INVERTED INDEX ANALYZER IF NOT EXISTS edge_ngram_phone_number
+PROPERTIES
+(
+    "tokenizer" = "edge_ngram_phone_number_tokenizer"
+);
+
+CREATE TABLE tbl (
+    `a` bigint NOT NULL AUTO_INCREMENT(1),
+    `ch` text NULL,
+    INDEX idx_ch (`ch`) USING INVERTED PROPERTIES("support_phrase" = "true", 
"analyzer" = "edge_ngram_phone_number")
+) ENGINE=OLAP
+DUPLICATE KEY(`a`)
+DISTRIBUTED BY RANDOM BUCKETS 1
+PROPERTIES (
+"replication_allocation" = "tag.location.default: 1"
+);
+
+select tokenize('13891972631', '"analyzer"="edge_ngram_phone_number"');
+```
+
+返回结果:
+```json
+[
+  {"token":"138"},
+  {"token":"1389"},
+  {"token":"13891"},
+  {"token":"138919"},
+  {"token":"1389197"},
+  {"token":"13891972"},
+  {"token":"138919726"},
+  {"token":"1389197263"}
+]
+```
+
+### 示例2
+
+使用standard + word_delimiter进行配合精细分词
+
+```sql
+CREATE INVERTED INDEX TOKEN_FILTER IF NOT EXISTS word_splitter
+PROPERTIES
+(
+    "type" = "word_delimiter",
+    "split_on_numerics" = "false",
+    "split_on_case_change" = "false"
+);
+
+CREATE INVERTED INDEX ANALYZER IF NOT EXISTS lowercase_delimited
+PROPERTIES
+(
+    "tokenizer" = "standard",
+    "token_filter" = "asciifolding, word_splitter, lowercase"
+);
+
+CREATE TABLE tbl (
+    `a` bigint NOT NULL AUTO_INCREMENT(1),
+    `ch` text NULL,
+    INDEX idx_ch (`ch`) USING INVERTED PROPERTIES("support_phrase" = "true", 
"analyzer" = "lowercase_delimited")
+) ENGINE=OLAP
+DUPLICATE KEY(`a`)
+DISTRIBUTED BY RANDOM BUCKETS 1
+PROPERTIES (
+"replication_allocation" = "tag.location.default: 1"
+);
+
+select tokenize('The server at IP 192.168.1.15 sent a confirmation to 
[email protected], requiring a quickResponse before the deadline.', 
'"analyzer"="lowercase_delimited"');
+```
+
+返回结果:
+```json
+[
+  {"token":"the"},
+  {"token":"server"},
+  {"token":"at"},
+  {"token":"ip"},
+  {"token":"192"},
+  {"token":"168"},
+  {"token":"1"},
+  {"token":"15"},
+  {"token":"sent"},
+  {"token":"a"},
+  {"token":"confirmation"},
+  {"token":"to"},
+  {"token":"user"},
+  {"token":"123"},
+  {"token":"example"},
+  {"token":"com"},
+  {"token":"requiring"},
+  {"token":"a"},
+  {"token":"quickresponse"},
+  {"token":"before"},
+  {"token":"the"},
+  {"token":"deadline"}
+]
+```
+
+### 示例3
+
+使用keyword保留原词利用多个token_filter进行分词
+
+```sql
+CREATE INVERTED INDEX ANALYZER IF NOT EXISTS keyword_lowercase
+PROPERTIES
+(
+"tokenizer" = "keyword",
+"token_filter" = "asciifolding, lowercase"
+);
+
+CREATE TABLE tbl (
+    `a` bigint NOT NULL AUTO_INCREMENT(1),
+    `ch` text NULL,
+    INDEX idx_ch (`ch`) USING INVERTED PROPERTIES("support_phrase" = "true", 
"analyzer" = "keyword_lowercase")
+) ENGINE=OLAP
+DUPLICATE KEY(`a`)
+DISTRIBUTED BY RANDOM BUCKETS 1
+PROPERTIES (
+"replication_allocation" = "tag.location.default: 1"
+);
+
+select tokenize('hÉllo World', '"analyzer"="keyword_lowercase"');
+```
+
+返回结果:
+```json
+[
+  {"token":"hello world"}
+]
+```
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/overview.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/overview.md
new file mode 100644
index 00000000000..8a3df5d509c
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/overview.md
@@ -0,0 +1,808 @@
+---
+{
+    "title": "倒排索引",
+    "sidebar_label": "概述",
+    "language": "zh-CN"
+}
+---
+
+## 索引原理
+
+[倒排索引](https://zh.wikipedia.org/wiki/%E5%80%92%E6%8E%92%E7%B4%A2%E5%BC%95),是信息检索领域常用的索引技术,将文本分成一个个词,构建
 词 -> 文档编号 的索引,可以快速查找一个词在哪些文档出现。
+
+从 2.0.0 版本开始,Doris 支持倒排索引,可以用来进行文本类型的全文检索、普通数值日期类型的等值范围查询,快速从海量数据中过滤出满足条件的行。
+
+在 Doris 的倒排索引实现中,Table 的一行对应一个文档、一列对应文档中的一个字段,因此利用倒排索引可以根据关键词快速定位包含它的行,达到 
WHERE 子句加速的目的。
+
+与 Doris 
中其他索引不同的是,在存储层倒排索引使用独立的文件,跟数据文件一一对应、但物理存储上文件相互独立。这样的好处是可以做到创建、删除索引不用重写数据文件,大幅降低处理开销。
+
+
+## 使用场景
+
+倒排索引的使用范围很广泛,可以加速等值、范围、全文检索(关键词匹配、短语系列匹配等)。一个表可以有多个倒排索引,查询时多个倒排索引的条件可以任意组合。
+
+倒排索引的功能简要介绍如下:
+
+**1. 加速字符串类型的全文检索**
+
+- 支持关键词检索,包括同时匹配多个关键字 `MATCH_ALL`、匹配任意一个关键字 `MATCH_ANY`
+
+- 支持短语查询 `MATCH_PHRASE`
+  - 支持指定词距 `slop`
+  - 支持短语 + 前缀 `MATCH_PHRASE_PREFIX`
+
+- 支持分词正则查询 `MATCH_REGEXP`
+
+- 支持英文、中文以及 Unicode、IK、ICU 等多种分词
+
+**2. 加速普通等值、范围查询,覆盖原来 BITMAP 索引的功能,代替 BITMAP 索引**
+
+- 支持字符串、数值、日期时间类型的 =, !=, >, >=, <, <= 快速过滤
+
+- 支持字符串、数字、日期时间数组类型的 =, !=, >, >=, <, <=
+
+**3. 支持完善的逻辑组合**
+
+- 不仅支持 AND 条件加速,还支持 OR NOT 条件加速
+
+- 支持多个条件的任意 AND OR NOT 逻辑组合
+
+**4. 灵活高效的索引管理**
+
+- 支持在创建表上定义倒排索引
+
+- 支持在已有的表上增加倒排索引,而且支持增量构建倒排索引,无需重写表中的已有数据
+
+- 支持删除已有表上的倒排索引,无需重写表中的已有数据
+
+:::tip
+
+倒排索引的使用有下面一些限制:
+
+1. 存在精度问题的浮点数类型 FLOAT 和 DOUBLE 不支持倒排索引,原因是浮点数精度不准确。解决方案是使用精度准确的定点数类型 
DECIMAL,DECIMAL 支持倒排索引。
+
+2. 部分复杂数据类型还不支持倒排索引,包括:MAP、STRUCT、JSON、HLL、BITMAP、QUANTILE_STATE、AGG_STATE。其中 
MAP、STRUCT 会逐步支持,JSON 类型可以换成 VARIANT 类型获得支持。其他几个类型因为其特殊用途暂不需要支持倒排索引。
+
+3. DUPLICATE 和 开启 Merge-on-Write 的 UNIQUE 表模型支持任意列建倒排索引。但是 AGGREGATE 和 未开启 
Merge-on-Write 的 UNIQUE 模型仅支持 Key 列建倒排索引,非 Key 
列不能建倒排索引,这是因为这两个模型需要读取所有数据后做合并,因此不能利用索引做提前过滤。
+
+:::
+
+## 管理索引
+
+### 建表时定义倒排索引
+
+在建表语句中 COLUMN 的定义之后是索引定义:
+
+```sql
+CREATE TABLE table_name
+(
+  column_name1 TYPE1,
+  column_name2 TYPE2,
+  column_name3 TYPE3,
+  INDEX idx_name1(column_name1) USING INVERTED [PROPERTIES(...)] [COMMENT 
'your comment'],
+  INDEX idx_name2(column_name2) USING INVERTED [PROPERTIES(...)] [COMMENT 
'your comment']
+)
+table_properties;
+```
+
+语法说明如下:
+
+**1. `idx_column_name(column_name)` 是必须的,`column_name` 
是建索引的列名,必须是前面列定义中出现过的,`idx_column_name` 是索引名字,必须表级别唯一,建议命名规范:列名前面加前缀 `idx_`**
+
+**2. `USING INVERTED` 是必须的,用于指定索引类型是倒排索引**
+
+**3. `PROPERTIES` 是可选的,用于指定倒排索引的额外属性,目前支持的属性如下:**
+
+<details>
+  <summary>parser 指定分词器</summary>
+
+  - 默认不指定表示不分词
+  - `english`:英文分词,适合英文文本;使用空格和标点分词,性能高
+  - `chinese`:中文分词,适合中文文本;性能较 `english` 略低
+  - `unicode`:多语言分词,适用于中英文混合/多语言文本;可对邮箱前后缀、IP 地址、字母数字混合进行分词,并支持中文按字符分词
+  - `icu`(自 3.1.0 起支持):ICU 分词,基于 ICU 库,适用于国际化文本与复杂书写系统(如阿拉伯语、泰语等)
+  - `basic`(自 3.1.0 起支持):基本规则分词器;连续字母数字为一词,中文按字切分,忽略标点/空格/特殊字符;性能最高但分词规则更简单,日志场景中可作为 unicode 分词器的替代。
+  - `ik`(自 3.1.0 起支持):IK 中文分词,适用于中文文本分析
+
+  分词效果可通过 `TOKENIZE` SQL 函数验证,详见后续章节。
+
+</details>
+
+<details>
+  <summary>parser_mode</summary>
+
+  用于指定分词模式(`parser = chinese` 时可用):
+
+  - `fine_grained`:细粒度,倾向于分出更短、更多的词;如“武汉市长江大桥”→“武汉”“武汉市”“市长”“长江”“长江大桥”“大桥”
+  - `coarse_grained`:粗粒度,倾向于分出更长、更少的词;如“武汉市长江大桥”→“武汉市”“长江大桥”
+  - 默认:`coarse_grained`
+
+</details>
+
+<details>
+  <summary>support_phrase</summary>
+
+  是否支持 `MATCH_PHRASE` 短语查询加速:
+
+  - `true`:支持,但索引占用更多存储空间
+  - `false`:不支持,更省存储;可用 `MATCH_ALL` 查询多个关键词
+  - 自 2.0.14、2.1.5、3.0.1 起:若设置了 `parser`,默认 `true`;否则默认 `false`
+
+  示例:中文分词 + 粗粒度模式 + 支持短语加速:
+
+```sql
+INDEX idx_name(column_name) USING INVERTED PROPERTIES(
+  "parser" = "chinese",
+  "parser_mode" = "coarse_grained",
+  "support_phrase" = "true"
+)
+```
+
+</details>
+
+<details>
+  <summary>char_filter</summary>
+
+  在分词前对文本进行预处理,影响分词行为。
+
+  - `char_filter_type`:char_filter 类型(目前仅支持 `char_replace`)
+  - `char_replace`:将 `pattern` 中每个字符替换为 `replacement` 中的字符
+  - `char_filter_pattern`:需要替换的字符
+  - `char_filter_replacement`:替换后的字符(可选,默认空格)
+
+  示例:将点和下划线替换为空格,使其作为词分隔符。
+
+```sql
+INDEX idx_name(column_name) USING INVERTED PROPERTIES(
+  "parser" = "unicode",
+  "char_filter_type" = "char_replace",
+  "char_filter_pattern" = "._",
+  "char_filter_replacement" = " "
+)
+```
+
+</details>
+
+<details>
+  <summary>ignore_above</summary>
+
+  指定不分词字符串索引(未指定 `parser`)的长度限制。
+
+  - 长度超过 `ignore_above` 的字符串不会被索引;对字符串数组,该限制分别作用于每个元素
+  - 默认值:`256`(字节)
+
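+  示例(索引名与列名为示意):将长度限制放宽到 1024 字节。
+
+```sql
+INDEX idx_name(column_name) USING INVERTED PROPERTIES(
+  "ignore_above" = "1024"
+)
+```
+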
+</details>
+
+<details>
+  <summary>lower_case</summary>
+
+  是否将分词结果转换为小写,以便实现不区分大小写匹配。
+
+  - `true`:转换为小写
+  - `false`:不转换
+  - 自 2.0.7 和 2.1.2 起默认 `true`;更早版本默认 `false`
+
+</details>
+
+<details>
+  <summary>stopwords</summary>
+
+  指定停用词表,会影响分词器行为。
+
+  - 内置停用词表包含常见无意义词(如 `is`、`the`、`a` 等),写入或查询时将被忽略
+  - `none`:使用空的停用词表
+
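+  示例(索引名与列名为示意):使用空的停用词表,保留 `is`、`the` 等常见词。
+
+```sql
+INDEX idx_name(column_name) USING INVERTED PROPERTIES(
+  "parser" = "english",
+  "stopwords" = "none"
+)
+```
+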
+</details>
+
+<details>
+  <summary>dict_compression(自 3.1.0 起支持)</summary>
+
+  是否对倒排索引的词典启用 ZSTD 字典压缩。
+
+  - `true`:启用字典压缩
+  - `false`:默认,不启用
+  - 建议:对大文本/日志或重视存储成本的场景启用;与 `inverted_index_storage_format = "V3"` 
搭配效果最佳,对大规模文本与日志场景可减少约 20% 存储。
+
+  示例:
+
+```sql
+INDEX idx_name(column_name) USING INVERTED PROPERTIES(
+  "parser" = "english",
+  "dict_compression" = "true"
+)
+```
+
+</details>
+
+**4. `COMMENT` 是可选的,用于指定索引注释**
+
+**5. 表级属性 `inverted_index_storage_format`**
+
+  **inverted_index_storage_format 取值:**
+  - "V1": 每个索引一个独立的idx文件
+  - "V2": 所有索引统一一个idx文件,有效降低索引文件读写的IO
+  - "V3": 当前最新的存储格式,和V2格式类似,但是具有优化的索引文件压缩能力。(该功能自 3.1.0 版本开始支持,4.0.1开始作为默认存储格式)
+
+  与 V2 相比,V3 提供:
+
+    1. 对词典启用 ZSTD 字典压缩(当 dict_compression 启用时)
+    2. 对每个词关联的位置信息进行压缩
+
+  使用建议:对于大规模文本数据和日志分析场景,建议使用 V3,可节省约 20% 的存储空间。
+
+要使用新的 V3 存储格式,在建表时指定此属性:
+
+```sql
+CREATE TABLE table_name (
+    column_name TEXT,
+    INDEX idx_name(column_name) USING INVERTED PROPERTIES("parser" = 
"english", "dict_compression" = "true")
+) PROPERTIES (
+    "inverted_index_storage_format" = "V3"
+);
+```
+
+### 已有表增加倒排索引
+
+
+**1. ADD INDEX**
+
+支持`CREATE INDEX` 和 `ALTER TABLE ADD INDEX` 两种语法,参数跟建表时索引定义相同
+
+```sql
+-- 语法 1
+CREATE INDEX idx_name ON table_name(column_name) USING INVERTED 
[PROPERTIES(...)] [COMMENT 'your comment'];
+-- 语法 2
+ALTER TABLE table_name ADD INDEX idx_name(column_name) USING INVERTED 
[PROPERTIES(...)] [COMMENT 'your comment'];
+```
+
+**2. BUILD INDEX**
+
+`CREATE / ADD INDEX` 操作只是新增了索引定义,这个操作之后的新写入数据会生成倒排索引,而存量数据需要使用 `BUILD INDEX` 
触发:
+
+```sql
+-- 语法 1,默认给全表的所有分区 BUILD INDEX
+BUILD INDEX index_name ON table_name;
+-- 语法 2,可指定 Partition,可指定一个或多个
+BUILD INDEX index_name ON table_name PARTITIONS(partition_name1, 
partition_name2);
+```
+
+通过 `SHOW BUILD INDEX` 查看 `BUILD INDEX` 进度:
+```sql
+SHOW BUILD INDEX [FROM db_name];
+-- 示例 1,查看所有的 BUILD INDEX 任务进展
+SHOW BUILD INDEX;
+-- 示例 2,查看指定 table 的 BUILD INDEX 任务进展
+SHOW BUILD INDEX where TableName = "table1";
+```
+
+通过 `CANCEL BUILD INDEX` 取消 `BUILD INDEX`:
+```sql
+CANCEL BUILD INDEX ON table_name;
+CANCEL BUILD INDEX ON table_name (job_id1,jobid_2,...);
+```
+
+:::tip
+
+`BUILD INDEX` 会生成一个异步任务执行,在每个 BE 上有多个线程执行索引构建任务,通过 BE 参数 
`alter_index_worker_count` 可以设置,默认值是 3。
+
+2.0.12 和 2.1.4 之前的版本 `BUILD INDEX` 会一直重试直到成功,从这两个版本开始通过失败和超时机制避免一直重试。3.0 
存算分离模式暂不支持此命令。
+
+1. 一个 tablet 的多数副本 `BUILD INDEX` 失败后,整个 `BUILD INDEX` 失败结束
+2. 时间超过 `alter_table_timeout_second`,`BUILD INDEX` 超时结束
+3. 用户可以多次触发 `BUILD INDEX`,已经 BUILD 成功的索引不会重复 BUILD
+
+:::
+
+
+### 已有表删除倒排索引
+
+```sql
+-- 语法 1
+DROP INDEX idx_name ON table_name;
+-- 语法 2
+ALTER TABLE table_name DROP INDEX idx_name;
+```
+
+:::tip
+
+`DROP INDEX` 会删除索引定义,新写入数据不会再写索引,同时会生成一个异步任务执行索引删除操作,在每个 BE 上有多个线程执行索引删除任务,通过 
BE 参数 `alter_index_worker_count` 可以设置,默认值是 3。
+
+:::
+
+### 查看倒排索引
+
+```sql
+-- 语法 1,表的 schema 中 INDEX 部分 USING INVERTED 是倒排索引
+SHOW CREATE TABLE table_name;
+
+-- 语法 2,IndexType 为 INVERTED 的是倒排索引
+SHOW INDEX FROM table_name;
+```
+
+
+## 使用索引
+
+### 利用倒排索引加速查询
+
+```sql
+-- 1. 全文检索关键词匹配,通过 MATCH_ANY MATCH_ALL 完成
+SELECT * FROM table_name WHERE column_name MATCH_ANY | MATCH_ALL 'keyword1 
...';
+
+-- 1.1 content 列中包含 keyword1 的行
+SELECT * FROM table_name WHERE content MATCH_ANY 'keyword1';
+
+-- 1.2 content 列中包含 keyword1 或者 keyword2 的行,后面还可以添加多个 keyword
+SELECT * FROM table_name WHERE content MATCH_ANY 'keyword1 keyword2';
+
+-- 1.3 content 列中同时包含 keyword1 和 keyword2 的行,后面还可以添加多个 keyword
+SELECT * FROM table_name WHERE content MATCH_ALL 'keyword1 keyword2';
+
+
+-- 2. 全文检索短语匹配,通过 MATCH_PHRASE 完成
+-- 2.1 content 列中同时包含 keyword1 和 keyword2 的行,而且 keyword2 必须紧跟在 keyword1 后面
+-- 'keyword1 keyword2','wordx keyword1 keyword2','wordx keyword1 keyword2 
wordy' 能匹配,因为他们都包含 keyword1 keyword2,而且 keyword2 紧跟在 keyword1 后面
+-- 'keyword1 wordx keyword2' 不能匹配,因为 keyword1 keyword2 之间隔了一个词 wordx
+-- 'keyword2 keyword1' 不能匹配,因为 keyword1 keyword2 的顺序反了
+-- 使用 MATCH_PHRASE 需要在 PROPERTIES 中开启 "support_phrase" = "true"
+SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2';
+
+-- 2.2 content 列中同时包含 keyword1 和 keyword2 的行,而且 keyword1 keyword2 的 
`词距`(slop)不超过 3
+-- 'keyword1 keyword2', 'keyword1 a keyword2', 'keyword1 a b c keyword2' 
都能匹配,因为 keyword1 keyword2 中间隔的词分别是 0 1 3 都不超过 3
+-- 'keyword1 a b c d keyword2' 不能匹配,因为 keyword1 keyword2 中间隔的词有 4 个,超过 3
+-- 'keyword2 keyword1', 'keyword2 a keyword1', 'keyword2 a b c keyword1' 
也能匹配,因为指定 slop > 0 时不再要求 keyword1 keyword2 的顺序。这个行为参考了 ES,与直觉的预期不一样,因此 Doris 
提供了在 slop 后面指定正数符号(+)表示需要保持 keyword1 keyword2 的先后顺序
+SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2 ~3';
+-- slop 指定正号,'keyword1 a b c keyword2' 能匹配,而 'keyword2 a b c keyword1' 不能匹配
+SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2 ~3+';
+
+-- 2.3 在保持词顺序的前提下,对最后一个词 keyword2 做前缀匹配,默认找 50 个前缀词(session 变量 
inverted_index_max_expansions 控制)
+-- 需要保证 keyword1, keyword2 在原文分词后也是相邻的,不能中间有其他词
+-- 'keyword1 keyword2abc' 能匹配,因为 keyword1 完全一样,最后一个 keyword2abc 是 keyword2 的前缀
+-- 'keyword1 keyword2' 也能匹配,因为 keyword2 也是 keyword2 的前缀
+-- 'keyword1 keyword3' 不能匹配,因为 keyword3 不是 keyword2 的前缀
+-- 'keyword1 keyword3abc' 也不能匹配,因为 keyword3abc 也不是 keyword2 的前缀
+SELECT * FROM table_name WHERE content MATCH_PHRASE_PREFIX 'keyword1 keyword2';
+
+-- 2.4 如果只填一个词会退化为前缀查询,默认找 50 个前缀词(session 变量 inverted_index_max_expansions 控制)
+SELECT * FROM table_name WHERE content MATCH_PHRASE_PREFIX 'keyword1';
+
+-- 2.5 对分词后的词进行正则匹配,默认匹配 50 个(session 变量 inverted_index_max_expansions 控制)
+-- 类似 MATCH_PHRASE_PREFIX 的匹配规则,只是前缀变成了正则
+SELECT * FROM table_name WHERE content MATCH_REGEXP 'key.*';
+
+-- 3. 普通等值、范围、IN、NOT IN,正常的 SQL 语句即可,例如
+SELECT * FROM table_name WHERE id = 123;
+SELECT * FROM table_name WHERE ts > '2023-01-01 00:00:00';
+SELECT * FROM table_name WHERE op_type IN ('add', 'delete');
+
+-- 4. 多列全文检索匹配,通过 multi_match 函数完成
+-- 参数说明:
+--   前N个参数是要匹配的列名
+--   倒数第二个参数指定匹配模式:'any'/'all'/'phrase'/'phrase_prefix'
+--   最后一个参数是要搜索的关键词或短语
+
+-- 4.1 在col1,col2,col3任意一列中包含'keyword1'的行(OR逻辑)
+SELECT * FROM table_name WHERE multi_match(col1, col2, col3, 'any', 
'keyword1');
+
+-- 4.2 在col1,col2,col3所有列中都包含'keyword1'的行(AND逻辑)
+SELECT * FROM table_name WHERE multi_match(col1, col2, col3, 'all', 
'keyword1');
+
+-- 4.3 在col1,col2,col3任意一列中包含完整短语'keyword1'的行(精确短语匹配)
+SELECT * FROM table_name WHERE multi_match(col1, col2, col3, 'phrase', 
'keyword1');
+
+-- 4.4 在col1,col2,col3任意一列中包含以'keyword1'开头的短语的行(短语前缀匹配)
+-- 例如会匹配"keyword123"这样的内容
+SELECT * FROM table_name WHERE multi_match(col1, col2, col3, 'phrase_prefix', 
'keyword1');
+```
+
+### 通过 profile 分析索引加速效果
+
+倒排查询加速可以通过 session 变量 `enable_inverted_index_query` 开关,默认是 true 
打开,有时为了验证索引加速效果可以设置为 false 关闭。
+
+可以通过 Query Profile 中的下面几个指标分析倒排索引的加速效果。
+- RowsInvertedIndexFiltered 倒排过滤掉的行数,可以与其他几个 Rows 值对比分析索引过滤效果
+- InvertedIndexFilterTime 倒排索引消耗的时间
+  - InvertedIndexSearcherOpenTime 倒排索引打开索引的时间
+  - InvertedIndexSearcherSearchTime 倒排索引内部查询的时间
+
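+下面是一个验证索引加速效果的示意流程(表名与查询沿用本文示例,profile 的查看方式以实际部署为准):
+
+```sql
+-- 打开 Query Profile 收集
+SET enable_profile = true;
+
+-- 关闭倒排索引加速,执行一次查询作为对照
+SET enable_inverted_index_query = false;
+SELECT count() FROM hackernews_1m WHERE comment MATCH_ANY 'OLAP';
+
+-- 重新打开倒排索引加速,再次执行同一查询
+SET enable_inverted_index_query = true;
+SELECT count() FROM hackernews_1m WHERE comment MATCH_ANY 'OLAP';
+
+-- 对比两次 Profile 中的 RowsInvertedIndexFiltered 与 InvertedIndexFilterTime
+```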
+
+### 用分词函数验证分词效果
+
+如果想检查分词的实际效果,或者观察一段文本的分词行为,可以使用 TOKENIZE 函数进行验证。
+
+TOKENIZE 函数的第一个参数是待分词的文本,第二个参数是创建索引指定的分词参数。
+
+```sql
+SELECT TOKENIZE('武汉长江大桥','"parser"="chinese","parser_mode"="fine_grained"');
++-----------------------------------------------------------------------------------+
+| tokenize('武汉长江大桥', '"parser"="chinese","parser_mode"="fine_grained"')       |
++-----------------------------------------------------------------------------------+
+| ["武汉", "武汉长江大桥", "长江", "长江大桥", "大桥"]                              |
++-----------------------------------------------------------------------------------+
+
+SELECT TOKENIZE('武汉市长江大桥','"parser"="chinese","parser_mode"="fine_grained"');
++--------------------------------------------------------------------------------------+
+| tokenize('武汉市长江大桥', '"parser"="chinese","parser_mode"="fine_grained"')       
 |
++--------------------------------------------------------------------------------------+
+| ["武汉", "武汉市", "市长", "长江", "长江大桥", "大桥"]                               |
++--------------------------------------------------------------------------------------+
+
+SELECT TOKENIZE('武汉市长江大桥','"parser"="chinese","parser_mode"="coarse_grained"');
++----------------------------------------------------------------------------------------+
+| tokenize('武汉市长江大桥', '"parser"="chinese","parser_mode"="coarse_grained"')     
   |
++----------------------------------------------------------------------------------------+
+| ["武汉市", "长江大桥"]                                                              
   |
++----------------------------------------------------------------------------------------+
+
+SELECT TOKENIZE('I love Doris','"parser"="english"');
++------------------------------------------------+
+| tokenize('I love Doris', '"parser"="english"') |
++------------------------------------------------+
+| ["i", "love", "doris"]                         |
++------------------------------------------------+
+
+SELECT TOKENIZE('I love CHINA 我爱我的祖国','"parser"="unicode"');
++-------------------------------------------------------------------+
+| tokenize('I love CHINA 我爱我的祖国', '"parser"="unicode"')       |
++-------------------------------------------------------------------+
+| ["i", "love", "china", "我", "爱", "我", "的", "祖", "国"]        |
++-------------------------------------------------------------------+
+
+-- ICU 分词多语言文本 (该功能自 3.1.0 版本开始支持)
+SELECT TOKENIZE('مرحبا بالعالم Hello 世界', '"parser"="icu"');
++--------------------------------------------------------+
+| tokenize('مرحبا بالعالم Hello 世界', '"parser"="icu"') |
++--------------------------------------------------------+
+| ["مرحبا", "بالعالم", "Hello", "世界"]                   |
++--------------------------------------------------------+
+
+SELECT TOKENIZE('มนไมเปนไปตามความตองการ', '"parser"="icu"');
++-------------------------------------------------------------------+
+| tokenize('มนไมเปนไปตามความตองการ', '"parser"="icu"')            |
++-------------------------------------------------------------------+
+| ["มน", "ไมเปน", "ไป", "ตาม", "ความ", "ตองการ"]                  |
++-------------------------------------------------------------------+
+
+-- Basic 分词高性能场景 (该功能自 3.1.0 版本开始支持)
+SELECT TOKENIZE('Hello World! This is a test.', '"parser"="basic"');
++-----------------------------------------------------------+
+| tokenize('Hello World! This is a test.', '"parser"="basic"') |
++-----------------------------------------------------------+
+| ["hello", "world", "this", "is", "a", "test"]              |
++-----------------------------------------------------------+
+
+SELECT TOKENIZE('你好世界', '"parser"="basic"');
++-------------------------------------------+
+| tokenize('你好世界', '"parser"="basic"')   |
++-------------------------------------------+
+| ["你", "好", "世", "界"]                    |
++-------------------------------------------+
+
+SELECT TOKENIZE('Hello你好World世界', '"parser"="basic"');
++------------------------------------------------------+
+| tokenize('Hello你好World世界', '"parser"="basic"')    |
++------------------------------------------------------+
+| ["hello", "你", "好", "world", "世", "界"]             |
++------------------------------------------------------+
+
+SELECT TOKENIZE('GET /images/hm_bg.jpg HTTP/1.0', '"parser"="basic"');
++---------------------------------------------------------------------+
+| tokenize('GET /images/hm_bg.jpg HTTP/1.0', '"parser"="basic"')      |
++---------------------------------------------------------------------+
+| ["get", "images", "hm", "bg", "jpg", "http", "1", "0"]              |
++---------------------------------------------------------------------+
+```
+
+## 使用示例
+
+用 HackerNews 100 万条数据展示倒排索引的创建、全文检索、普通查询,包括跟无索引的查询性能进行简单对比。
+
+### 建表
+
+```sql
+
+CREATE DATABASE test_inverted_index;
+
+USE test_inverted_index;
+
+-- 创建表的同时创建了 comment 的倒排索引 idx_comment
+--   USING INVERTED 指定索引类型是倒排索引
+--   PROPERTIES("parser" = "english") 指定采用 "english" 分词,还支持 "chinese" 中文分词和 
"unicode" 中英文多语言混合分词,如果不指定 "parser" 参数表示不分词
+
+CREATE TABLE hackernews_1m
+(
+    `id` BIGINT,
+    `deleted` TINYINT,
+    `type` String,
+    `author` String,
+    `timestamp` DateTimeV2,
+    `comment` String,
+    `dead` TINYINT,
+    `parent` BIGINT,
+    `poll` BIGINT,
+    `children` Array<BIGINT>,
+    `url` String,
+    `score` INT,
+    `title` String,
+    `parts` Array<INT>,
+    `descendants` INT,
+    INDEX idx_comment (`comment`) USING INVERTED PROPERTIES("parser" = 
"english") COMMENT 'inverted index for comment'
+)
+DUPLICATE KEY(`id`)
+DISTRIBUTED BY HASH(`id`) BUCKETS 10
+PROPERTIES ("replication_num" = "1");
+
+```
+
+
+### 导入数据
+
+**通过 Stream Load 导入数据**
+
+```
+wget 
https://qa-build.oss-cn-beijing.aliyuncs.com/regression/index/hacknernews_1m.csv.gz
+
+curl --location-trusted -u root: -H "compress_type:gz" -T 
hacknernews_1m.csv.gz  
http://127.0.0.1:8030/api/test_inverted_index/hackernews_1m/_stream_load
+{
+    "TxnId": 2,
+    "Label": "a8a3e802-2329-49e8-912b-04c800a461a6",
+    "TwoPhaseCommit": "false",
+    "Status": "Success",
+    "Message": "OK",
+    "NumberTotalRows": 1000000,
+    "NumberLoadedRows": 1000000,
+    "NumberFilteredRows": 0,
+    "NumberUnselectedRows": 0,
+    "LoadBytes": 130618406,
+    "LoadTimeMs": 8988,
+    "BeginTxnTimeMs": 23,
+    "StreamLoadPutTimeMs": 113,
+    "ReadDataTimeMs": 4788,
+    "WriteDataTimeMs": 8811,
+    "CommitAndPublishTimeMs": 38
+}
+```
+
+**SQL 运行 count() 确认导入数据成功**
+
+```sql
+SELECT count() FROM hackernews_1m;
++---------+
+| count() |
++---------+
+| 1000000 |
++---------+
+```
+
+### 查询
+
+**01 全文检索**
+
+- 用 `LIKE` 匹配计算 comment 中含有 'OLAP' 的行数,耗时 0.18s
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE comment LIKE '%OLAP%';
+  +---------+
+  | count() |
+  +---------+
+  |      34 |
+  +---------+
+  ```
+
+- 用基于倒排索引的全文检索 `MATCH_ANY` 计算 comment 中含有'OLAP'的行数,耗时 0.02s,加速 9 
倍,在更大的数据集上效果会更加明显
+
+  
+  这里结果条数的差异,是因为倒排索引对 comment 分词后,还会对词进行统一转小写等归一化处理,因此 `MATCH_ANY` 比 `LIKE` 的结果多一些
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE comment MATCH_ANY 'OLAP';
+  +---------+
+  | count() |
+  +---------+
+  |      35 |
+  +---------+
+  ```
+
+- 同样的对比统计 'OLTP' 出现次数的性能,0.07s vs 0.01s,由于缓存的原因 `LIKE` 和 `MATCH_ANY` 
都有提升,倒排索引仍然有 7 倍加速
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE comment LIKE '%OLTP%';
+  +---------+
+  | count() |
+  +---------+
+  |      48 |
+  +---------+
+
+
+  SELECT count() FROM hackernews_1m WHERE comment MATCH_ANY 'OLTP';
+  +---------+
+  | count() |
+  +---------+
+  |      51 |
+  +---------+
+  ```
+
+- 同时出现 'OLAP' 和 'OLTP' 两个词,0.13s vs 0.01s,13 倍加速
+
+  要求多个词同时出现时(AND 关系)使用 `MATCH_ALL` 'keyword1 keyword2 ...'
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE comment LIKE '%OLAP%' AND comment 
LIKE '%OLTP%';
+  +---------+
+  | count() |
+  +---------+
+  |      14 |
+  +---------+
+
+
+  SELECT count() FROM hackernews_1m WHERE comment MATCH_ALL 'OLAP OLTP';
+  +---------+
+  | count() |
+  +---------+
+  |      15 |
+  +---------+
+  ```
+
+- 任意出现 'OLAP' 和 'OLTP' 其中一个词,0.12s vs 0.01s,12 倍加速
+  
+  只要求多个词任意一个或多个出现时(OR 关系)使用 `MATCH_ANY` 'keyword1 keyword2 ...'
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE comment LIKE '%OLAP%' OR comment 
LIKE '%OLTP%';
+  +---------+
+  | count() |
+  +---------+
+  |      68 |
+  +---------+
+  
+  SELECT count() FROM hackernews_1m WHERE comment MATCH_ANY 'OLAP OLTP';
+  +---------+
+  | count() |
+  +---------+
+  |      71 |
+  +---------+
+  ```
+
+
+**02 普通等值、范围查询**
+
+- DateTime 类型的列范围查询
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE timestamp > '2007-08-23 04:17:00';
+  +---------+
+  | count() |
+  +---------+
+  |  999081 |
+  +---------+
+  ```
+
+- 为 timestamp 列增加一个倒排索引
+
+  ```sql
+  -- 对于日期时间类型 USING INVERTED,不用指定分词
+  -- CREATE INDEX 是第一种建索引的语法,另外一种在后面展示
+  CREATE INDEX idx_timestamp ON hackernews_1m(timestamp) USING INVERTED;
+  ```
+
+  ```sql
+  BUILD INDEX idx_timestamp ON hackernews_1m;
+  ```
+
+- 查看索引创建进度,通过 FinishTime 和 CreateTime 的差值,可以看到 100 万条数据对 timestamp 列建倒排索引只用了 1s
+
+  ```sql
+  SHOW ALTER TABLE COLUMN;
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+  | JobId | TableName     | CreateTime              | FinishTime              
| IndexName     | IndexId | OriginIndexId | SchemaVersion | TransactionId | 
State    | Msg  | Progress | Timeout |
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+  | 10030 | hackernews_1m | 2023-02-10 19:44:12.929 | 2023-02-10 19:44:13.938 
| hackernews_1m | 10031   | 10008         | 1:1994690496  | 3             | 
FINISHED |      | NULL     | 2592000 |
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+  ```
+
+  ```sql
+  -- 若 table 没有分区,PartitionName 默认就是 TableName
+  SHOW BUILD INDEX;
+  
+-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+  | JobId | TableName     | PartitionName | AlterInvertedIndexes               
                      | CreateTime              | FinishTime              | 
TransactionId | State    | Msg  | Progress |
+  
+-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+  | 10191 | hackernews_1m | hackernews_1m | [ADD INDEX idx_timestamp 
(`timestamp`) USING INVERTED],  | 2023-06-26 15:32:33.894 | 2023-06-26 
15:32:34.847 | 3             | FINISHED |      | NULL     |
+  
+-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+  ```
+
+- 索引创建后,范围查询用同样的查询方式,Doris 会自动识别索引进行优化,但是这里由于数据量小性能差别不大
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE timestamp > '2007-08-23 04:17:00';
+  +---------+
+  | count() |
+  +---------+
+  |  999081 |
+  +---------+
+  ```
+
+- 在数值类型的列 Parent 进行类似 timestamp 的操作,这里查询使用等值匹配
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE parent = 11189;
+  +---------+
+  | count() |
+  +---------+
+  |       2 |
+  +---------+
+
+
+  -- 对于数值类型 USING INVERTED,不用指定分词
+  -- ALTER TABLE t ADD INDEX 是第二种建索引的语法
+  ALTER TABLE hackernews_1m ADD INDEX idx_parent(parent) USING INVERTED;
+
+
+  -- 执行 BUILD INDEX 给存量数据构建倒排索引
+  BUILD INDEX idx_parent ON hackernews_1m;
+
+
+  SHOW ALTER TABLE COLUMN;
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+  | JobId | TableName     | CreateTime              | FinishTime              
| IndexName     | IndexId | OriginIndexId | SchemaVersion | TransactionId | 
State    | Msg  | Progress | Timeout |
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+  | 10030 | hackernews_1m | 2023-02-10 19:44:12.929 | 2023-02-10 19:44:13.938 
| hackernews_1m | 10031   | 10008         | 1:1994690496  | 3             | 
FINISHED |      | NULL     | 2592000 |
+  | 10053 | hackernews_1m | 2023-02-10 19:49:32.893 | 2023-02-10 19:49:33.982 
| hackernews_1m | 10054   | 10008         | 1:378856428   | 4             | 
FINISHED |      | NULL     | 2592000 |
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+
+  SHOW BUILD INDEX;
+  
+-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+  | JobId | TableName     | PartitionName | AlterInvertedIndexes               
                | CreateTime              | FinishTime              | 
TransactionId | State    | Msg  | Progress |
+  
+-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+  | 11005 | hackernews_1m | hackernews_1m | [ADD INDEX idx_parent (`parent`) 
USING INVERTED],  | 2023-06-26 16:25:10.167 | 2023-06-26 16:25:10.838 | 1002    
      | FINISHED |      | NULL     |
+  
+-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+
+
+  SELECT count() FROM hackernews_1m WHERE parent = 11189;
+  +---------+
+  | count() |
+  +---------+
+  |       2 |
+  +---------+
+  ```
+
+- 对字符串类型的 `author` 建立不分词的倒排索引,等值查询也可以利用索引加速
+
+  ```sql
+  SELECT count() FROM hackernews_1m WHERE author = 'faster';
+  +---------+
+  | count() |
+  +---------+
+  |      20 |
+  +---------+
+  
+  -- 这里只用了 USING INVERTED,不对 author 分词,整个当做一个词处理
+  ALTER TABLE hackernews_1m ADD INDEX idx_author(author) USING INVERTED;
+
+  
+  -- 执行 BUILD INDEX 给存量数据加上倒排索引:
+  BUILD INDEX idx_author ON hackernews_1m;
+
+  
+  -- 100 万条 author 数据增量建索引仅消耗 1.5s
+  SHOW ALTER TABLE COLUMN;
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+  | JobId | TableName     | CreateTime              | FinishTime              
| IndexName     | IndexId | OriginIndexId | SchemaVersion | TransactionId | 
State    | Msg  | Progress | Timeout |
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+  | 10030 | hackernews_1m | 2023-02-10 19:44:12.929 | 2023-02-10 19:44:13.938 
| hackernews_1m | 10031   | 10008         | 1:1994690496  | 3             | 
FINISHED |      | NULL     | 2592000 |
+  | 10053 | hackernews_1m | 2023-02-10 19:49:32.893 | 2023-02-10 19:49:33.982 
| hackernews_1m | 10054   | 10008         | 1:378856428   | 4             | 
FINISHED |      | NULL     | 2592000 |
+  | 10076 | hackernews_1m | 2023-02-10 19:54:20.046 | 2023-02-10 19:54:21.521 
| hackernews_1m | 10077   | 10008         | 1:1335127701  | 5             | 
FINISHED |      | NULL     | 2592000 |
+  
+-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
+  
+  SHOW BUILD INDEX order by CreateTime desc limit 1;
+  
+-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+  | JobId | TableName     | PartitionName | AlterInvertedIndexes               
                | CreateTime              | FinishTime              | 
TransactionId | State    | Msg  | Progress |
+  
+-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+  | 13006 | hackernews_1m | hackernews_1m | [ADD INDEX idx_author (`author`) 
USING INVERTED],  | 2023-06-26 17:23:02.610 | 2023-06-26 17:23:03.755 | 3004    
      | FINISHED |      | NULL     |
+  
+-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+
+  
+  -- 创建索引后,字符串等值匹配也有明显加速
+  SELECT count() FROM hackernews_1m WHERE author = 'faster';
+  +---------+
+  | count() |
+  +---------+
+  |      20 |
+  +---------+
+  ```
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/scoring.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/scoring.md
new file mode 100644
index 00000000000..6dab0d33ee7
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/scoring.md
@@ -0,0 +1,106 @@
+---
+{
+    "title": "文本检索打分",
+    "language": "zh-CN"
+}
+---
+
+## 概述
+
+文本检索打分用于衡量表中某一行数据与查询文本之间的相关程度。
+当执行包含全文检索条件的查询(例如 `MATCH_ANY`、`MATCH_ALL` 等)时,Doris 
会为每一行返回一个数值型的打分结果,表示该行与查询条件的匹配强度。
+该打分值可用于结果排序,以便优先返回与查询最相关的内容。
+
+当前 Doris 使用 **BM25(Best Matching 25)** 算法进行文本相关性计算。
+
+## BM25 算法简介
+
+BM25 是一种基于概率模型的文本相关性算法。
+它通过综合考虑词频、逆文档频率以及记录长度,对匹配结果进行加权计算。
+BM25 相比传统 TF-IDF 模型具有更好的鲁棒性和可调性,能有效平衡长文本与短文本的得分差异。
+
+### 算法公式
+
+BM25 的核心计算公式如下:
+
+```
+score = IDF × (tf × (k1 + 1)) / (tf + k1 × (1 - b + b × |d| / avgdl))
+```
+
+where:
+
+* **tf**: number of occurrences of the query term in the current row
+* **IDF**: inverse document frequency (a measure of how rare the term is)
+* **|d|**: length of the current row (the number of tokens after tokenization)
+* **avgdl**: average length of all rows in the table
+* **k1, b**: tuning parameters of the algorithm
+
+**Default parameters:**
+
+| Parameter | Default | Description                                           |
+| --------- | ------- | ----------------------------------------------------- |
+| `k1`      | 1.2     | Controls how much term frequency affects the score    |
+| `b`       | 0.75    | Controls the strength of record-length normalization  |
+| `boost`   | 1.0     | Query-level weight factor                              |
+
+**Auxiliary statistics:**
+
+```
+IDF = log(1 + (N - n + 0.5) / (n + 0.5))
+avgdl = total_terms / total_rows
+```
+
+where:
+
+* `N` is the total number of rows in the table;
+* `n` is the number of rows that contain the query term.
+
+The final total score is the sum of the per-term scores of all query terms.
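+
+For intuition, here is a worked example with hypothetical numbers (assuming the natural logarithm in `IDF`; the default `boost = 1.0` is omitted because it does not change the result): a table with `N = 1000` rows, `n = 10` rows containing the query term, `tf = 3`, row length `|d| = 20`, `avgdl = 25`, and the default `k1 = 1.2`, `b = 0.75`:
+
+```
+IDF   = log(1 + (1000 - 10 + 0.5) / (10 + 0.5)) ≈ log(95.33) ≈ 4.56
+score = 4.56 × (3 × (1.2 + 1)) / (3 + 1.2 × (1 - 0.75 + 0.75 × 20 / 25))
+      = 4.56 × 6.6 / 4.02
+      ≈ 7.48
+```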
+
+## Using Scoring in Doris
+
+### Supported Index Types
+
+* **Tokenized inverted index**: supports BM25 score computation; see the table sketch after this list.
+* **Non-tokenized inverted index**: supports only exact matching and does not compute scores.
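+
+For example, a minimal sketch of a table whose text column carries a tokenized inverted index (the table name `search_demo`, the `chinese` parser, and the bucket/replication settings are illustrative assumptions chosen to match the query example later in this document):
+
+```sql
+CREATE TABLE search_demo (
+  id BIGINT,
+  content STRING,
+  -- Tokenized inverted index: enables MATCH_* queries and BM25 scoring on `content`
+  INDEX idx_content (content) USING INVERTED PROPERTIES("parser" = "chinese")
+) DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+```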
+
+### Supported Query Types
+
+* `MATCH_ANY`
+* `MATCH_ALL`
+* `MATCH_PHRASE`
+* `MATCH_PHRASE_PREFIX`
+* `SEARCH`
+
+### Query Pushdown Rules
+
+Score computation is enabled only when all of the pushdown conditions are met:
+
+1. The `SELECT` list contains the `score()` function;
+2. The `WHERE` clause contains at least one `MATCH_*` condition;
+3. The query is a Top-N query whose `ORDER BY` clause sorts by the `score` result.
+
+---
+
+## Example
+
+```sql
+SELECT *,
+       score() AS relevance
+FROM search_demo
+WHERE content MATCH_ANY '检索测试'
+ORDER BY relevance DESC
+LIMIT 10;
+```
+
+This statement returns the 10 most relevant records ranked by their BM25 scores.
+
+---
+
+## Interpreting the Results
+
+* **Score range**: BM25 scores are positive with no fixed upper bound; normally only their relative magnitudes are compared.
+* **Multi-term queries**: if the query contains multiple terms, the total score is the sum of the per-term scores.
+* **Length effect**: shorter records score slightly higher when they contain the same terms.
+* **Unmatched terms**: if a query term does not appear in the table, its score contribution is 0.
+
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/search-function.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/search-function.md
new file mode 100644
index 00000000000..a186fec582b
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/search-function.md
@@ -0,0 +1,220 @@
+---
+{
+    "title": "SEARCH 函数",
+    "language": "zh-CN"
+}
+---
+
+## Overview
+
+The `SEARCH` function, available since Apache Doris 4.0, is the unified entry point for full-text search queries. It describes query conditions in a concise DSL (domain-specific language) and executes them efficiently on inverted indexes.
+
+`SEARCH` is a predicate function that returns a Boolean value and can appear as a filter condition in `WHERE`. It takes a SEARCH DSL string describing the text-matching rules and pushes the matchable conditions down to the inverted index for execution.
+
+
+## Syntax and Semantics
+
+Syntax
+
+```sql
+SEARCH('<search_expression>')
+```
+
+- Parameter: `<search_expression>` is a SEARCH DSL query expression (a string literal)
+
+Usage
+
+- Position: used in `WHERE` as a predicate for row filtering
+- Return type: BOOLEAN (TRUE when the row matches)
+
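+The examples in the following sections query a table named `search_test_basic`. Its exact schema is not shown in this document; a minimal sketch consistent with those examples might look like this (column types, parser choice, and bucket/replication settings are assumptions):
+
+```sql
+CREATE TABLE search_test_basic (
+  id INT,
+  title STRING,
+  category STRING,
+  author STRING,
+  tags STRING,
+  -- Tokenized inverted indexes for term/ANY/ALL matching
+  INDEX idx_title (title) USING INVERTED PROPERTIES("parser" = "english"),
+  INDEX idx_category (category) USING INVERTED PROPERTIES("parser" = "english"),
+  INDEX idx_tags (tags) USING INVERTED PROPERTIES("parser" = "english"),
+  -- Non-tokenized index for exact term matching on author
+  INDEX idx_author (author) USING INVERTED
+) DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+```
+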
+### Currently Supported Syntax
+
+#### Term Query
+- Syntax: `column:term`
+- Semantics: matches the term against the column's tokenized values; case sensitivity depends on the index property `lower_case`
+- Index recommendation: create an inverted index on the column with a suitable `parser`/`analyzer`
+```sql
+SELECT id, title FROM search_test_basic WHERE SEARCH('title:Machine');
+SELECT id, title FROM search_test_basic WHERE SEARCH('title:Python');
+SELECT id, title FROM search_test_basic WHERE SEARCH('category:Technology');
+```
+
+#### ANY Query
+- Syntax: `column:ANY(term1 term2 ...)`
+- Semantics: matches if the column's tokenized values contain any term in the list (OR semantics); order does not matter and duplicate terms are ignored
+- Index recommendation: create a tokenized inverted index on the column (e.g. `english`/`chinese`/`unicode`)
+```sql
+SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ANY(python javascript)');
+SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ANY(machine learning tutorial)');
+
+-- Edge case: ANY with a single term is equivalent to a term query
+SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ANY(python)');
+```
+
+#### ALL Query
+- Syntax: `column:ALL(term1 term2 ...)`
+- Semantics: matches if the column's tokenized values contain every term in the list (AND semantics); order does not matter and duplicate terms are ignored
+- Index recommendation: create a tokenized inverted index on the column (e.g. `english`/`chinese`/`unicode`)
+```sql
+SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ALL(machine learning)');
+SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ALL(programming tutorial)');
+
+-- Edge case: ALL with a single term is equivalent to a term query
+SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ALL(python)');
+```
+
+#### Boolean Operations
+- Syntax: `(expr) AND/OR/NOT (expr)`
+- Semantics: combine sub-expressions inside SEARCH with `AND`, `OR`, and `NOT`
+- Index recommendation: put matchable conditions inside SEARCH whenever possible to get index pushdown; keep the remaining WHERE conditions as ordinary filters
+```sql
+SELECT id, title FROM search_test_basic
+WHERE SEARCH('title:Machine AND category:Technology');
+
+SELECT id, title FROM search_test_basic
+WHERE SEARCH('title:Python OR title:Data');
+
+SELECT id, title FROM search_test_basic
+WHERE SEARCH('category:Technology AND NOT title:Machine');
+```
+
+#### Complex Nested Expressions
+- Syntax: group expressions with parentheses (e.g. `(expr1 OR expr2) AND expr3`)
+- Semantics: parentheses control Boolean precedence; multiple levels of nesting are supported
+- Index recommendation: same as above
+```sql
+SELECT id, title FROM search_test_basic
+WHERE SEARCH('(title:Machine OR title:Python) AND category:Technology');
+
+SELECT id, title FROM search_test_basic
+WHERE SEARCH('tags:ANY(python javascript) AND (category:Technology OR category:Programming)');
+```
+
+#### Multi-Column Search
+- Syntax: `column1:term OR column2:ANY(...) OR ...`
+- Semantics: matches across multiple columns in a single expression; each column follows its own index/tokenization configuration
+- Index recommendation: create a suitable inverted index for every column involved
+```sql
+SELECT id, title FROM search_test_basic
+WHERE SEARCH('title:Python OR tags:ANY(database mysql) OR author:Alice');
+
+SELECT id, title FROM search_test_basic
+WHERE SEARCH('tags:ALL(tutorial) AND category:Technology');
+```
+
+#### EXACT Query (Strict Equality Match)
+
+- Syntax: `column:EXACT(text)`
+- Semantics: matches the full value of the column exactly; case-sensitive; does not match partial terms
+- Index recommendation: additionally create a non-tokenized inverted index (no `parser` set) on the column to accelerate EXACT
+
+Example:
+
+```sql
+SELECT id
+FROM t
+WHERE SEARCH('content:EXACT(machine learning)');
+```
+
+#### Variant Subcolumn Query
+
+- Syntax: `variant_col.sub.path:term`
+- Semantics: matches against a VARIANT subcolumn addressed by a dotted path; matching behavior follows the index/analyzer configuration of that VARIANT column
+- Supports Boolean combinations, `ANY`/`ALL`, and nested paths; subcolumns that do not exist return no matches
+
+Example:
+
+```sql
+SELECT id
+FROM test_variant_search_subcolumn
+WHERE SEARCH('properties.message:alpha');
+```
+
+### Examples
+
+```sql
+-- Create both a tokenized and a non-tokenized inverted index
+CREATE TABLE t (
+  id INT,
+  content STRING,
+  INDEX idx_untokenized(content) USING INVERTED,
+  INDEX idx_tokenized(content)  USING INVERTED PROPERTIES("parser" = "standard")
+);
+
+-- Strict equality match (uses the non-tokenized index)
+SELECT id, content
+FROM t
+WHERE SEARCH('content:EXACT(machine learning)')
+ORDER BY id;
+
+-- EXACT does not match partial terms
+SELECT id, content
+FROM t
+WHERE SEARCH('content:EXACT(machine)')
+ORDER BY id;
+
+-- ANY/ALL use the tokenized index
+SELECT id, content FROM t WHERE SEARCH('content:ANY(machine learning)') ORDER BY id;
+SELECT id, content FROM t WHERE SEARCH('content:ALL(machine learning)') ORDER BY id;
+
+-- Compare EXACT with ANY
+SELECT id, content FROM t WHERE SEARCH('content:EXACT(deep learning)') ORDER BY id;
+SELECT id, content FROM t WHERE SEARCH('content:ANY(deep learning)') ORDER BY id;
+
+-- Combined conditions
+SELECT id, content
+FROM t
+WHERE SEARCH('content:EXACT(machine learning) OR content:ANY(intelligence)')
+ORDER BY id;
+
+-- A table with a VARIANT column and an inverted index
+CREATE TABLE test_variant_search_subcolumn (
+  id BIGINT,
+  properties VARIANT<PROPERTIES("variant_max_subcolumns_count"="0")>,
+  INDEX idx_properties (properties) USING INVERTED PROPERTIES (
+    "parser" = "unicode",
+    "lower_case" = "true",
+    "support_phrase" = "true"
+  )
+);
+
+-- Single-term query
+SELECT id
+FROM test_variant_search_subcolumn
+WHERE SEARCH('properties.message:alpha')
+ORDER BY id;
+
+-- AND / ALL queries
+SELECT id
+FROM test_variant_search_subcolumn
+WHERE SEARCH('properties.message:alpha AND properties.message:beta')
+ORDER BY id;
+
+SELECT id
+FROM test_variant_search_subcolumn
+WHERE SEARCH('properties.message:ALL(alpha beta)')
+ORDER BY id;
+
+-- OR query across different subcolumns
+SELECT id
+FROM test_variant_search_subcolumn
+WHERE SEARCH('properties.message:hello OR properties.category:beta')
+ORDER BY id;
+```
+
+### Current Limitations
+
+- `SEARCH()` does not yet support phrase, prefix, wildcard, or regex queries
+- `SEARCH()` does not yet support range/list queries
+- Unsupported patterns may fall back to term queries
+
+Use standard SQL operators or the full-text search operators instead:
+
+```sql
+-- Phrase search via the MATCH_PHRASE operator
+SELECT * FROM t WHERE content MATCH_PHRASE '全文检索';
+
+-- Range filtering via plain SQL
+SELECT * FROM t WHERE created_at >= '2024-01-01';
+```
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/search-operators.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/search-operators.md
new file mode 100644
index 00000000000..8e3d4924621
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/inverted-index/search-operators.md
@@ -0,0 +1,79 @@
+---
+{
+    "title": "全文检索与查询加速支持",
+    "language": "zh-CN"
+}
+---
+
+## Full-Text Search Operators
+
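+The examples below filter a column named `content` in a table named `table_name`. These are placeholders; a minimal sketch of a matching table (the parser choice, `support_phrase` setting, and bucket/replication settings are illustrative assumptions) could be:
+
+```sql
+CREATE TABLE table_name (
+  id BIGINT,
+  content STRING,
+  -- Tokenized inverted index; "support_phrase" enables index-accelerated MATCH_PHRASE
+  INDEX idx_content (content) USING INVERTED PROPERTIES(
+    "parser" = "english",
+    "support_phrase" = "true"
+  )
+) DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+```
+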
+### MATCH_ANY
+- Matches rows that contain any of the keywords.
+```sql
+SELECT * FROM table_name WHERE content MATCH_ANY 'keyword1 keyword2';
+```
+
+### MATCH_ALL
+- Matches rows that contain all of the keywords.
+```sql
+SELECT * FROM table_name WHERE content MATCH_ALL 'keyword1 keyword2';
+```
+
+### MATCH_PHRASE
+- Phrase match: the terms must be adjacent and appear in the same order.
+- For index acceleration, enable `"support_phrase" = "true"` in the index properties (as in the table sketch above).
+```sql
+SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2';
+```
+
+### MATCH_PHRASE (with slop)
+- Allows a gap of at most `slop` words between the keywords.
+```sql
+-- Allow at most 3 words between keyword1 and keyword2
+SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2 ~3';
+```
+
+### MATCH_PHRASE (strict order)
+- Combines `+` with slop to require a fixed word order.
+```sql
+SELECT * FROM table_name WHERE content MATCH_PHRASE 'keyword1 keyword2 ~3+';
+```
+
+### MATCH_PHRASE_PREFIX
+- Phrase match in which the last term is matched as a prefix.
+- With only a single term, it degenerates to a prefix match on that term.
+```sql
+-- Prefix match on the last term
+SELECT * FROM table_name WHERE content MATCH_PHRASE_PREFIX 'keyword1 key';
+
+-- A single term degenerates to a prefix match
+SELECT * FROM table_name WHERE content MATCH_PHRASE_PREFIX 'keyword1';
+```
+
+### MATCH_REGEXP
+- Regular-expression match (applied to the tokenized terms).
+```sql
+SELECT * FROM table_name WHERE content MATCH_REGEXP '^key_word.*';
+```
+
+### MATCH_PHRASE_EDGE
+- Edge phrase match: the first term is matched by suffix, middle terms exactly, and the last term by prefix; the terms must be adjacent.
+```sql
+SELECT * FROM table_name WHERE content MATCH_PHRASE_EDGE 'search engine optim';
+```
+
+## Query Acceleration with Inverted Indexes
+
+### Supported Operators and Functions
+
+- Equality and sets: `=`, `!=`, `IN`, `NOT IN`
+- Ranges: `>`, `>=`, `<`, `<=`, `BETWEEN`
+- NULL checks: `IS NULL`, `IS NOT NULL`
+- Arrays: `array_contains`, `array_overlaps`
+
+```sql
+-- Examples
+SELECT * FROM t WHERE price >= 100 AND price < 200;            -- range
+SELECT * FROM t WHERE tags IN ('a','b','c');                    -- set
+SELECT * FROM t WHERE array_contains(attributes, 'color');      -- array
+```
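+
+These accelerations apply when the filtered columns carry inverted indexes. A minimal sketch of a table matching the examples above (the table name `t`, its columns, and the bucket/replication settings are assumptions for illustration):
+
+```sql
+CREATE TABLE t (
+  id BIGINT,
+  price INT,
+  tags STRING,
+  attributes ARRAY<STRING>,
+  -- Non-tokenized inverted indexes to accelerate equality, range, IN, and array predicates
+  INDEX idx_price (price) USING INVERTED,
+  INDEX idx_tags (tags) USING INVERTED,
+  INDEX idx_attributes (attributes) USING INVERTED
+) DUPLICATE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+```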
diff --git a/sidebars.json b/sidebars.json
index cbcefdd4a20..3b77afb7e7f 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -126,7 +126,17 @@
                             "items": [
                                 "table-design/index/index-overview",
                                 "table-design/index/prefix-index",
-                                "table-design/index/inverted-index",
+                                {
+                                    "type": "category",
+                                    "label": "Inverted Index",
+                                    "items": [
+                                        "table-design/index/inverted-index/overview",
+                                        "table-design/index/inverted-index/search-operators",
+                                        "table-design/index/inverted-index/search-function",
+                                        "table-design/index/inverted-index/custom-analyzer",
+                                        "table-design/index/inverted-index/scoring"
+                                    ]
+                                },
                                 "table-design/index/bloomfilter",
                                 "table-design/index/ngram-bloomfilter-index"
                             ]
@@ -394,7 +404,14 @@
                     "label": "AI",
                     "items": [
                         "ai/ai-function-overview",
-                        "ai/vector-search"
+                        {
+                            "type": "category",
+                            "label": "Search",
+                            "items": [
+                                "ai/search/vector-search",
+                                "ai/search/text-search"
+                            ]
+                        }
                     ]
                 },
                 {
diff --git a/static/images/analyzer.png b/static/images/analyzer.png
new file mode 100644
index 00000000000..06e739fd463
Binary files /dev/null and b/static/images/analyzer.png differ


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

