(doris-website) branch master updated: Update pinyin tokenizer documentation: add version support info (#3091)

airborne Fri, 14 Nov 2025 17:32:49 -0800

This is an automated email from the ASF dual-hosted git repository.

airborne pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new 4e7de5f4141 Update pinyin tokenizer documentation: add version support info (#3091)
4e7de5f4141 is described below

commit 4e7de5f41413c52b4cddab0db84d7393360efd1b
Author: zzzxl <[email protected]>
AuthorDate: Sat Nov 15 09:31:06 2025 +0800

    Update pinyin tokenizer documentation: add version support info (#3091)
    
    ## Versions
    
    - [x] dev
    - [x] 4.x
    - [ ] 3.x
    - [ ] 2.1
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 docs/ai/text-search/custom-analyzer.md                                  | 2 +-
 .../version-4.x/ai/text-search/custom-analyzer.md                       | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/ai/text-search/custom-analyzer.md b/docs/ai/text-search/custom-analyzer.md
index 9351d7d11c9..f5fd87820fb 100644
--- a/docs/ai/text-search/custom-analyzer.md
+++ b/docs/ai/text-search/custom-analyzer.md
@@ -47,7 +47,7 @@ Available tokenizers:
 - **char_group**: Tokenizes on specified characters
 - **basic**: Simple English, numbers, Chinese, Unicode tokenizer
 - **icu**: International text segmentation supporting all languages
-- **pinyin**: Chinese pinyin conversion tokenizer for Chinese text search
+- **pinyin**: Chinese pinyin conversion tokenizer for Chinese text search (Supported from 4.0.2, phrase queries not supported yet)
   - `keep_first_letter`: When enabled, retains only the first letter of each Chinese character. For example, `刘德华` becomes `ldh`. Default: true
   - `keep_separate_first_letter`: When enabled, keeps the first letters of each Chinese character separately. For example, `刘德华` becomes `l`,`d`,`h`. Default: false. Note: This may increase query fuzziness due to term frequency
   - `limit_first_letter_length`: Sets the maximum length of the first letter result. Default: 16
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/text-search/custom-analyzer.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/text-search/custom-analyzer.md
index f46a86db8ec..529fb993fde 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/text-search/custom-analyzer.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/text-search/custom-analyzer.md
@@ -54,7 +54,7 @@ PROPERTIES (
 - `basic`：简单英文/数字/中文/Unicode 分词
   - `extra_chars`：额外分割的 ASCII 字符（如 `[]().`）
 - `icu`：ICU 国际化分词，支持多语言复杂脚本
-- `pinyin`：拼音分词器，用于中文拼音搜索
+- `pinyin`：拼音分词器，用于中文拼音搜索（4.0.2开始支持，暂不支持短语查询）
   - `keep_first_letter`：启用时，仅保留每个汉字的首字母。例如，`刘德华` 变为 `ldh`。默认值：true
   - `keep_separate_first_letter`：启用时，将每个汉字的首字母分别保留。例如，`刘德华` 变为 `l`,`d`,`h`。默认值：false。注意：由于词频的原因，这可能会增加查询的模糊性
   - `limit_first_letter_length`：设置首字母结果的最大长度。默认值：16
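
Note on the first-letter options touched by this hunk: the sketch below is a toy model of how `keep_first_letter`, `keep_separate_first_letter`, and `limit_first_letter_length` interact, based only on the descriptions in the diff. It is not Doris code. The `PINYIN` table and the `first_letter_tokens` function are hypothetical, the truncation behaviour of the length limit and the token order are assumptions, and it ignores every other token the real tokenizer may emit.

```python
# Hypothetical three-entry pinyin table; a real tokenizer ships a full dictionary.
PINYIN = {"刘": "liu", "德": "de", "华": "hua"}


def first_letter_tokens(
    text,
    keep_first_letter=True,
    keep_separate_first_letter=False,
    limit_first_letter_length=16,
):
    """Return only the first-letter tokens described by the three options."""
    # First letter of the pinyin of each character found in the table.
    initials = [PINYIN[ch][0] for ch in text if ch in PINYIN]
    tokens = []
    if keep_first_letter and initials:
        # Assumption: the limit truncates the combined first-letter string.
        tokens.append("".join(initials)[:limit_first_letter_length])
    if keep_separate_first_letter:
        # Each first letter also becomes its own token.
        tokens.extend(initials)
    return tokens


print(first_letter_tokens("刘德华"))
print(first_letter_tokens("刘德华", keep_separate_first_letter=True))
print(first_letter_tokens("刘德华", limit_first_letter_length=2))
```

With the defaults the toy function yields the single token `ldh`, matching the example in the documentation. Enabling `keep_separate_first_letter` adds the one-letter tokens `l`, `d`, `h`, which is where the documented increase in query fuzziness would come from.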


