This is an automated email from the ASF dual-hosted git repository.
airborne pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 4e7de5f4141 Update pinyin tokenizer documentation: add version support info (#3091)
4e7de5f4141 is described below
commit 4e7de5f41413c52b4cddab0db84d7393360efd1b
Author: zzzxl <[email protected]>
AuthorDate: Sat Nov 15 09:31:06 2025 +0800
Update pinyin tokenizer documentation: add version support info (#3091)
## Versions
- [x] dev
- [x] 4.x
- [ ] 3.x
- [ ] 2.1
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
docs/ai/text-search/custom-analyzer.md | 2 +-
.../version-4.x/ai/text-search/custom-analyzer.md | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/ai/text-search/custom-analyzer.md b/docs/ai/text-search/custom-analyzer.md
index 9351d7d11c9..f5fd87820fb 100644
--- a/docs/ai/text-search/custom-analyzer.md
+++ b/docs/ai/text-search/custom-analyzer.md
@@ -47,7 +47,7 @@ Available tokenizers:
- **char_group**: Tokenizes on specified characters
- **basic**: Simple English, numbers, Chinese, Unicode tokenizer
- **icu**: International text segmentation supporting all languages
-- **pinyin**: Chinese pinyin conversion tokenizer for Chinese text search
+- **pinyin**: Chinese pinyin conversion tokenizer for Chinese text search (Supported from 4.0.2, phrase queries not supported yet)
- `keep_first_letter`: When enabled, retains only the first letter of each Chinese character. For example, `刘德华` becomes `ldh`. Default: true
- `keep_separate_first_letter`: When enabled, keeps the first letters of each Chinese character separately. For example, `刘德华` becomes `l`,`d`,`h`. Default: false. Note: This may increase query fuzziness due to term frequency
- `limit_first_letter_length`: Sets the maximum length of the first letter result. Default: 16
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/text-search/custom-analyzer.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/text-search/custom-analyzer.md
index f46a86db8ec..529fb993fde 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/text-search/custom-analyzer.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ai/text-search/custom-analyzer.md
@@ -54,7 +54,7 @@ PROPERTIES (
- `basic`:简单英文/数字/中文/Unicode 分词
- `extra_chars`:额外分割的 ASCII 字符(如 `[]().`)
- `icu`:ICU 国际化分词,支持多语言复杂脚本
-- `pinyin`:拼音分词器,用于中文拼音搜索
+- `pinyin`:拼音分词器,用于中文拼音搜索(4.0.2开始支持,暂不支持短语查询)
- `keep_first_letter`:启用时,仅保留每个汉字的首字母。例如,`刘德华` 变为 `ldh`。默认值:true
- `keep_separate_first_letter`:启用时,将每个汉字的首字母分别保留。例如,`刘德华` 变为 `l`,`d`,`h`。默认值:false。注意:由于词频的原因,这可能会增加查询的模糊性
- `limit_first_letter_length`:设置首字母结果的最大长度。默认值:16
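
The three `pinyin` options described in the hunks above can be illustrated with a minimal Python sketch. This is not Doris code: the `PINYIN` table and the `first_letter_terms` function are hypothetical names invented for this example, and the sketch assumes that `limit_first_letter_length` truncates only the joined first-letter term, which the documentation does not state explicitly.

```python
# Hypothetical three-character lookup table, for illustration only.
# A real pinyin tokenizer relies on a full dictionary.
PINYIN = {"刘": "liu", "德": "de", "华": "hua"}


def first_letter_terms(
    text,
    keep_first_letter=True,            # documented default: true
    keep_separate_first_letter=False,  # documented default: false
    limit_first_letter_length=16,      # documented default: 16
):
    """Return the first-letter terms that the documented options suggest."""
    # First letter of the pinyin reading of each known character.
    letters = [PINYIN[ch][0] for ch in text if ch in PINYIN]
    terms = []
    if keep_first_letter and letters:
        # Joined first letters; truncation here is an assumption of this sketch.
        terms.append("".join(letters)[:limit_first_letter_length])
    if keep_separate_first_letter:
        # One term per character.
        terms.extend(letters)
    return terms


print(first_letter_terms("刘德华"))
print(first_letter_terms("刘德华", keep_first_letter=False,
                         keep_separate_first_letter=True))
```

With the defaults, the sketch yields the single term `ldh`; enabling only `keep_separate_first_letter` yields `l`, `d`, `h`, matching the examples in the documentation text.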