This is an automated email from the ASF dual-hosted git repository.
kxiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 1f769291b5 [doc](invert index) add invert index char_filter doc (#24205)
1f769291b5 is described below
commit 1f769291b54a5150341f2e0978b25b37871ff760
Author: zzzxl
AuthorDate: Wed Sep 13 10:02:45 2023 +0800
[doc](invert index) add invert index char_filter doc (#24205)
---
docs/en/docs/data-table/index/inverted-index.md | 7 +++++++
docs/zh-CN/docs/data-table/index/inverted-index.md | 7 +++++++
2 files changed, 14 insertions(+)
diff --git a/docs/en/docs/data-table/index/inverted-index.md b/docs/en/docs/data-table/index/inverted-index.md
index 1e17ca011b..f86d47c8bb 100644
--- a/docs/en/docs/data-table/index/inverted-index.md
+++ b/docs/en/docs/data-table/index/inverted-index.md
@@ -84,6 +84,11 @@ The features for inverted index is as follows:
- "true": phrase queries are supported, but the index needs more storage.
- "false": phrase queries are not supported, and the index needs less storage. MATCH_ALL can be used to match multiple words regardless of order.
- default mode is "false".
+ - char_filter: pre-processes the string before word segmentation
+ - char_filter_type: specifies which char_filter to apply (currently only char_replace is supported)
+ - char_replace: replaces each character in the pattern with a character from the replacement
+ - char_filter_pattern: the characters to be replaced
+ - char_filter_replacement: the replacement characters; optional, defaults to a single space character
- COMMENT is optional
```sql
@@ -94,6 +99,8 @@ CREATE TABLE table_name
INDEX idx_name2(column_name2) USING INVERTED [PROPERTIES("parser" = "english|chinese|unicode")] [COMMENT 'your comment']
INDEX idx_name3(column_name3) USING INVERTED [PROPERTIES("parser" = "chinese", "parser_mode" = "fine_grained|coarse_grained")] [COMMENT 'your comment']
INDEX idx_name4(column_name4) USING INVERTED [PROPERTIES("parser" = "english|chinese|unicode", "support_phrase" = "true|false")] [COMMENT 'your comment']
+ INDEX idx_name5(column_name5) USING INVERTED [PROPERTIES("char_filter_type" = "char_replace", "char_filter_pattern" = "._", "char_filter_replacement" = " ")] [COMMENT 'your comment']
+ INDEX idx_name6(column_name6) USING INVERTED [PROPERTIES("char_filter_type" = "char_replace", "char_filter_pattern" = "._")] [COMMENT 'your comment']
)
table_properties;
```
diff --git a/docs/zh-CN/docs/data-table/index/inverted-index.md b/docs/zh-CN/docs/data-table/index/inverted-index.md
index ce85973752..ad4c9a011d 100644
--- a/docs/zh-CN/docs/data-table/index/inverted-index.md
+++ b/docs/zh-CN/docs/data-table/index/inverted-index.md
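@@ -82,6 +82,11 @@ A brief overview of the Doris inverted index features:
- true: phrase queries are supported, but the index needs more storage space
- false: phrase queries are not supported, which saves storage space; MATCH_ALL can be used to query multiple keywords
- default: false
+ - char_filter: mainly pre-processes the string before word segmentation
+ - char_filter_type: specifies which char_filter to use, each with a different function (currently only char_replace is supported)
+ - char_replace: replaces each char in the pattern with a char from the replacement
+ - char_filter_pattern: the array of characters to be replaced
+ - char_filter_replacement: the array of replacement characters; optional, defaults to a single space character
- COMMENT is optional and specifies a comment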
```sql
@@ -92,6 +97,8 @@ CREATE TABLE table_name
INDEX idx_name2(column_name2) USING INVERTED [PROPERTIES("parser" = "english|unicode|chinese")] [COMMENT 'your comment']
INDEX idx_name3(column_name3) USING INVERTED [PROPERTIES("parser" = "chinese", "parser_mode" = "fine_grained|coarse_grained")] [COMMENT 'your comment']
INDEX idx_name4(column_name4) USING INVERTED [PROPERTIES("parser" = "english|unicode|chinese", "support_phrase" = "true|false")] [COMMENT 'your comment']
+ INDEX idx_name5(column_name5) USING INVERTED [PROPERTIES("char_filter_type" = "char_replace", "char_filter_pattern" = "._", "char_filter_replacement" = " ")] [COMMENT 'your comment']
+ INDEX idx_name6(column_name6) USING INVERTED [PROPERTIES("char_filter_type" = "char_replace", "char_filter_pattern" = "._")] [COMMENT 'your comment']
)
table_properties;
```
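For illustration, a minimal sketch of a concrete table that applies the char_replace filter before tokenization. The table name log_table, the columns id and message, the index name idx_message, the choice of the unicode parser, and the table properties are all hypothetical and not taken from this commit; the statements assume a Doris version that supports the char_filter properties documented above.

```sql
-- Hypothetical table: names, parser choice and table properties are illustrative only.
CREATE TABLE log_table (
    id BIGINT,
    message STRING,
    -- char_replace rewrites every '.' and '_' in message to a space
    -- before the unicode parser splits the text into terms.
    INDEX idx_message (message) USING INVERTED PROPERTIES(
        "parser" = "unicode",
        "char_filter_type" = "char_replace",
        "char_filter_pattern" = "._",
        "char_filter_replacement" = " "
    ) COMMENT 'split on dots and underscores'
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

INSERT INTO log_table VALUES (1, 'apache_doris.inverted_index');

-- With the filter in place, 'doris' is expected to be indexed as its own term.
SELECT id FROM log_table WHERE message MATCH_ANY 'doris';
```

Since char_filter_replacement defaults to a space, omitting that property should behave the same way in this sketch.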