(doris-website) branch master updated: [doc]update en docs of data index (#478)

luzhijing Thu, 28 Mar 2024 04:42:50 -0700

This is an automated email from the ASF dual-hosted git repository.

luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new 0b297793a037 [doc]update en docs of data index (#478)
0b297793a037 is described below

commit 0b297793a037ad09012088e4f73f9612236e6336
Author: Luzhijing <[email protected]>
AuthorDate: Thu Mar 28 19:42:41 2024 +0800

    [doc]update en docs of data index (#478)
---
 .../version-2.0/table-design/index/bitmap-index.md |  4 +-
 .../version-2.0/table-design/index/bloomfilter.md  | 58 +++++++++----------
 .../table-design/index/index-overview.md           | 66 +++-------------------
 .../table-design/index/inverted-index.md           | 29 +++++-----
 .../table-design/index/ngram-bloomfilter-index.md  | 33 ++++++-----
 .../version-2.0/table-design/index/prefix-index.md | 21 +++++--
 6 files changed, 86 insertions(+), 125 deletions(-)

diff --git a/versioned_docs/version-2.0/table-design/index/bitmap-index.md 
b/versioned_docs/version-2.0/table-design/index/bitmap-index.md
index 97385c429d2d..1a15bd575c35 100644
--- a/versioned_docs/version-2.0/table-design/index/bitmap-index.md
+++ b/versioned_docs/version-2.0/table-design/index/bitmap-index.md
@@ -24,13 +24,11 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Bitmap index
-
 Bitmap Index is an index represented by bitmaps, where a bitmap is created for 
each key value in a column. Compared to other indexes, it occupies very little 
storage space and is very fast to create and use. However, it has a 
disadvantage of having a large lock granularity for modification operations, 
making it unsuitable for frequent updates.
 
 ![bitmap index](/images/Bitmap-index.png)
 
-### Applicable scenarios
+## Applicable scenarios
 
 - Columns with high value repetition, recommended to be between 100 and 
100,000, such as occupation, city, etc. If the repetition is too high, there is 
no significant advantage compared to other types of indexes; if the repetition 
is too low, space efficiency and performance will be greatly reduced.
 - Specific types of queries such as `count`, `or`, `and` logical operations 
that only require bitwise operations. For example: querying with multiple 
conditions combined, `select count(*) from table where city = 'Nanjing' and job 
= 'Doctor' and phonetype = 'iphone' and gender = 'Male'.` If bitmap indexes are 
established on each query condition column, the database can perform efficient 
bit operations, accurately locate the required data, reduce disk IO, and the 
smaller the filtered resul [...]
diff --git a/versioned_docs/version-2.0/table-design/index/bloomfilter.md 
b/versioned_docs/version-2.0/table-design/index/bloomfilter.md
index 83368c79fd4f..0ec7781b34ae 100644
--- a/versioned_docs/version-2.0/table-design/index/bloomfilter.md
+++ b/versioned_docs/version-2.0/table-design/index/bloomfilter.md
@@ -24,7 +24,7 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# BloomFilter Index
+
 
 BloomFilter is a fast search algorithm for multi-hash function mapping 
proposed by Bloom in 1970. Usually used in some occasions where it is necessary 
to quickly determine whether an element belongs to a set, but is not strictly 
required to be 100% correct, BloomFilter has the following characteristics:
 
@@ -36,29 +36,21 @@ Bloom filter is actually composed of an extremely long 
binary bit array and a se
 
 Figure below shows an example of Bloom Filter with m=18, k=3 (m is the size of 
the Bit array, and k is the number of Hash functions). The three elements of x, 
y, and z in the set are hashed into the bit array through three different hash 
functions. When querying the element w, after calculating by the Hash function, 
because one bit is 0, w is not in the set.
 
-![Bloom_filter.svg](/images/Bloom_filter.svg.png)
-
-So how to judge whether the plot and the elements are in the set? Similarly, 
all the offset positions of this element are obtained after hash function 
calculation. If these positions are all 1, then it is judged that this element 
is in this set, if one is not 1, then it is judged that this element is not in 
this set. It's that simple!
-
-## Doris BloomFilter Index and Usage Scenarios
-
-When we use HBase, we know that the Hbase data block index provides an 
effective method to find the data block of the HFile that should be read when 
accessing a specific row. But its utility is limited. The default size of the 
HFile data block is 64KB, and this size cannot be adjusted too much.
-
-If you are looking for a short row, only building an index on the starting row 
key of the entire data block cannot give you fine-grained index information. 
For example, if your row occupies 100 bytes of storage space, a 64KB data block 
contains (64 * 1024)/100 = 655.53 = ~700 rows, and you can only put the 
starting row on the index bit. The row you are looking for may fall in the row 
interval on a particular data block, but it is not necessarily stored in that 
data block. There are many  [...]
+Similarly, if an element undergoes a hash function calculation to obtain all 
its offset positions, and if all these positions are set to 1, then it is 
determined that this element is in the set.
 
-So HBase provides Bloom filters that allow you to do a reverse test on the 
data stored in each data block. When a row is requested, first check the Bloom 
filter to see if the row is not in this data block. The Bloom filter must 
either confirm that the line is not in the answer, or answer that it doesn't 
know. This is why we call it a reverse test. Bloom filters can also be applied 
to the cells in the row. Use the same reverse test first when accessing a 
column of identifiers.
+![Bloom_filter.svg](/images/Bloom_filter.svg.png)
 
-Bloom filters are not without cost. Storing this additional index level takes 
up additional space. Bloom filters grow as their index object data grows, so 
row-level bloom filters take up less space than column identifier-level bloom 
filters. When space is not an issue, they can help you squeeze the performance 
potential of the system.
+Bloom Filter is a data structure that allows for a reverse test on the data 
stored in each data block. When a specific row is requested, the Bloom Filter 
first checks if that row is not in the data block. The Bloom Filter either 
confidently states that the row is not present or it gives an uncertain answer. 
This uncertainty is why it's called a reverse test. However, Bloom Filters are 
not without cost. Storing this additional index level consumes extra space, and 
Bloom Filters grow as th [...]
 
-The BloomFilter index of Doris is specified when the table is built, or 
completed by the ALTER operation of the table. Bloom Filter is essentially a 
bitmap structure, used to quickly determine whether a given value is in a set. 
This judgment will produce a small probability of misjudgment. That is, if it 
returns false, it must not be in this set. And if the range is true, it may be 
in this set.
+In Doris, BloomFilter indexes can be specified during table creation or 
through ALTER table operations. Essentially, a Bloom Filter is a bitmap 
structure used to quickly determine if a given value is in a set. This 
determination can result in a small probability of false positives. That is, if 
it returns false, the value is definitely not in the set. But if it returns 
true, the value might be in the set.
 
-The BloomFilter index is also created with Block as the granularity. In each 
Block, the value of the specified column is used as a set to generate a 
BloomFilter index entry, which is used to quickly filter the data that does not 
meet the conditions in the query.
+BloomFilter indexes in Doris are created at the Block level. Within each 
Block, the values of specified columns are treated as a set to generate a 
BloomFilter index entry. This entry is used during queries to quickly filter 
out data that doesn't meet the conditions.
 
-Let's take a look at how Doris creates BloomFilter indexes through examples.
+Now let's see how Doris creates a BloomFilter index through an example.
 
-### Create BloomFilter Index
+### Create BloomFilter index
 
-The Doris BloomFilter index is created by adding 
"bloom_filter_columns"="k1,k2,k3" to the PROPERTIES of the table building 
statement, this attribute, k1,k2,k3 is the Key column name of the BloomFilter 
index you want to create, for example, we Create a BloomFilter index for the 
saler_id and category_id in the table.
+The Doris BloomFilter index is created by adding 
`"bloom_filter_columns"="k1,k2,k3"` to the PROPERTIES of the table building 
statement, this attribute, k1,k2,k3 is the Key column name of the BloomFilter 
index you want to create, for example, we Create a BloomFilter index for the 
saler_id and category_id in the table.
 
 ```sql
 CREATE TABLE IF NOT EXISTS sale_detail_bloom (
@@ -91,7 +83,7 @@ PROPERTIES (
 );
 ```
 
-### View BloomFilter Index
+### View BloomFilter index
 
 Check that the BloomFilter index we built on the table is to use:
 
@@ -99,32 +91,36 @@ Check that the BloomFilter index we built on the table is 
to use:
 SHOW CREATE TABLE <table_name>;
 ```
 
-### Delete BloomFilter Index
+### Delete BloomFilter index
 
-Deleting the index is to remove the index column from the bloom_filter_columns 
attribute:
+Deleting the index is to remove the index column from the 
`bloom_filter_columns attribute`:
 
 ```sql
 ALTER TABLE <db.table_name> SET ("bloom_filter_columns" = "");
 ```
 
-### Modify BloomFilter Index
+### Modify BloomFilter index
 
-Modifying the index is to modify the bloom_filter_columns attribute of the 
table:
+Modifying the index is to modify the `bloom_filter_columns` attribute of the 
table:
 
 ```sql
 ALTER TABLE <db.table_name> SET ("bloom_filter_columns" = "k1,k3");
 ```
 
-### **Doris BloomFilter Usage Scenarios**
+### **BloomFilter usage scenarios**
+
+User can create a BloomFilter index for a column when the following conditions 
are met:
+
+- BloomFilter is suitable for non-prefix filtering.
+
+- The query will be filtered according to the high frequency of the column, 
and most of the query conditions are `in` and `= `filtering.
+
+- BloomFilter is suitable for high cardinality columns. Such as UserID. 
Because if it is created on a low-cardinality column, such as a `gender` 
column, each Block will almost contain all values, causing the BloomFilter 
index to lose its meaning.
 
-You can consider establishing a Bloom Filter index for a column when the 
following conditions are met:
+### **BloomFilter use precautions**
 
-1. First, BloomFilter is suitable for non-prefix filtering.
-2. The query will be filtered according to the high frequency of the column, 
and most of the query conditions are in and = filtering.
-3. BloomFilter is suitable for high cardinality columns. Such as UserID. 
Because if it is created on a low-cardinality column, such as a "gender" 
column, each Block will almost contain all values, causing the BloomFilter 
index to lose its meaning.
+- It does not support the creation of BloomFilter indexes for 
`Tinyint`,`Float`, and `Double` columns.
 
-### **Doris BloomFilter Use Precautions**
+- The BloomFilter index only has an acceleration effect on `in` and `=` 
filtering queries.
 
-1. It does not support the creation of Bloom Filter indexes for Tinyint, 
Float, and Double columns.
-2. The Bloom Filter index only has an acceleration effect on in and = 
filtering queries.
-3. If you want to check whether a query hits the Bloom Filter index, you can 
check the profile information of the query.
+- If you want to check whether a query hits the BloomFilter index, you can 
check the profile information of the query.
diff --git a/versioned_docs/version-2.0/table-design/index/index-overview.md 
b/versioned_docs/version-2.0/table-design/index/index-overview.md
index 0e56e11307a3..4b467431ebc2 100644
--- a/versioned_docs/version-2.0/table-design/index/index-overview.md
+++ b/versioned_docs/version-2.0/table-design/index/index-overview.md
@@ -24,64 +24,16 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Index Overview
+Indexes are used to help quickly filter or search for data. Currently, Doris 
supports two types of indexes: built-in smart indexes and user-created 
secondary indexes.
 
-Indexes are used to help quickly filter or find data.
+### Built-in smart indexes
 
-Doris currently supports two main types of indexes:
-1. built-in smart indexes, including prefix indexes and ZoneMap indexes.
-2. User-created secondary indexes, including the [inverted 
index](./inverted-index.md), [bloomfilter index](./bloomfilter.md) and [ngram 
bloomfilter index](./ngram-bloomfilter-index.md).
+- Sorted keys and prefix index: Apache Doris stores data in an ordered manner 
based on sorted keys. It creates a prefix index for every 1024 rows of data. 
The key in the index is the value of the sorted column in the first row of the 
current 1024-row group. If a query involves a sorted column, the system will 
find the first row of the relevant 1024-row group and start scanning from there.
+- ZoneMap index: The ZoneMap index is automatically maintained index 
information for each column in the column-based storage format, including 
Min/Max values, etc. This type of index is transparent to users. These are 
segment and page-level indexes, the maximum and minimum values for each column 
within a page are recorded, and the maximum and minimum values for each column 
within a segment are also recorded. Therefore, in equivalence and range 
queries,  this Min/Max index can be used to  [...]
 
-The ZoneMap index is the index information automatically maintained for each 
column in the column storage format, including Min/Max, the number of Null 
values, and so on. This index is transparent to the user.
+### User-created secondary indexes
 
-## Prefix Index
-
-Unlike traditional database designs, Doris does not support creating indexes 
on arbitrary columns. an OLAP database with an MPP architecture like Doris is 
typically designed to handle large amounts of data by increasing concurrency.
-
-Essentially, Doris data is stored in a data structure similar to an SSTable 
(Sorted String Table). This structure is an ordered data structure that can be 
stored sorted by specified columns. On this data structure, it will be very 
efficient to perform lookups with sorted columns as a condition.
-
-In the Aggregate, Unique and Duplicate data models. The underlying data 
storage is sorted and stored according to the columns specified in the 
respective table building statements, AGGREGATE KEY, UNIQUE KEY and DUPLICATE 
KEY.
-
-The prefix index, which is based on sorting, is an indexing method to query 
data quickly based on a given prefix column.
-
-## Examples
-
-We use the first 36 bytes of a row of data as the prefix index of this row of 
data. Prefix indexes are simply truncated when a VARCHAR type is encountered. 
We give an example:
-
-1. The prefix index of the following table structure is user_id(8 Bytes) + 
age(4 Bytes) + message(prefix 20 Bytes).
-
-   | ColumnName     | Type         |
-   | -------------- | ------------ |
-   | user_id        | BIGINT       |
-   | age            | INT          |
-   | message        | VARCHAR(100) |
-   | max_dwell_time | DATETIME     |
-   | min_dwell_time | DATETIME     |
-
-2. The prefix index of the following table structure is user_name(20 Bytes). 
Even if it does not reach 36 bytes, because VARCHAR is encountered, it is 
directly truncated and will not continue further.
-
-   | ColumnName     | Type         |
-   | -------------- | ------------ |
-   | user_name      | VARCHAR(20)  |
-   | age            | INT          |
-   | message        | VARCHAR(100) |
-   | max_dwell_time | DATETIME     |
-   | min_dwell_time | DATETIME     |
-
-When our query condition is the prefix of the prefix index, the query speed 
can be greatly accelerated. For example, in the first example, we execute the 
following query:
-
-```sql
-SELECT * FROM table WHERE user_id=1829239 and age=20；
-```
-
-This query will be much more efficient than the following query:
-
-```sql
-SELECT * FROM table WHERE age=20；
-```
-
-Therefore, when building a table, choosing the correct column order can 
greatly improve query efficiency.
-
-## Adjust Prefix Index by ROLLUP
-
-Because the column order has been specified when the table is created, there 
is only one prefix index for a table. This may not be efficient for queries 
that use other columns that cannot hit the prefix index as conditions. 
Therefore, we can artificially adjust the column order by creating a ROLLUP. 
For details, please refer to [ROLLUP](../hit-the-rollup.md).
+- Bloom Filter Index
+- N-Gram Bloom Filter Index
+- Bitmap Index
+- Inverted Index
\ No newline at end of file
diff --git a/versioned_docs/version-2.0/table-design/index/inverted-index.md 
b/versioned_docs/version-2.0/table-design/index/inverted-index.md
index 9f3aa8a5ba4e..f6a121181039 100644
--- a/versioned_docs/version-2.0/table-design/index/inverted-index.md
+++ b/versioned_docs/version-2.0/table-design/index/inverted-index.md
@@ -24,21 +24,17 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Inverted Index
 
-<version since="2.0.0">
- 
-</version>
 
-From version 2.0.0, Doris implemented inverted index to support fulltext 
search on text field, normal eq and range filter on text, numeric, datetime 
field. This doc introduce inverted index usage, including create, drop and 
query.
+From version 2.0, Doris implemented inverted index to support fulltext search 
on text field, normal eq and range filter on text, numeric, datetime field. 
This doc introduce inverted index usage, including create, drop and query.
 
 
 ## Glossary
 
-- [inverted index](https://en.wikipedia.org/wiki/Inverted_index) is a index 
techlogy used in information retirval commonly. It split text into word terms 
and construct a term to doc index. This index is called inverted index and can 
be used to find the docs where a specific term appears.
+[inverted index](https://en.wikipedia.org/wiki/Inverted_index) is a index 
techlogy used in information retirval commonly. It split text into word terms 
and construct a term to doc index. This index is called inverted index and can 
be used to find the docs where a specific term appears.
 
 
-## Basic Principles
+## Basic principles
 
 In the inverted index of Doris, a row in a table corresponds to a doc in 
CLucene, a column corresponds to a field in doc. So using inverted index, doris 
can get the rows that meet the filter of SQL WHERE clause, and then get the 
rows quickly without reading other unrelated rows.
 
@@ -122,7 +118,8 @@ Inverted indexes have different limitations in different 
data models:
 
 - add an inverted index to existed table
 
-**Before version 2.0-beta:**
+**Before version 2.0.0:**
+
 ```sql
 -- syntax 1
 CREATE INDEX idx_name ON table_name(column_name) USING INVERTED 
[PROPERTIES("parser" = "english|chinese|unicode")] [COMMENT 'your comment'];
@@ -130,7 +127,7 @@ CREATE INDEX idx_name ON table_name(column_name) USING 
INVERTED [PROPERTIES("par
 ALTER TABLE table_name ADD INDEX idx_name(column_name) USING INVERTED 
[PROPERTIES("parser" = "english|chinese|unicode")] [COMMENT 'your comment'];
 ```
 
-**After version 2.0-beta (including 2.0-beta):**
+**After version 2.0.0 (including 2.0.0):**
 
 The above 'create/add index' operation only generates inverted index for 
incremental data. The syntax of BUILD INDEX is added to add inverted index to 
stock data:
 ```sql
@@ -273,7 +270,7 @@ PROPERTIES ("replication_num" = "1");
 ```
 
 
-### Load Data
+### Load data
 
 - load data by stream load
 
@@ -402,7 +399,7 @@ mysql> SELECT count() FROM hackernews_1m WHERE comment 
MATCH_ANY 'OLAP OLTP';
 ```
 
 
-#### Normal Equal, Range Query
+#### Normal equal, range query
 
 - range query on DateTime column
 ```sql
@@ -422,7 +419,8 @@ mysql> SELECT count() FROM hackernews_1m WHERE timestamp > 
'2007-08-23 04:17:00'
 mysql> CREATE INDEX idx_timestamp ON hackernews_1m(timestamp) USING INVERTED;
 Query OK, 0 rows affected (0.03 sec)
 ```
-**After 2.0-beta (including 2.0-beta), you need to execute `BUILD INDEX` to 
add inverted index to the stock data:**
+**After version 2.0.0 (including 2.0.0), you need to execute `BUILD INDEX` to 
add inverted index to the stock data:**
+
 ```sql
 mysql> BUILD INDEX idx_timestamp ON hackernews_1m;
 Query OK, 0 rows affected (0.01 sec)
@@ -439,7 +437,8 @@ mysql> SHOW ALTER TABLE COLUMN;
 1 row in set (0.00 sec)
 ```
 
-**After 2.0-beta (including 2.0-beta), you can view the progress of stock data 
creating index by `SHOW BUILD INDEX`:**
+**After version 2.0.0 (including 2.0.0), you can view the progress of stock 
data creating index by `SHOW BUILD INDEX`:**
+
 ```sql
 -- If the table has no partitions, the PartitionName defaults to TableName
 mysql> SHOW BUILD INDEX;
@@ -477,7 +476,7 @@ mysql> SELECT count() FROM hackernews_1m WHERE parent = 
11189;
 mysql> ALTER TABLE hackernews_1m ADD INDEX idx_parent(parent) USING INVERTED;
 Query OK, 0 rows affected (0.01 sec)
 
--- After 2.0-beta (including 2.0-beta), you need to execute `BUILD INDEX` to 
add inverted index to the stock data:
+-- After version 2.0.0 (including 2.0.0), you need to execute `BUILD INDEX` to 
add inverted index to the stock data:
 mysql> BUILD INDEX idx_parent ON hackernews_1m;
 Query OK, 0 rows affected (0.01 sec)
 
@@ -520,7 +519,7 @@ mysql> SELECT count() FROM hackernews_1m WHERE author = 
'faster';
 mysql> ALTER TABLE hackernews_1m ADD INDEX idx_author(author) USING INVERTED;
 Query OK, 0 rows affected (0.01 sec)
 
--- After 2.0-beta (including 2.0-beta), you need to execute `BUILD INDEX` to 
add inverted index to the stock data:
+-- After version 2.0.0 (including 2.0.0), you need to execute `BUILD INDEX` to 
add inverted index to the stock data:
 mysql> BUILD INDEX idx_author ON hackernews_1m;
 Query OK, 0 rows affected (0.01 sec)
 
diff --git 
a/versioned_docs/version-2.0/table-design/index/ngram-bloomfilter-index.md 
b/versioned_docs/version-2.0/table-design/index/ngram-bloomfilter-index.md
index 4242431f7d88..15e04c75b90a 100644
--- a/versioned_docs/version-2.0/table-design/index/ngram-bloomfilter-index.md
+++ b/versioned_docs/version-2.0/table-design/index/ngram-bloomfilter-index.md
@@ -1,6 +1,6 @@
 ---
 {
-    "title": "NGram BloomFilter Index",
+    "title": "N-Gram BloomFilter Index",
     "language": "en"
 }
 ---
@@ -24,14 +24,17 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# NGram BloomFilter Index
 
-<version since="2.0.0">
-</version>
 
-In order to improve the like query performance, the NGram BloomFilter index 
was implemented.
+In order to improve the like query performance, the N-Gram BloomFilter index 
was implemented.
 
-## Create Column with NGram BloomFilter Index
+:::tip
+
+N-Gram BloomFilter indexing is supported starting from the Doris 2.0 version.
+
+:::
+
+### Create N-Gram BloomFilter index
 
 During create table：
 
@@ -47,34 +50,34 @@ DISTRIBUTED BY HASH(`siteid`) BUCKETS 10
 PROPERTIES (
 "replication_num" = "1"
 );
-
--- PROPERTIES("gram_size"="3", "bf_size"="1024")，indicate the number of gram 
and bytes of bloom filter respectively.
--- the gram size set to same as the like query pattern string length. and the 
suitable bytes of bloom filter can be get by test, more larger more better, 256 
maybe is a good start.
--- Usually, if the data's cardinality is small, you can increase the bytes of 
bloom filter to improve the efficiency.
 ```
 
-## Show NGram BloomFilter Index
+- PROPERTIES("gram_size"="3", "bf_size"="1024")，indicate the number of gram 
and bytes of bloom filter respectively.
+- The number of grams is related to the actual query scenarios and is usually 
set to the length of most query strings. The byte size of the Bloom Filter can 
be determined through testing, and generally, a larger size leads to better 
filtering effects. You can start with 256 bytes for validation testing to see 
the results. However, a larger byte size will also increase the storage and 
memory cost of the index.
+- If the data cardinality is relatively high, the byte size does not need to 
be set too large. If the cardinality is not very high, the filtering effect can 
be improved by increasing the byte size.
+
+### Show N-Gram BloomFilter index
 
 ```sql
 show index from example_db.table3;
 ```
 
-## Drop NGram BloomFilter Index
+### Drop N-Gram BloomFilter index
 
 
 ```sql
 alter table example_db.table3 drop index idx_ngrambf;
 ```
 
-## Add NGram BloomFilter Index
+### Modify N-Gram BloomFilter index
 
-Add NGram BloomFilter Index for old column:
+Add N-Gram BloomFilter Index for old column:
 
 ```sql
 alter table example_db.table3 add index idx_ngrambf(username) using NGRAM_BF 
PROPERTIES("gram_size"="3", "bf_size"="512")comment 'username ngram_bf index' 
 ```
 
-## **Some Notes about Doris NGram BloomFilter**
+### **Some Notes about Doris NGram BloomFilter**
 
 1. NGram BloomFilter only support CHAR/VARCHAR/String column.
 2. NGram BloomFilter index and BloomFilter index should be exclusive on same 
column
diff --git a/versioned_docs/version-2.0/table-design/index/prefix-index.md 
b/versioned_docs/version-2.0/table-design/index/prefix-index.md
index 2eb2b943538b..e0b1157404da 100644
--- a/versioned_docs/version-2.0/table-design/index/prefix-index.md
+++ b/versioned_docs/version-2.0/table-design/index/prefix-index.md
@@ -24,7 +24,6 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-## Sort key and prefix index
 
 Doris stores data in a data structure similar to SSTable (Sorted String 
Table), which is an ordered data structure that can sort and store data 
according to specified columns. Performing queries based on sorted columns in 
this data structure is highly efficient.
 
@@ -33,18 +32,32 @@ In the three data models of Aggregate, Unique, and 
Duplicate, the underlying dat
 Based on the Sort Keys, Prefix Indexes are introduced. A Prefix Index is a 
sparse index. In the table, a logical data block is formed according to the 
corresponding number of rows. Each logical data block stores an index entry in 
the Prefix Index table. The length of the index entry does not exceed 36 bytes, 
and its content is the prefix composed of the sorted columns of the first row 
of data in the data block. When searching the Prefix Index table, it can help 
determine the starting row [...]
 
 :::tip
-The first 36 bytes of a row of data in a data block serve as the prefix index 
for that row. When encountering a VARCHAR type, the prefix index will be 
truncated directly. If the first column is of the VARCHAR type, truncation will 
occur even if the length does not reach 36 bytes.
+
+The first 36 bytes of a row of data in a data block serve as the prefix index 
for that row. When encountering a `VARCHAR` type, the prefix index will be 
truncated directly. If the first column is of the `VARCHAR` type, truncation 
will occur even if the length does not reach 36 bytes.
+
 :::
 
 ## Example
 
 - If the sort keys of the table are as follows: 5 columns, then the prefix 
index would be: user_id (8 Bytes), age (4 Bytes), message (prefix 20 Bytes).
 
-ColumnNameTypeuser_idBIGINTageINTmessageVARCHAR(100)max_dwell_timeDATETIMEmin_dwell_timeDATETIME
+| ColumnName     | Type         |
+| -------------- | ------------ |
+| user_id        | BIGINT       |
+| age            | INT          |
+| message        | VARCHAR(100) |
+| max_dwell_time | DATETIME     |
+| min_dwell_time | DATETIME     |
 
 - If the sort keys of the table consist of 5 columns and the first column is 
`user_name` of the VARCHAR type, then the prefix index would be `user_name` 
(truncated to 20 Bytes). Even though the total size of the prefix index has not 
reached 36 bytes, truncation occurs because it encounters a VARCHAR column, and 
no further columns are included.
 
-ColumnNameTypeuser_nameVARCHAR(20)ageINTmessageVARCHAR(100)max_dwell_timeDATETIMEmin_dwell_timeDATETIME
+| ColumnName     | Type         |
+| -------------- | ------------ |
+| user_name      | VARCHAR(20)  |
+| age            | INT          |
+| message        | VARCHAR(100) |
+| max_dwell_time | DATETIME     |
+| min_dwell_time | DATETIME     |
 
 - When our query conditions match the prefix index, it can greatly accelerate 
the query speed. For example, in the first case, executing the following query:
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

(doris-website) branch master updated: [doc]update en docs of data index (#478)

Reply via email to