This is an automated email from the ASF dual-hosted git repository.
luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new b4ef6a32a72 [Fix] Replace Doris Operator docs link and fix inverted index blog (#401)
b4ef6a32a72 is described below
commit b4ef6a32a7259112f95642c68137e51bd148248d
Author: KassieZ <[email protected]>
AuthorDate: Fri Feb 2 10:48:14 2024 +0800
[Fix] Replace Doris Operator docs link and fix inverted index blog (#401)
---
...ed-index-accelerates-text-searches-by-40-time-apache-doris.md | 9 ++++++++-
src/pages/ecosystem/cluster-management/index.tsx | 4 +++-
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/blog/inverted-index-accelerates-text-searches-by-40-time-apache-doris.md b/blog/inverted-index-accelerates-text-searches-by-40-time-apache-doris.md
index 7c001930895..9406fbd8fe4 100644
--- a/blog/inverted-index-accelerates-text-searches-by-40-time-apache-doris.md
+++ b/blog/inverted-index-accelerates-text-searches-by-40-time-apache-doris.md
@@ -211,9 +211,12 @@ By skipping the irrelevant pages, the BloomFilter index reduces unnecessary data
`gram_size` determines the matching efficiency, while `bf_size` impacts the
false positive rate. Typically, a large `bf_size` reduces the false positive
rate but also requires more storage space. Thus, we suggest that you configure
these two parameters based on these two factors:
1. Text length:
+
- For short texts (words or phrases), a small `gram_size` (2~4) and a small
`bf_size` are recommended.
- For long texts (sentences or paragraphs), a large `gram_size` (5~10) and
a large `bf_size` work better.
+
2. Query pattern:
+
- If the queries often involve phrases or complete words, a large
`gram_size` will be more efficient.
- For fuzzy matching or diverse queries, a small `gram_size` allows more
flexible matching.
@@ -222,6 +225,7 @@ By skipping the irrelevant pages, the BloomFilter index reduces unnecessary data
[Inverted
index](https://doris.apache.org/blog/Building-A-Log-Analytics-Solution-10-Times-More-Cost-Effective-Than-Elasticsearch)
is another way to accelerate text searches. Creating inverted index is simple:
1. **Add inverted index**: Refer to the snippet below to create inverted
index for the `review_body` column of the `amazon_reviews` table. Inverted
index supports phrase searching, in which the order of the tokenized words will
affect the search results.
+
2. **Add inverted index for historical data**: You can also create inverted
index for historical data.
```SQL
@@ -298,6 +302,8 @@ Results show that inverted index has decreased the query latency to **0.19 secon
5 rows in set (0.19 sec)
```
+
+
**How does inverted index make it possible?**
Inverted index splits the texts into words and maps each word to a row number.
Then the tokenized words are sorted alphabetically and and a skip list index is
created. When executing queries of specific words, the system locates the row
numbers in this orderly mapping using the skip list index and binary search
methods. Based on the row numbers, the system retrieves the entire data record.
@@ -306,7 +312,8 @@ This approach avoids line-by-line matching and reduces computational complexity

-*Illustration of Inverted Index*
+
+<div style={{textAlign:'center'}}> Illustration of Inverted Index </div >
To provide a deeper understanding of inverted index, I will start from its
read/write logic. In Doris, logically, inverted index is applied at the column
level of a table. However, from a physical storage and implementation
perspective, it is actually built on data files.
diff --git a/src/pages/ecosystem/cluster-management/index.tsx b/src/pages/ecosystem/cluster-management/index.tsx
index a0d753ddc1b..35dd0d6b523 100644
--- a/src/pages/ecosystem/cluster-management/index.tsx
+++ b/src/pages/ecosystem/cluster-management/index.tsx
@@ -3,6 +3,7 @@ import EcomsystemLayout from
'@site/src/components/ecomsystem/ecomsystem-layout/
import ExternalLink from '@site/src/components/external-link/external-link';
import CollapseBox from '@site/src/components/collapse-box/collapse-box';
import '../index.scss';
+import { ExternalLinkArrowIcon } from '@site/src/components/Icons/external-link-arrow-icon';
export default function ClusterManagement() {
return (
@@ -52,9 +53,10 @@ export default function ClusterManagement() {
<>
<ExternalLink
href="https://www.velodb.io/download/tools" label="Download"></ExternalLink>
<ExternalLink
- href="https://github.com/apache/doris/blob/master/docs/en/docs/install/k8s-deploy/operator-deploy.md"
+ href="https://doris.apache.org/docs/install/k8s-deploy/operator-deploy"
className="sub-btn"
label="Docs"
+ linkIcon={<ExternalLinkArrowIcon />}
></ExternalLink>
</>
}