[doris-website] branch master updated: update two blogs (#264)

luzhijing Mon, 17 Jul 2023 01:03:34 -0700

This is an automated email from the ASF dual-hosted git repository.

luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new d368a775b5d update two blogs (#264)
d368a775b5d is described below

commit d368a775b5d8c4f701e282e23bcfc77db6ca238a
Author: Hu Yanjun <[email protected]>
AuthorDate: Mon Jul 17 16:01:27 2023 +0800

    update two blogs (#264)
---
 blog/Moka.md                    |  14 +++++++-------
 blog/Tianyancha.md              |  22 +++++++++++-----------
 static/images/Moka_1.png        | Bin 0 -> 163117 bytes
 static/images/Moka_2.png        | Bin 0 -> 101298 bytes
 static/images/Moka_3.png        | Bin 0 -> 125885 bytes
 static/images/Moka_4.png        | Bin 0 -> 71536 bytes
 static/images/Moka_5.png        | Bin 0 -> 34206 bytes
 static/images/Moka_6.png        | Bin 0 -> 107745 bytes
 static/images/Tianyancha_1.png  | Bin 0 -> 175688 bytes
 static/images/Tianyancha_10.png | Bin 0 -> 282789 bytes
 static/images/Tianyancha_11.png | Bin 0 -> 70497 bytes
 static/images/Tianyancha_2.png  | Bin 0 -> 126302 bytes
 static/images/Tianyancha_3.png  | Bin 0 -> 154959 bytes
 static/images/Tianyancha_4.png  | Bin 0 -> 178209 bytes
 static/images/Tianyancha_5.png  | Bin 0 -> 199652 bytes
 static/images/Tianyancha_6.png  | Bin 0 -> 24594 bytes
 static/images/Tianyancha_7.png  | Bin 0 -> 78898 bytes
 static/images/Tianyancha_8.png  | Bin 0 -> 261317 bytes
 static/images/Tianyancha_9.png  | Bin 0 -> 249248 bytes
 19 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/blog/Moka.md b/blog/Moka.md
index 3222a610e8e..08d82ceeea2 100644
--- a/blog/Moka.md
+++ b/blog/Moka.md
@@ -35,13 +35,13 @@ Business intelligence (BI) tool is often the last stop of a 
data processing pipe
 
 I work as an engineer that supports a human resource management system. One 
prominent selling point of our services is **self-service** **BI**. That means 
we allow users to customize their own dashboards: they can choose the fields 
they need and relate them to form the dataset as they want. 
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/091e5427a12a42d0a2033f6f77faf21c~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Moka_1.png)
 
 Join query is a more efficient way to realize self-service BI. It allows 
people to break down their data assets into many smaller tables instead of 
putting it all in a flat table. This would make data updates much faster and 
more cost-effective, because updating the whole flat table is not always the 
optimal choice when you have plenty of new data flowing in and old data being 
updated or deleted frequently, as is the case for most data input.
 
 In order to maximize the time value of data, we need data updates to be 
executed really quickly. For this purpose, we looked into three OLAP databases 
on the market. They are all fast in some way but there are some differences.
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/abb0e1ec221c46ad954e46babb825e86~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Moka_2.png)
 
 Greenplum is really quick in data loading and batch DML processing, but it is 
not good at handling high concurrency. There is a steep decline in performance 
as query concurrency rises. This can be risky for a BI platform that tries to 
ensure stable user experience. ClickHouse is mind-blowing in single-table 
queries, but it only allows batch update and batch delete, so that's less 
timely.
 
@@ -51,7 +51,7 @@ JOIN, my old friend JOIN, is always a hassle. Join queries 
are demanding for bot
 
 We tested our candidate OLAP engines with our common join queries and our most 
notorious slow queries. 
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/217cb8180fe54af8a2c3b729ffb4d3d7~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Moka_3.png)
 
 As the number of tables joined grows, we witness a widening performance gap 
between Apache Doris and ClickHouse. In most join queries, Apache Doris was 
about 5 times faster than ClickHouse. In terms of slow queries, Apache Doris 
responded to most of them within less than 1 second, while the performance of 
ClickHouse fluctuated within a relatively large range. 
 
@@ -79,15 +79,15 @@ Human resource data is subject to very strict and 
fine-grained access control po
 
 How does all this add to complexity in engineering? Any user who inputs a 
query on our BI platform must go through multi-factor authentication, and the 
authenticated information will all be inserted into the SQL via `in` and then 
passed on to the OLAP engine. Therefore, the more fine-grained the privilege 
controls are, the longer the SQL will be, and the more time the OLAP system 
will spend on ID filtering. That's why our users are often tortured by high 
latency.
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/980f9b26973d416cb43801dbc536c609~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Moka_4.png)
 
 So how did we fix that? We use the [Bloom Filter 
index](https://doris.apache.org/docs/dev/data-table/index/bloomfilter/) in 
Apache Doris. 
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/e42c1ec24b73405ca25635404d06339d~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Moka_5.png)
 
 By adding Bloom Filter indexes to the relevant ID fields, we improve the speed 
of privileged queries by 30% and basically eliminate timeout errors.
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/717837d74d6f45bc9715f6a8ed116594~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Moka_6.png)
 
 Tips on when you should use the Bloom Filter index:
 
@@ -101,4 +101,4 @@ We believe self-service BI is the future in the BI 
landscape, just like AGI is t
 
 
 
-Find the Apache Doris developers on [Slack](https://t.co/ZxJuNJHXb2)
\ No newline at end of file
+Find the Apache Doris developers on [Slack](https://t.co/ZxJuNJHXb2)
diff --git a/blog/Tianyancha.md b/blog/Tianyancha.md
index 22e52fd08c5..e2c1678725b 100644
--- a/blog/Tianyancha.md
+++ b/blog/Tianyancha.md
@@ -37,15 +37,15 @@ Our old data warehouse consisted of the most popular 
components of the time, inc
 
 As you can imagine, a long and complicated data pipeline is high-maintenance 
and detrimental to development efficiency. Moreover, they are not capable of 
ad-hoc queries. So as an upgrade to our data warehouse, we replaced most of 
these components with [Apache Doris](https://github.com/apache/doris), a 
unified analytic database.
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/6ab944ea7b2642b2ad6b1a997e0d9221~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Tianyancha_1.png)
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/8a8dedb7177443d29928545228e30897~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Tianyancha_2.png)
 
 ## Data Flow
 
 This is a lateral view of our data warehouse, from which you can see how the 
data flows.
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/990e8d4a6b2747b3898b44d43ad48f2b~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Tianyancha_3.png)
 
 For starters, binlogs from MySQL will be ingested into Kafka via Canal, while 
user activity logs will be transferred to Kafka via Apache Flume. In Kafka, 
data will be cleaned and organized into flat tables, which will be later turned 
into aggregated tables. Then, data will be passed from Kafka to Apache Doris, 
which serves as the storage and computing engine. 
 
@@ -59,7 +59,7 @@ This is how Apache Doris replaces the roles of Hive, 
Elasticsearch, and PostgreS
 
 **After**: Since Apache Doris has all the itemized data, whenever it is faced 
with a new request, it can simply pull the metadata and configure the query 
conditions. Then it is ready for ad-hoc queries. In short, it only requires 
low-code configuration to respond to new requests. 
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/28b680bc4bc042509c77e13957ff0c63~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Tianyancha_4.png)
 
 ## User Segmentation
 
@@ -71,7 +71,7 @@ Tables in Elasticsearch and PostgreSQL were unreusable, 
making this architecture
 
 In this Doris-centered user segmentation process, we don't have to pre-define 
new tags. Instead, tags can be auto-generated based on the task conditions. The 
processing pipeline has the flexibility that can make our user-group-based A/B 
testing easier. Also, as both the itemized data and user group packets are in 
Apache Doris, we don't have to attend to the read and write complexity between 
multiple components.
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/dd95672715a6408c895cfce022d98d1e~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Tianyancha_5.png)
 
 ## Trick to Speed up User Segmentation by 70%
 
@@ -79,9 +79,9 @@ Due to risk aversion reasons, random generation of `user_id` 
is the choice for m
 
 To solve that, we created consecutive and dense mappings for these user IDs. 
**In this way, we decreased our user segmentation latency by 70%.**
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/f40840b0b2284e9c804af8f157bb9790~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Tianyancha_6.png)
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/eda0bdc09e0142d3b162b4e6aa037976~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Tianyancha_7.png)
 
 ### Example
 
@@ -89,13 +89,13 @@ To solve that, we created consecutive and dense mappings 
for these user IDs. **I
 
 We adopt the Unique model for user ID mapping tables, where the user ID is the 
unique key. The mapped consecutive IDs usually start from 1 and are strictly 
increasing. 
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/cf2b3bdb112f4ef28aeadf0577e97c7e~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Tianyancha_8.png)
 
 **Step 2: Create a user group table:**
 
 We adopt the Aggregate model for user group tables, where user tags serve as 
the aggregation keys. 
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/5e0da0c1435c4f209fc28e67ab218d81~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Tianyancha_9.png)
 
 Supposing that we need to pick out the users whose IDs are between 0 and 
2000000. 
 
@@ -104,13 +104,13 @@ The following snippets use non-consecutive 
(`tyc_user_id`) and consecutive (`tyc
 - Non-Consecutive User IDs: **1843ms**
 - Consecutive User IDs: **543ms** 
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/ae45a2855e644ba0a040a2d5953b8e49~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Tianyancha_10.png)
 
 ## Conclusion
 
 We have 2 clusters in Apache Doris accommodating tens of TBs of data, with 
almost a billion new rows flowing in every day. We used to witness a steep 
decline in data ingestion speed as data volume expanded. But after upgrading 
our data warehouse with Apache Doris, we increased our data writing efficiency 
by 75%. Also, in user segmentation with a result set of less than 5 million, it 
is able to respond within milliseconds. Most importantly, our data warehouse 
has been simpler and friendli [...]
 
-![](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/3b11fe822c3d474885d6e872ced7a184~tplv-k3u1fbpfcp-zoom-1.image)
+![](../static/images/Tianyancha_11.png)
 
 Lastly, I would like to share with you something that interested us most when 
we first talked to the [Apache Doris community](https://t.co/KcxAtAJZjZ):
 
diff --git a/static/images/Moka_1.png b/static/images/Moka_1.png
new file mode 100644
index 00000000000..67e086218cc
Binary files /dev/null and b/static/images/Moka_1.png differ
diff --git a/static/images/Moka_2.png b/static/images/Moka_2.png
new file mode 100644
index 00000000000..498fba4191f
Binary files /dev/null and b/static/images/Moka_2.png differ
diff --git a/static/images/Moka_3.png b/static/images/Moka_3.png
new file mode 100644
index 00000000000..02aafccbca8
Binary files /dev/null and b/static/images/Moka_3.png differ
diff --git a/static/images/Moka_4.png b/static/images/Moka_4.png
new file mode 100644
index 00000000000..1341e2a6416
Binary files /dev/null and b/static/images/Moka_4.png differ
diff --git a/static/images/Moka_5.png b/static/images/Moka_5.png
new file mode 100644
index 00000000000..0f4d0dfc8f2
Binary files /dev/null and b/static/images/Moka_5.png differ
diff --git a/static/images/Moka_6.png b/static/images/Moka_6.png
new file mode 100644
index 00000000000..9092c8e6f26
Binary files /dev/null and b/static/images/Moka_6.png differ
diff --git a/static/images/Tianyancha_1.png b/static/images/Tianyancha_1.png
new file mode 100644
index 00000000000..d21f10e51f8
Binary files /dev/null and b/static/images/Tianyancha_1.png differ
diff --git a/static/images/Tianyancha_10.png b/static/images/Tianyancha_10.png
new file mode 100644
index 00000000000..254fb17fe39
Binary files /dev/null and b/static/images/Tianyancha_10.png differ
diff --git a/static/images/Tianyancha_11.png b/static/images/Tianyancha_11.png
new file mode 100644
index 00000000000..1c6f06e0781
Binary files /dev/null and b/static/images/Tianyancha_11.png differ
diff --git a/static/images/Tianyancha_2.png b/static/images/Tianyancha_2.png
new file mode 100644
index 00000000000..ba008d1313f
Binary files /dev/null and b/static/images/Tianyancha_2.png differ
diff --git a/static/images/Tianyancha_3.png b/static/images/Tianyancha_3.png
new file mode 100644
index 00000000000..a2c9fc70f42
Binary files /dev/null and b/static/images/Tianyancha_3.png differ
diff --git a/static/images/Tianyancha_4.png b/static/images/Tianyancha_4.png
new file mode 100644
index 00000000000..87454f465e7
Binary files /dev/null and b/static/images/Tianyancha_4.png differ
diff --git a/static/images/Tianyancha_5.png b/static/images/Tianyancha_5.png
new file mode 100644
index 00000000000..7f076f21c64
Binary files /dev/null and b/static/images/Tianyancha_5.png differ
diff --git a/static/images/Tianyancha_6.png b/static/images/Tianyancha_6.png
new file mode 100644
index 00000000000..a8112823f6d
Binary files /dev/null and b/static/images/Tianyancha_6.png differ
diff --git a/static/images/Tianyancha_7.png b/static/images/Tianyancha_7.png
new file mode 100644
index 00000000000..33187b8aa5c
Binary files /dev/null and b/static/images/Tianyancha_7.png differ
diff --git a/static/images/Tianyancha_8.png b/static/images/Tianyancha_8.png
new file mode 100644
index 00000000000..55e05561950
Binary files /dev/null and b/static/images/Tianyancha_8.png differ
diff --git a/static/images/Tianyancha_9.png b/static/images/Tianyancha_9.png
new file mode 100644
index 00000000000..41aec4df0fd
Binary files /dev/null and b/static/images/Tianyancha_9.png differ


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[doris-website] branch master updated: update two blogs (#264)

Reply via email to