This is an automated email from the ASF dual-hosted git repository.
luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new f918c6a3556 update url and pics (#310)
f918c6a3556 is described below
commit f918c6a355606a32157e537139653fc7ce18a051
Author: Hu Yanjun <[email protected]>
AuthorDate: Wed Sep 27 22:04:26 2023 +0800
update url and pics (#310)
---
...ul-Until-we-Found-the-Replacement-for-Druid.md} | 16 ++--
...-an-Entire-MySQL-Database-for-Data-Analysis.md} | 8 +-
...imes-More-Cost-Effective-Than-Elasticsearch.md} | 16 ++--
...g-a-Data-Warehouse-for-Traditional-Industry.md} | 16 ++--
...t-Generation-Data-Lakehouse-10X-Performance.md} | 24 +++---
...-Financial-Risk-Management-What-to-Consider.md} | 6 +-
...ction-How-Fast-Data-Queries-Are-Implemented.md} | 18 ++--
...-thousand-Dashboards-Without-Causing-a-Mess.md} | 8 +-
...-Build-a-Simple-but-Solid-Data-Architecture.md} | 6 +-
...ased-Database-Query-Concurrency-by-20-Times.md} | 12 +--
...ery-Speed-to-Make-the-Most-out-of-Your-Data.md} | 14 +--
...-the-Data-Update-Mechanism-of-Your-Database.md} | 8 +-
...to-That-Poor-BI-Engineer-We-Need-Fast-Joins.md} | 12 +--
...Per-Day-and-Keep-Big-Queries-Within-1-Second.md | 94 +++++++++++++++++++++
...sticsearch-and-PostgreSQL-with-Apache-Doris.md} | 24 +++---
...Management.md => Say-Goodbye-to-OOM-Crashes.md} | 20 ++---
...o-Building-a-High-Performing-Risk-Data-Mart.md} | 14 +--
...Why-We-Went-from-ClickHouse-to-Apache-Doris.md} | 30 ++++---
blog/Tencent-LLM.md | 18 ++--
...rage-for-Hot-and-Cold-Data-What-Why-and-How.md} | 28 +++---
... Understanding-Data-Compaction-in-3-Minutes.md} | 10 +--
...rio.md => Use-Apache-Doris-with-AI-chatbots.md} | 4 +-
...bda-Architecture-for-40%-Faster-Performance.md} | 12 +--
blog/release-note-2.0.0.md | 26 +++---
static/images/Unicom-1.png | Bin 0 -> 239216 bytes
static/images/Unicom-2.png | Bin 0 -> 200516 bytes
26 files changed, 272 insertions(+), 172 deletions(-)
diff --git a/blog/360.md
b/blog/AB-Testing-was-a-Handful-Until-we-Found-the-Replacement-for-Druid.md
similarity index 94%
rename from blog/360.md
rename to
blog/AB-Testing-was-a-Handful-Until-we-Found-the-Replacement-for-Druid.md
index 2a33dbdafd0..5b00fb1e943 100644
--- a/blog/360.md
+++ b/blog/AB-Testing-was-a-Handful-Until-we-Found-the-Replacement-for-Druid.md
@@ -39,7 +39,7 @@ Let me show you our long-term struggle with our previous
Druid-based data platfo
This was our real-time data warehouse, where Apache Storm was the real-time
data processing engine and Apache Druid pre-aggregated the data. However, Druid
did not support certain paging and join queries, so we wrote data from Druid to
MySQL regularly, making MySQL the "materialized view" of Druid. But that was
only a duct-tape solution as it couldn't support our ever-enlarging real-time
data size, so data timeliness was unattainable.
-
+
## Platform Architecture 2.0
@@ -47,7 +47,7 @@ This was our real-time datawarehouse, where Apache Storm was
the real-time data
This time, we replaced Storm with Flink, and MySQL with TiDB. Flink was more
powerful in terms of semantics and features, while TiDB, with its distributed
capability, was more maintainable than MySQL. But architecture 2.0 was nowhere
near our goal of end-to-end data consistency, either, because when processing
huge data, enabling TiDB transactions largely slowed down data writing. Plus,
Druid itself did not support standard SQL, so there were some learning costs
and frictions in usage.
-
+
## Platform Architecture 3.0
@@ -55,7 +55,7 @@ This time, we replaced Storm with Flink, and MySQL with TiDB.
Flink was more pow
We replaced Apache Druid with Apache Doris as the OLAP engine, which could
also serve as a unified data serving gateway. So in Architecture 3.0, we only
need to maintain one set of query logic. And we layered our real-time data
warehouse to increase the reusability of real-time data.
-
+
Turns out the combination of Flink and Doris was the answer. We can exploit
their features to realize quick computation and data consistency. Keep reading
and see how we make it happen.
@@ -67,7 +67,7 @@ Then we tried moving part of such workload to the computation
engine. So we trie
Our third shot was to aggregate data locally in Flink right after we split it.
As is shown below, we create a window in the memory of one operator for local
aggregation; then we further aggregate it using the global hash windows. Since
two operators chained together are in one thread, transferring data between
operators consumes much less network resources. **The two-step aggregation
method, combined with the** **[Aggregate
model](https://doris.apache.org/docs/dev/data-table/data-model)* [...]
-
+
For convenience in A/B testing, we make the test tag ID the first sorted field
in Apache Doris, so we can quickly locate the target data using sorted indexes.
To further minimize data processing in queries, we create materialized views
with the frequently used dimensions. With constant modification and updates,
the materialized views are applicable in 80% of our queries.
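For illustration, a minimal sketch of the layout described above, assuming a
hypothetical table `ab_test_events` with made-up columns; the test tag ID is
placed first in the key so the sorted (prefix) index can find it quickly, and a
materialized view pre-aggregates a frequently used dimension:

```sql
-- Aggregate-model table; the key order puts tag_id first for the prefix index.
CREATE TABLE ab_test_events (
    tag_id       BIGINT,
    event_date   DATE,
    user_id      BIGINT,
    metric_value BIGINT SUM
) AGGREGATE KEY (tag_id, event_date, user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 16
PROPERTIES ("replication_num" = "1");  -- adjust to your cluster

-- Materialized view on a frequently queried dimension.
CREATE MATERIALIZED VIEW mv_tag_daily AS
SELECT tag_id, event_date, SUM(metric_value)
FROM ab_test_events
GROUP BY tag_id, event_date;
```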
@@ -83,7 +83,7 @@ To ensure end-to-end data integrity, we developed a
Sink-to-Doris component. It
It is the result of our long-term evolution. We used to ensure data
consistency by implementing "one writing for one tag ID". Then we realized we
could make good use of the transactions in Apache Doris and the two-stage
commit of Apache Flink.
-
+
As is shown above, this is how two-stage commit works to guarantee data
consistency:
@@ -98,17 +98,17 @@ We make it possible to split a single checkpoint into
multiple transactions, so
This is how we implement Sink-to-Doris. The component encapsulates the API
calls and topology assembly, so with simple configuration, we can write data
into Apache Doris via Stream Load.
-
+
### Cluster Monitoring
For cluster and host monitoring, we adopted the metrics templates provided by
the Apache Doris community. For data monitoring, in addition to the template
metrics, we added Stream Load request numbers and loading rates.
-
+
Other metrics of concern include data writing speed and task processing
time. In the case of anomalies, we will receive notifications in the form of
phone calls, messages, and emails.
-
+
## Key Takeaways
diff --git a/blog/FDC.md
b/blog/Auto-Synchronization-of-an-Entire-MySQL-Database-for-Data-Analysis.md
similarity index 97%
rename from blog/FDC.md
rename to
blog/Auto-Synchronization-of-an-Entire-MySQL-Database-for-Data-Analysis.md
index 531489100d3..46ad6d5f3de 100644
--- a/blog/FDC.md
+++ b/blog/Auto-Synchronization-of-an-Entire-MySQL-Database-for-Data-Analysis.md
@@ -1,6 +1,6 @@
---
{
- 'title': 'Auto-Synchronization of an Entire MySQL Database for Data
Analysis',
+ 'title':
'Auto-Synchronization-of-an-Entire-MySQL-Database-for-Data-Analysis',
'summary': "Flink-Doris-Connector 1.4.0 allows users to ingest a whole
database (MySQL or Oracle) that contains thousands of tables into Apache Doris,
in one step.",
'date': '2023-08-16',
'author': 'Apache Doris',
@@ -94,11 +94,11 @@ When it comes to synchronizing a whole database (containing
hundreds or even tho
Under pressure test, the system showed high stability, with key metrics as
follows:
-
+
-
+
-
+
According to feedback from early adopters, the Connector has also delivered
high performance and system stability in 10,000-table database synchronization
in their production environment. This proves that the combination of Apache
Doris and Flink CDC is capable of large-scale data synchronization with high
efficiency and reliability.
diff --git a/blog/Inverted Index.md
b/blog/Building-A-Log-Analytics-Solution-10-Times-More-Cost-Effective-Than-Elasticsearch.md
similarity index 96%
rename from blog/Inverted Index.md
rename to
blog/Building-A-Log-Analytics-Solution-10-Times-More-Cost-Effective-Than-Elasticsearch.md
index a3286745161..d26f743c127 100644
--- a/blog/Inverted Index.md
+++
b/blog/Building-A-Log-Analytics-Solution-10-Times-More-Cost-Effective-Than-Elasticsearch.md
@@ -53,7 +53,7 @@ There exist two common log processing solutions within the
industry, exemplified
- **Inverted index (Elasticsearch)**: It is well-embraced due to its support
for full-text search and high performance. The downside is the low throughput
in real-time writing and the huge resource consumption in index creation.
- **Lightweight index / no index (Grafana Loki)**: It is the opposite of
inverted index because it boasts high real-time write throughput and low
storage cost but delivers slow queries.
-
+
## Introduction to Inverted Index
@@ -63,7 +63,7 @@ Inverted indexing was originally used to retrieve words or
phrases in texts. The
Upon data writing, the system tokenizes texts into **terms**, and stores these
terms in a **posting list** which maps terms to the ID of the row where they
exist. In text queries, the database finds the corresponding **row ID** of the
keyword (term) in the posting list, and fetches the target row based on the row
ID. By doing so, the system won't have to traverse the whole dataset, which
improves query speed by orders of magnitude.
-
+
In inverted indexing of Elasticsearch, quick retrieval comes at the cost of
writing speed, writing throughput, and storage space. Why? Firstly,
tokenization, dictionary sorting, and inverted index creation are all CPU- and
memory-intensive operations. Secondly, Elasticsearch has to store the original
data, the inverted index, and an extra copy of data stored in columns for query
acceleration. That's triple redundancy.
@@ -87,14 +87,14 @@ In [Apache Doris](https://github.com/apache/doris), we opt
for the other way. Bu
In Apache Doris, data is arranged in the following format. Indexes are stored
in the Index Region:
-
+
We implement inverted indexes in a non-intrusive manner:
1. **Data ingestion & compaction**: As a segment file is written into Doris,
an inverted index file will be written, too. The index file path is determined
by the segment ID and the index ID. Rows in segments correspond to the docs in
indexes, as do the RowID and the DocID.
2. **Query**: If the `where` clause includes a column with inverted index, the
system will look up in the index file, return a DocID list, and convert the
DocID list into a RowID Bitmap. Under the RowID filtering mechanism of Apache
Doris, only the target rows will be read. This is how queries are accelerated.
-
+
Such a non-intrusive method separates the index file from the data files, so you
can make any changes to the inverted indexes without worrying about affecting
the data files themselves or other indexes.
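As a hedged usage sketch (table and column names are made up; the syntax
follows the Doris inverted index feature described in this post), an index is
declared alongside the table and queried with `MATCH_ANY`:

```sql
CREATE TABLE app_log (
    ts    DATETIME,
    level VARCHAR(16),
    msg   STRING,
    INDEX idx_msg (msg) USING INVERTED PROPERTIES ("parser" = "english")
) DUPLICATE KEY (ts)
DISTRIBUTED BY HASH(ts) BUCKETS 10
PROPERTIES ("replication_num" = "1");  -- adjust to your cluster

-- Full-text search served by the inverted index instead of a full scan.
SELECT * FROM app_log WHERE msg MATCH_ANY 'timeout error';
```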
@@ -160,12 +160,12 @@ For a fair comparison, we ensure uniformity of testing
conditions, including ben
- **Results of Apache Doris**:
-- - Writing Speed: 550 MB/s, **4.2 times that of Elasticsearch**
+ - Writing Speed: 550 MB/s, **4.2 times that of Elasticsearch**
- Compression Ratio: 10:1
- Storage Usage: **20% that of Elasticsearch**
- Response Time: **43% that of Elasticsearch**
-
+
### Apache Doris VS ClickHouse
@@ -179,7 +179,7 @@ As ClickHouse launched inverted index as an experimental
feature in v23.1, we te
**Result**: Apache Doris was **4.7 times, 12 times, 18.5 times** faster than
ClickHouse in the three queries, respectively.
-
+
## Usage & Example
@@ -246,4 +246,4 @@ For more feature introduction and usage guide, see
documentation: [Inverted Inde
In a word, what contributes to Apache Doris' 10-times higher cost-effectiveness
than Elasticsearch is its OLAP-tailored optimizations for inverted indexing,
supported by the columnar storage engine, massively parallel processing
framework, vectorized query engine, and cost-based optimizer of Apache Doris.
-As proud as we are about our own inverted indexing solution, we understand
that self-published benchmarks can be controversial, so we are open to
[feedback](https://t.co/KcxAtAJZjZ) from any third-party users and see how
[Apache Doris](https://github.com/apache/doris) works in real-world cases.
\ No newline at end of file
+As proud as we are about our own inverted indexing solution, we understand
that self-published benchmarks can be controversial, so we are open to
[feedback](https://t.co/KcxAtAJZjZ) from any third-party users and see how
[Apache Doris](https://github.com/apache/doris) works in real-world cases.
diff --git a/blog/Midland Realty.md
b/blog/Building-a-Data-Warehouse-for-Traditional-Industry.md
similarity index 95%
rename from blog/Midland Realty.md
rename to blog/Building-a-Data-Warehouse-for-Traditional-Industry.md
index f963f0b0750..8b84a27d4a4 100644
--- a/blog/Midland Realty.md
+++ b/blog/Building-a-Data-Warehouse-for-Traditional-Industry.md
@@ -38,7 +38,7 @@ Now let's get started.
Logically, our data architecture can be divided into four parts.
-
+
- **Data integration**: This is supported by Flink CDC, DataX, and the
Multi-Catalog feature of Apache Doris.
- **Data management**: We use Apache Dolphinscheduler for script lifecycle
management, privileges in multi-tenancy management, and data quality monitoring.
@@ -53,7 +53,7 @@ We create our dimension tables and fact tables centering each
operating entity i
Our data warehouse is divided into five conceptual layers. We use Apache Doris
and Apache DolphinScheduler to schedule the DAG scripts between these layers.
-
+
Every day, the layers go through an overall update besides incremental updates
in case of changes in historical status fields or incomplete data
synchronization of ODS tables.
@@ -91,7 +91,7 @@ This is our configuration for TBs of legacy data and GBs of
incremental data. Yo
1. To integrate offline data and log data, we use DataX, which supports CSV
format and readers of many relational databases, and Apache Doris provides a
DataX-Doris-Writer.
-
+
2. We use Flink CDC to synchronize data from source tables. Then we aggregate
the real-time metrics utilizing the Materialized View or the Aggregate Model of
Apache Doris. Since we only have to process part of the metrics in a real-time
manner and we don't want to generate too many database connections, we use one
Flink job to maintain multiple CDC source tables. This is realized by the
multi-source merging and full database sync features of Dinky, or you can
implement a Flink DataStream [...]
@@ -123,11 +123,11 @@ EXECUTE CDCSOURCE demo_doris WITH (
3. We use SQL scripts or "Shell + SQL" scripts, and we perform script
lifecycle management. At the ODS layer, we write a general DataX job file and
pass parameters for each source table ingestion, instead of writing a DataX job
for each source table. In this way, we make things much easier to maintain. We
manage the ETL scripts of Apache Doris on DolphinScheduler, where we also
conduct version control. In case of any errors in the production environment,
we can always roll back.
-
+
4. After ingesting data with ETL scripts, we create a page in our reporting
tool. We assign different privileges to different accounts using SQL, including
the privilege of modifying rows, fields, and global dictionary. Apache Doris
supports privilege control over accounts, which works the same as that in
MySQL.
-
+
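For reference, a small hypothetical example of the account-level privilege
control mentioned in point 4 (user, database, and table names are placeholders;
Doris uses MySQL-style statements with its own privilege names):

```sql
CREATE USER 'report_reader'@'%' IDENTIFIED BY 'a_strong_password';

-- Read-only access to one reporting table.
GRANT SELECT_PRIV ON example_db.sales_report TO 'report_reader'@'%';

SHOW GRANTS FOR 'report_reader'@'%';
```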
We also use Apache Doris data backup for disaster recovery, Apache Doris audit
logs to monitor SQL execution efficiency, Grafana+Loki for cluster metric
alerts, and Supervisor to monitor the daemon processes of node components.
@@ -165,8 +165,10 @@ For us, it is important to create a data dictionary
because it largely reduces p
Actually, before we evolved into our current data architecture, we tried Hive,
Spark and Hadoop to build an offline data warehouse. It turned out that Hadoop
was overkill for a traditional company like us since we didn't have too much
data to process. It is important to find the component that suits you most.
-
+
(Our old off-line data warehouse)
-On the other hand, to smoothen our big data transition, we need to make our
data platform as simple as possible in terms of usage and maintenance. That's
why we landed on Apache Doris. It is compatible with MySQL protocol and
provides a rich collection of functions so we don't have to develop our own
UDFs. Also, it is composed of only two types of processes: frontends and
backends, so it is easy to scale and track
\ No newline at end of file
+On the other hand, to smoothen our big data transition, we need to make our
data platform as simple as possible in terms of usage and maintenance. That's
why we landed on Apache Doris. It is compatible with MySQL protocol and
provides a rich collection of functions so we don't have to develop our own
UDFs. Also, it is composed of only two types of processes: frontends and
backends, so it is easy to scale and track.
+
+Find Apache Doris developers on
[Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-1t3wfymur-0soNPATWQ~gbU8xutFOLog).
diff --git a/blog/Data Lakehouse.md
b/blog/Building-the-Next-Generation-Data-Lakehouse-10X-Performance.md
similarity index 94%
rename from blog/Data Lakehouse.md
rename to blog/Building-the-Next-Generation-Data-Lakehouse-10X-Performance.md
index 2be2b4be159..7f26a7eba44 100644
--- a/blog/Data Lakehouse.md
+++ b/blog/Building-the-Next-Generation-Data-Lakehouse-10X-Performance.md
@@ -51,7 +51,7 @@ To turn these visions into reality, a data query engine needs
to figure out the
Apache Doris 1.2.2 supports a wide variety of data lake formats and data
access from various external data sources. Besides, via the Table Value
Function, users can analyze files in object storage or HDFS directly.
-
+
@@ -73,7 +73,7 @@ Older versions of Doris support a two-tiered metadata
structure: database and ta
1. You can map to the whole external data source and ingest all metadata from
it.
2. You can manage the properties of the specified data source at the catalog
level, such as connection, privileges, and data ingestion details, and easily
handle multiple data sources.
-
+
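A hedged sketch of the catalog-level mapping described above, assuming a Hive
Metastore at a placeholder address:

```sql
CREATE CATALOG hive_catalog PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://hms-host:9083"
);

-- The whole external source becomes browsable and queryable in place.
SWITCH hive_catalog;
SHOW DATABASES;
SELECT count(*) FROM hive_catalog.tpch.lineitem;
```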
@@ -114,7 +114,7 @@ This also paves the way for developers who want to connect
to more data sources
Access to external data sources is often hindered by network conditions and
data resources. This requires extra efforts of a data query engine to guarantee
reliability, stability, and real-timeliness in metadata access.
-
+
Doris enables high efficiency in metadata access by **Meta Cache**, which
includes Schema Cache, Partition Cache, and File Cache. This means that Doris
can respond to metadata queries on thousands of tables in milliseconds. In
addition, Doris supports manual refresh of metadata at the
Catalog/Database/Table level. Meanwhile, it enables auto synchronization of
metadata in Hive Metastore by monitoring Hive Metastore Event, so any changes
can be updated within seconds.
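The manual refresh mentioned here can be issued at each of the three scopes; a
brief sketch using the same placeholder names as above:

```sql
REFRESH CATALOG hive_catalog;              -- everything under the catalog
REFRESH DATABASE hive_catalog.tpch;        -- one database
REFRESH TABLE hive_catalog.tpch.lineitem;  -- one table
```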
@@ -122,13 +122,13 @@ Doris enables high efficiency in metadata access by
**Meta Cache**, which includ
External data sources usually come with their own privilege management
services. Many companies use one single tool (such as Apache Ranger) to provide
authorization for their multiple data systems. Doris supports a custom
authorization plugin, which can be connected to the user's own privilege
management system via the Doris Access Controller interface. As a user, you
only need to specify the authorization plugin for a newly created catalog, and
then you can readily perform authorization [...]
-
+
### Data Access
Doris supports data access to external storage systems, including HDFS and
S3-compatible object storage:
-
+
@@ -156,7 +156,7 @@ Doris caches files from remote storage in local
high-performance disks as a way
1. **Block cache**: Doris supports the block cache of remote files and can
automatically adjust the block size from 4KB to 4MB based on the read request.
The block cache method reduces read/write amplification and read latency in
cold caches.
2. **Consistent hashing for caching**: Doris applies consistent hashing to
manage cache locations and schedule data scanning. By doing so, it prevents
cache failures brought about by the onlining and offlining of nodes. It can also
increase cache hit rate and query service stability.
-
+
#### Execution Engine
@@ -165,7 +165,7 @@ Developers surely don't want to rebuild all the general
features for every new d
- **Layer the logic**: All data queries in Doris, including those on internal
tables, use the same operators, such as Join, Sort, and Agg. The only
difference between queries on internal and external data lies in data access.
In Doris, anything above the scan nodes follows the same query logic, while
below the scan nodes, the implementation classes will take care of access to
different data sources.
- **Use a general framework for scan operators**: Even for the scan nodes,
different data sources have a lot in common, such as task splitting logic,
scheduling of sub-tasks and I/O, predicate pushdown, and Runtime Filter.
Therefore, Doris uses interfaces to handle them. Then, it implements a unified
scheduling logic for all sub-tasks. The scheduler is in charge of all scanning
tasks in the node. With global information of the node in hand, the scheduler
is able to do fine-grained manage [...]
-
+
#### Query Optimizer
@@ -175,9 +175,9 @@ Doris supports a range of statistical information from
various data sources, inc
We tested Doris and Presto/Trino on HDFS in flat table scenarios (ClickBench)
and multi-table scenarios (TPC-H). Here are the results:
-
+
-
+
@@ -187,7 +187,7 @@ As is shown, with the same computing resources and on the
same dataset, Apache D
Querying external data sources requires no internal storage of Doris. This
makes elastic stateless computing nodes possible. Apache Doris 2.0 is going to
implement Elastic Compute Node, which is dedicated to supporting query
workloads of external data sources.
-
+
Stateless computing nodes are open for quick scaling so users can easily cope
with query workloads during peaks and valleys and strike a balance between
performance and cost. In addition, Doris has optimized itself for Kubernetes
cluster management and node scheduling. Now Master nodes can automatically
manage the onlining and offlining of Elastic Compute Nodes, so users can govern
their cluster workloads in cloud-native and hybrid cloud scenarios without
difficulty.
@@ -199,7 +199,7 @@ Apache Doris has been adopted by a financial institution
for risk management. Th
- Doris makes it possible to perform real-time federated queries using
Elasticsearch Catalog and achieve a response time of mere milliseconds.
- Doris enables the decoupling of daily batch processing and statistical
analysis, bringing less resource consumption and higher system stability.
-
+
@@ -252,5 +252,5 @@ http://doris.apache.org
https://github.com/apache/doris
-
+Find Apache Doris developers on
[Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-1t3wfymur-0soNPATWQ~gbU8xutFOLog).
diff --git a/blog/Xingyun.md
b/blog/Choosing-an-OLAP-Engine-for-Financial-Risk-Management-What-to-Consider.md
similarity index 97%
rename from blog/Xingyun.md
rename to
blog/Choosing-an-OLAP-Engine-for-Financial-Risk-Management-What-to-Consider.md
index 8a9a99782d4..3603ce0f723 100644
--- a/blog/Xingyun.md
+++
b/blog/Choosing-an-OLAP-Engine-for-Financial-Risk-Management-What-to-Consider.md
@@ -46,7 +46,7 @@ To speed up the highly concurrent point queries, you can
create [Materialized Vi
To facilitate queries on large tables, you can leverage the [Colocation
Join](https://doris.apache.org/docs/dev/query-acceleration/join-optimization/colocation-join/)
mechanism. Colocation Join minimizes data transfer between computation nodes
to reduce overheads brought by data movement. Thus, it can largely improve
query speed when joining large tables.
-
+
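Not from the original case, but a minimal sketch (hypothetical risk tables) of
how Colocation Join is declared: both tables share a colocation group, the same
bucketing column, and the same bucket and replica counts, so joins on that key
avoid cross-node shuffling:

```sql
CREATE TABLE risk_orders (
    customer_id BIGINT,
    order_id    BIGINT,
    amount      DECIMAL(18, 2)
) DUPLICATE KEY (customer_id, order_id)
DISTRIBUTED BY HASH(customer_id) BUCKETS 16
PROPERTIES ("colocate_with" = "risk_group", "replication_num" = "1");

CREATE TABLE risk_customers (
    customer_id BIGINT,
    risk_level  INT
) DUPLICATE KEY (customer_id)
DISTRIBUTED BY HASH(customer_id) BUCKETS 16
PROPERTIES ("colocate_with" = "risk_group", "replication_num" = "1");

-- Executed locally on each node, without shuffling customer_id across nodes.
SELECT o.order_id, o.amount, c.risk_level
FROM risk_orders o JOIN risk_customers c ON o.customer_id = c.customer_id;
```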
## Log Analysis
@@ -54,7 +54,7 @@ Log analysis is important in financial data processing.
Real-time processing and
Retrieval is a major part of log analysis, so [Apache Doris
2.0](https://doris.apache.org/docs/dev/releasenotes/release-2.0.0) supports
inverted index, which is a way to accelerate text searching and
equivalence/range queries on numerics and datetime. It allows users to quickly
locate the log record that they need among the massive data. The JSON storage
feature in Apache Doris is reported to reduce storage costs of user activity
logs by 70%, and the variety of parse functions provided c [...]
-
+
## Easy Maintenance
@@ -64,4 +64,4 @@ In addition to the easy deployment, Apache Doris has a few
mechanisms that are d
This is overall data architecture in the case. The user utilizes Apache Flume
for log data collection, and DataX for data update. Data from multiple sources
will be collected into Apache Doris to form a data mart, from which analysts
extract information to generate reports and dashboards for reference in risk
control and business decisions. As for stability of the data mart itself,
Grafana and Prometheus are used to monitor memory usage, compaction score and
query response time of Apache [...]
-
\ No newline at end of file
+
diff --git a/blog/Zhihu.md
b/blog/Database-Dissection-How-Fast-Data-Queries-Are-Implemented.md
similarity index 94%
rename from blog/Zhihu.md
rename to blog/Database-Dissection-How-Fast-Data-Queries-Are-Implemented.md
index 9bef297d53a..54b2d4e0bb4 100644
--- a/blog/Zhihu.md
+++ b/blog/Database-Dissection-How-Fast-Data-Queries-Are-Implemented.md
@@ -50,7 +50,7 @@ User segmentation is when analysts pick out a group of
website users that share
We realize that instead of executing set operations on one big dataset, we can
divide our dataset into smaller ones, execute set operations on each of them,
and then merge all the results. In this way, each small dataset is computed by
one thread/queue. Then we have a queue to do the final merging. It's simple
distributed computing thinking.
-
+
Example:
@@ -65,17 +65,17 @@ The problem here is, since user tags are randomly
distributed across various mac
This is enabled by the Colocate mechanism of Apache Doris. The idea of
Colocate is to place data chunks that are often accessed together onto the same
node, so as to reduce cross-node data transfer and thus, get lower latency.
-
+
The implementation is simple: Bind one group key to one machine. Then
naturally, data corresponding to that group key will be pre-bound to that
machine.
The following is the query plan before we adopted Colocate: It is
complicated, with a lot of data shuffling.
-
+
This is the query plan after. It is much simpler, which is why queries are
much faster and less costly.
-
+
### 3.Merge the operators
@@ -89,7 +89,7 @@
orthogonal_bitmap_union_count==bitmap_and(bitmap1,bitmap_and(bitmap2,bitmap3)
Query execution with one compound function is much faster than that with a
chain of simple functions, as you can tell from the lengths of the flow charts:
-
+
- **Multiple Simple functions**: This involves three function executions and
two intermediate storage. It's a long and slow process.
- **One compound function**: Simple in and out.
@@ -102,11 +102,11 @@ This is about putting the right workload on the right
component. Apache Doris su
In offline data ingestion, we used to perform most computation in Apache Hive,
write the data files to HDFS, and pull data regularly from HDFS to Apache
Doris. However, after Doris obtains parquet files from HDFS, it performs a
series of operations on them before it can turn them into segment files:
decompressing, bucketing, sorting, aggregating, and compressing. These
workloads will be borne by Doris backends, which have to undertake a few bitmap
operations at the same time. So there is [...]
-
+
So we decided on the Spark Load method. It allows us to split the ingestion
process into two parts: computation and storage, so we can move all the
bucketing, sorting, aggregating, and compressing to Spark clusters. Then Spark
writes the output to HDFS, from which Doris pulls data and flushes it to the
local disks.
-
+
When ingesting 1.2 TB data (that's 110 billion rows), the Spark Load method
only took 55 minutes.
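As a rough sketch of what a Spark Load job looks like (the HDFS path, label,
target table, and the pre-created Spark resource `spark0` are all assumptions,
not taken from this case):

```sql
LOAD LABEL example_db.user_tag_20230927
(
    DATA INFILE("hdfs://namenode:8020/user/tags/dt=2023-09-27/*")
    INTO TABLE user_tags
    FORMAT AS "parquet"
)
WITH RESOURCE 'spark0'
(
    "spark.executor.memory" = "4g"
)
PROPERTIES ("timeout" = "3600");
```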
@@ -126,7 +126,7 @@ They compared query response time before and after the
vectorization in seven of
The results are as below:
-
+
## Conclusion
@@ -137,4 +137,4 @@ In short, what contributed to the fast data loading and
data queries in this cas
- Support for a wide range of data loading methods to choose from
- A vectorized engine that brings overall performance increase
-It takes efforts from both the database developers and users to make fast
performance possible. The user's experience and knowledge of their own status
quo will allow them to figure out the quickest path, while a good database
design will help pave the way and make users' life easier.
\ No newline at end of file
+It takes efforts from both the database developers and users to make fast
performance possible. The user's experience and knowledge of their own status
quo will allow them to figure out the quickest path, while a good database
design will help pave the way and make users' life easier.
diff --git a/blog/Pingan.md
b/blog/Database-in-Fintech-How-to-Support-ten-thousand-Dashboards-Without-Causing-a-Mess.md
similarity index 94%
rename from blog/Pingan.md
rename to
blog/Database-in-Fintech-How-to-Support-ten-thousand-Dashboards-Without-Causing-a-Mess.md
index f8d49ab4808..07cadf15983 100644
--- a/blog/Pingan.md
+++
b/blog/Database-in-Fintech-How-to-Support-ten-thousand-Dashboards-Without-Causing-a-Mess.md
@@ -50,11 +50,11 @@ When the metrics are soundly put in place, you can ingest
new data into your dat
As is mentioned, some metrics are produced by combining multiple fields in the
source table. In data engineering, that is a multi-table join query. Based on
the optimization experience of an Apache Doris user, we recommend flat tables
instead of Star/Snowflake Schema. The user reduced the query response time on
tables of 100 million rows **from 5s to 63ms** after such a change.
-
+
The flat table solution also eliminates jitter.
-
+
## Enable SQL Caching to Reduce Resource Consumption
@@ -65,13 +65,13 @@ Analysts often check data reports of the same metrics on a
regular basis. These
- A TPS (Transactions Per Second) of 300 is reached, with CPU, memory, disk,
and I/O usage under 80%;
- Under the recommended cluster size, over 10,000 metrics can be cached, which
means you can save a lot of computation resources.
-
+
## Conclusion
The complexity of data analysis in the financial industry lies in the data
itself rather than the engineering side. Thus, the underlying data architecture
should focus on facilitating the unified and efficient management of data.
Apache Doris provides the flexibility of simple metric registration and the
ability of fast and resource-efficient metric computation. In this case, the
user is able to handle 10,000 active financial metrics in 10,000 dashboards
with 30% less ETL efforts.
-
+Find Apache Doris developers on
[Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-1t3wfymur-0soNPATWQ~gbU8xutFOLog).
diff --git a/blog/Poly.md
b/blog/For-Entry-Level-Data-Engineers-How-to-Build-a-Simple-but-Solid-Data-Architecture.md
similarity index 97%
rename from blog/Poly.md
rename to
blog/For-Entry-Level-Data-Engineers-How-to-Build-a-Simple-but-Solid-Data-Architecture.md
index a0bfb3f9836..d682110bc6f 100644
--- a/blog/Poly.md
+++
b/blog/For-Entry-Level-Data-Engineers-How-to-Build-a-Simple-but-Solid-Data-Architecture.md
@@ -41,7 +41,7 @@ A prominent feature of ticketing services is the periodic
spikes in ticket order
The building blocks of this architecture are simple. You only need Apache
Flink and Apache Kafka for data ingestion, and Apache Doris as an analytic data
warehouse.
-
+
Connecting data sources to the data warehouse is simple, too. The key
component, Apache Doris, supports various data loading methods to fit with
different data sources. You can perform column mapping, transforming, and
filtering during data loading to avoid duplicate collection of data. To ingest
a table, users only need to add the table name to the configurations, instead
of writing a script themselves.
@@ -53,7 +53,7 @@ Flink CDC was found to be the optimal choice if you are
looking for higher stabi
- Create two CDC jobs in Flink, one to capture the changed data (the Forward
stream), the other to update the table management configurations (the Broadcast
stream).
- Configure all tables of the source database at the Sink end (the output end
of Flink CDC). When there is a newly added table in the source database, the
Broadcast stream will be triggered to update the table management
configurations. (You just need to configure the tables, instead of "creating"
the tables.)
-
+
## Layering of Data Warehouse
@@ -76,7 +76,7 @@ Like many non-tech business, the ticketing service provider
needs a data warehou
- **Data Analysis**: This involves data such as membership orders, attendance
rates, and user portraits.
- **Dashboarding**: This is to visually display sales data.
-
+
These are all entry-level tasks in data analytics. One of the biggest burdens
for the data engineers was to quickly develop new reports as the internal
analysts required. The [Aggregate Key
Model](https://doris.apache.org/docs/dev/data-table/data-model#aggregate-model)
of Apache Doris is designed for this.
diff --git a/blog/High_concurrency.md
b/blog/How-We-Increased-Database-Query-Concurrency-by-20-Times.md
similarity index 98%
rename from blog/High_concurrency.md
rename to blog/How-We-Increased-Database-Query-Concurrency-by-20-Times.md
index 85d3240482c..348a1dd1e82 100644
--- a/blog/High_concurrency.md
+++ b/blog/How-We-Increased-Database-Query-Concurrency-by-20-Times.md
@@ -152,7 +152,7 @@ Normally, an SQL statement is executed in three steps:
For complex queries on massive data, it is better to follow the plan created
by the Query Optimizer. However, for high-concurrency point queries requiring
low latency, that plan is not only unnecessary but also brings extra overheads.
That's why we implement a short-circuit plan for point queries.
-
+
Once the FE receives a point query request, a short-circuit plan will be
produced. It is a lightweight plan that involves no equivalent transformation,
logic optimization or physical optimization. Instead, it conducts some basic
analysis on the AST, creates a fixed plan accordingly, and finds ways to reduce
overhead of the optimizer.
@@ -182,13 +182,13 @@ rpc tablet_fetch_data(PTabletKeyLookupRequest) returns
(PTabletKeyLookupResponse
In high-concurrency queries, part of the CPU overhead comes from SQL analysis
and parsing in FE. To reduce such overhead, in FE, we provide prepared
statements that are fully compatible with MySQL protocol. With prepared
statements, we can achieve a four-time performance increase for primary key
point queries.
-
+
The idea of prepared statements is to cache precomputed SQL and expressions in
HashMap in memory, so they can be directly used in queries when applicable.
Prepared statements adopt MySQL binary protocol for transmission. The protocol
is implemented in the mysql_row_buffer.[h|cpp] file, and uses MySQL binary
encoding. Under this protocol, the client (for example, JDBC Client) sends a
pre-compiled statement to FE via `PREPARE` MySQL Command. Next, FE will parse
and analyze the statement and cache it in the HashMap as shown in the figure
above. Next, the client, using `EXECUTE` MySQL Command, will replace the
placeholder, encode it into binar [...]
-
+
Apart from caching prepared statements in FE, we also cache reusable
structures in BE. These structures include pre-allocated computation blocks,
query descriptors, and output expressions. Serializing and deserializing these
structures often cause a CPU hotspot, so it makes more sense to cache them. The
prepared statement for each query comes with a UUID named CacheID. So when BE
executes the point query, it will find the corresponding class based on the
CacheID, and then reuse the struc [...]
@@ -220,13 +220,13 @@ resultSet = readStatement.executeQuery();
Apache Doris has a Page Cache feature, where each page caches the data of one
column.
-
+
As mentioned above, we have introduced row storage in Doris. The problem with
this is, one row of data consists of multiple columns, so in the case of big
queries, the cached data might be erased. Thus, we also introduced row cache to
increase row cache hit rate.
Row cache reuses the LRU Cache mechanism in Apache Doris. When the caching
starts, the system will initialize a threshold value. If that threshold is hit,
the old cached rows will be phased out. For a primary key query statement, the
performance gap between cache hit and cache miss can be huge (we are talking
about dozens of times less disk I/O and memory access here). So the
introduction of row cache can remarkably enhance point query performance.
-
+
To enable row cache, you can specify the following configuration in BE:
@@ -283,7 +283,7 @@ SELECT * from usertable WHERE YCSB_KEY = ?
We run the test with the optimizations (row storage, short-circuit, and
prepared statement) enabled, and then did it again with all of them disabled.
Here are the results:
-
+
With optimizations enabled, **the average query latency decreased by a
whopping 96%, the 99th percentile latency was only 1/28 of that without
optimizations, and it has achieved a query concurrency of over 30,000 QPS.**
This is a huge leap in performance and an over 20-time increase in concurrency.
diff --git a/blog/Duyansoft.md
b/blog/Improving-Query-Speed-to-Make-the-Most-out-of-Your-Data.md
similarity index 94%
rename from blog/Duyansoft.md
rename to blog/Improving-Query-Speed-to-Make-the-Most-out-of-Your-Data.md
index 53d825001eb..0d7a64081fa 100644
--- a/blog/Duyansoft.md
+++ b/blog/Improving-Query-Speed-to-Make-the-Most-out-of-Your-Data.md
@@ -29,7 +29,7 @@ under the License.
> Author: Junfei Liu, Senior Architect of Duyansoft
-
+
The world is getting more and more value out of data, as exemplified by the
currently much-talked-about ChatGPT, which I believe is a robotic data analyst.
However, in today’s era, what’s more important than the data itself is the
ability to locate your wanted information among all the overflowing data
quickly. So in this article, I will talk about how I improved overall data
processing efficiency by optimizing the choice and usage of data warehouses.
@@ -37,13 +37,13 @@ The world is getting more and more value out of data, as
exemplified by the curr
The choice of data warehouses was never high on my worry list until 2021. I
have been working as a data engineer for a Fintech SaaS provider since its
incorporation in 2014. In the company’s infancy, we didn’t have too much data
to juggle. We only needed a simple tool for OLTP and business reporting, and
the traditional databases would cut the mustard.
-
+
But as the company grew, the data we received became overwhelmingly large in
volume and increasingly diversified in sources. Every day, we had tons of user
accounts logging in and sending myriads of requests. It was like collecting
water from a thousand taps to put out a million scattered pieces of fire in a
building, except that you must bring the exact amount of water needed for each
fire spot. Also, we got more and more emails from our colleagues asking if we
could make data analysis [...]
The first thing we did was to revolutionize our data processing architecture.
We used DataHub to collect all our transactional or log data and ingest it into
an offline data warehouse for data processing (analyzing, computing, etc.).
Then the results would be exported to MySQL and then forwarded to QuickBI to
display the reports visually. We also replaced MongoDB with a real-time data
warehouse for business queries.
-
+
This new architecture worked, but there remained a few pebbles in our shoes:
@@ -61,7 +61,7 @@ To begin with, we tried to move the largest tables from MySQL
to [Apache Doris](
As for now, we are using two Doris clusters: one to handle point queries (high
QPS) from our users and the other for internal ad-hoc queries and reporting. As
a result, users have reported a smoother experience and we can provide more
features that used to be bottlenecked by slow query execution. Moving our
dimension tables to Doris also brought fewer data errors and higher development
efficiency.
-
+
Both the FE and BE processes of Doris can be scaled out, so tens of PBs of
data stored in hundreds of devices can be put into one single cluster. In
addition, the two types of processes implement a consistency protocol to ensure
service availability and data reliability. This removes dependency on Hadoop
and thus saves us the cost of deploying Hadoop clusters.
@@ -89,11 +89,11 @@ In addition to the various monitoring metrics of Doris, we
deployed an audit log
Slow SQL queries:
-
+
Some of our often-used monitoring metrics:
-
+
**Tradeoff Between Resource Usage and Real-Time Availability:**
@@ -116,4 +116,4 @@ As we set out to find a single data warehouse that could
serve all our needs, we
**Try** [**Apache Doris**](https://github.com/apache/doris) **out!**
-It is an open source real-time analytical database based on MPP architecture.
It supports both high-concurrency point queries and high-throughput complex
analysis. Or you can start your free trial of
[**SelectDB**](https://en.selectdb.com/), a cloud-native real-time data
warehouse developed based on the Apache Doris open source project by the same
key developers.
\ No newline at end of file
+It is an open source real-time analytical database based on MPP architecture.
It supports both high-concurrency point queries and high-throughput complex
analysis. Or you can start your free trial of
[**SelectDB**](https://en.selectdb.com/), a cloud-native real-time data
warehouse developed based on the Apache Doris open source project by the same
key developers.
diff --git a/blog/Data_Update.md
b/blog/Is-Your-Latest-Data-Really-the-Latest-Check-the-Data-Update-Mechanism-of-Your-Database.md
similarity index 94%
rename from blog/Data_Update.md
rename to
blog/Is-Your-Latest-Data-Really-the-Latest-Check-the-Data-Update-Mechanism-of-Your-Database.md
index d8bbd04511f..cb6f406e867 100644
--- a/blog/Data_Update.md
+++
b/blog/Is-Your-Latest-Data-Really-the-Latest-Check-the-Data-Update-Mechanism-of-Your-Database.md
@@ -32,11 +32,11 @@ In databases, data update is to add, delete, or modify
data. Timely data update
Technically speaking, there are two types of data updates: you either update a
whole row (**Row Update**) or just update part of the columns (**Partial Column
Update**). Many databases support both of them, but in different ways. This
post is about one of them, which is simple in execution and efficient in data
quality guarantee.
-As an open source analytic database, Apache Doris supports both Row Update and
Partial Column Update with one data model: the **Unique Key Model**. It is
where you put data that doesn't need to be aggregated. In the Unique Key Model,
you can specify one column or the combination of several columns as the Unique
Key (a.k.a. Primary Key). For one Unique Key, there will always be one row of
data: the newly ingested data record replaces the old. That's how data updates
work.
+As an open source analytic database, Apache Doris supports both Row Update and
Partial Column Update with one data model: the [**Unique Key
Model**](https://doris.apache.org/docs/dev/data-table/data-model#unique-model).
It is where you put data that doesn't need to be aggregated. In the Unique Key
Model, you can specify one column or the combination of several columns as the
Unique Key (a.k.a. Primary Key). For one Unique Key, there will always be one
row of data: the newly ingested data [...]
The idea is straightforward, but in real-life implementation, it happens that
the latest data does not arrive last or doesn't even get written at all, so
I'm going to show you how Apache Doris implements data updates and avoids
mess-ups with its Unique Key Model.
-
+
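A minimal sketch of that behavior, reusing the post's own order-status example
with an illustrative table definition:

```sql
CREATE TABLE orders (
    order_id BIGINT,
    amount   BIGINT,
    status   VARCHAR(32)
) UNIQUE KEY (order_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 8
PROPERTIES ("replication_num" = "1");  -- adjust to your cluster

INSERT INTO orders VALUES (1, 100, 'Payment Pending');
INSERT INTO orders VALUES (1, 100, 'Delivery Pending');

-- Only the most recently written row for order_id = 1 is kept.
SELECT * FROM orders WHERE order_id = 1;
```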
## Row Update
@@ -134,11 +134,11 @@ The execution of the Update command consists of three
steps in the system:
- Step Two: Modify the order status from "Payment Pending" to "Delivery
Pending" (1, 100, 'Delivery Pending')
- Step Three: Insert the new row into the table
-
+
The table is in the Unique Key Model, which means for rows of the same Unique
Key, only the last inserted one will be reserved, so this is what the table
will finally look like:
-
+
## Order of Data Updates
diff --git a/blog/Moka.md
b/blog/Listen-to-That-Poor-BI-Engineer-We-Need-Fast-Joins.md
similarity index 95%
rename from blog/Moka.md
rename to blog/Listen-to-That-Poor-BI-Engineer-We-Need-Fast-Joins.md
index 08d82ceeea2..7d43fb4eab4 100644
--- a/blog/Moka.md
+++ b/blog/Listen-to-That-Poor-BI-Engineer-We-Need-Fast-Joins.md
@@ -35,13 +35,13 @@ Business intelligence (BI) tool is often the last stop of a
data processing pipe
I work as an engineer that supports a human resource management system. One
prominent selling point of our services is **self-service** **BI**. That means
we allow users to customize their own dashboards: they can choose the fields
they need and relate them to form the dataset as they want.
-
+
Join query is a more efficient way to realize self-service BI. It allows
people to break down their data assets into many smaller tables instead of
putting it all in a flat table. This would make data updates much faster and
more cost-effective, because updating the whole flat table is not always the
optimal choice when you have plenty of new data flowing in and old data being
updated or deleted frequently, as is the case for most data input.
In order to maximize the time value of data, we need data updates to be
executed really quickly. For this purpose, we looked into three OLAP databases
on the market. They are all fast in some way but there are some differences.
-
+
Greenplum is really quick in data loading and batch DML processing, but it is
not good at handling high concurrency. There is a steep decline in performance
as query concurrency rises. This can be risky for a BI platform that tries to
ensure stable user experience. ClickHouse is mind-blowing in single-table
queries, but it only allows batch update and batch delete, so that's less
timely.
@@ -51,7 +51,7 @@ JOIN, my old friend JOIN, is always a hassle. Join queries
are demanding for bot
We tested our candidate OLAP engines with our common join queries and our most
notorious slow queries.
-
+
As the number of tables joined grows, we witness a widening performance gap
between Apache Doris and ClickHouse. In most join queries, Apache Doris was
about 5 times faster than ClickHouse. In terms of slow queries, Apache Doris
responded to most of them within less than 1 second, while the performance of
ClickHouse fluctuated within a relatively large range.
@@ -79,15 +79,15 @@ Human resource data is subject to very strict and
fine-grained access control po
How does all this add to complexity in engineering? Any user who inputs a
query on our BI platform must go through multi-factor authentication, and the
authenticated information will all be inserted into the SQL via `in` and then
passed on to the OLAP engine. Therefore, the more fine-grained the privilege
controls are, the longer the SQL will be, and the more time the OLAP system
will spend on ID filtering. That's why our users are often tortured by high
latency.
-
+
So how did we fix that? We use the [Bloom Filter
index](https://doris.apache.org/docs/dev/data-table/index/bloomfilter/) in
Apache Doris.
-
+
By adding Bloom Filter indexes to the relevant ID fields, we improve the speed
of privileged queries by 30% and basically eliminate timeout errors.
-
+
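For reference, a hedged sketch of how such an index is declared (table and
column names are placeholders); Bloom Filter indexes in Doris are set through a
table property, either at creation time or afterwards:

```sql
ALTER TABLE hr_report SET ("bloom_filter_columns" = "org_id,employee_id");
```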
Tips on when you should use the Bloom Filter index:
diff --git
a/blog/Log-Analysis-How-to-Digest-15-Billion-Logs-Per-Day-and-Keep-Big-Queries-Within-1-Second.md
b/blog/Log-Analysis-How-to-Digest-15-Billion-Logs-Per-Day-and-Keep-Big-Queries-Within-1-Second.md
new file mode 100644
index 00000000000..cd47c6a3f91
--- /dev/null
+++
b/blog/Log-Analysis-How-to-Digest-15-Billion-Logs-Per-Day-and-Keep-Big-Queries-Within-1-Second.md
@@ -0,0 +1,94 @@
+---
+{
+ 'title': 'Log Analysis: How to Digest 15 Billion Logs Per Day and Keep Big
Queries Within 1 Second',
+ 'summary': "This article describes a large-scale data warehousing use case
to provide reference for data engineers who are looking for log analytic
solutions. It introduces the log processing architecture and real case practice
in data ingestion, storage, and queries.",
+ 'date': '2023-09-16',
+ 'author': 'Yuqi Liu',
+ 'tags': ['Best Practice'],
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+
+
+This data warehousing use case is about **scale**. The user is [China
Unicom](https://en.wikipedia.org/wiki/China_Unicom), one of the world's biggest
telecommunication service providers. Using Apache Doris, they deploy multiple
petabyte-scale clusters on dozens of machines to support their 15 billion daily
log additions from their over 30 business lines. Such a gigantic log analysis
system is part of their cybersecurity management. For the need of real-time
monitoring, threat tracing, an [...]
+
+From an architectural perspective, the system should be able to undertake
real-time analysis of various formats of logs, and of course, be scalable to
support the huge and ever-enlarging data size. The rest of this post is about
what their log processing architecture looks like, and how they realize stable
data ingestion, low-cost storage, and quick queries with it.
+
+## System Architecture
+
+This is an overview of their data pipeline. The logs are collected into the
data warehouse, and go through several layers of processing.
+
+
+
+- **ODS**: Original logs and alerts from all sources are gathered into Apache
Kafka. Meanwhile, a copy of them will be stored in HDFS for data verification
or replay.
+- **DWD**: This is where the fact tables are. Apache Flink cleans,
standardizes, backfills, and de-identifies the data, and writes it back to
Kafka. These fact tables will also be put into Apache Doris, so that Doris can
trace a certain item or use them for dashboarding and reporting. As logs are
not averse to duplication, the fact tables will be arranged in the [Duplicate
Key
model](https://doris.apache.org/docs/dev/data-table/data-model#duplicate-model)
of Apache Doris.
+- **DWS**: This layer aggregates data from DWD and lays the foundation for
queries and analysis.
+- **ADS**: In this layer, Apache Doris auto-aggregates data with its Aggregate
Key model, and auto-updates data with its Unique Key model.
+
+Architecture 2.0 evolves from Architecture 1.0, which is supported by
ClickHouse and Apache Hive. The transition arose from the user's needs for
real-time data processing and multi-table join queries. In their experience
with ClickHouse, they found inadequate support for concurrency and multi-table
joins, manifested by frequent timeouts in dashboarding and OOM errors in
distributed joins.
+
+
+
+Now let's take a look at their practice in data ingestion, storage, and
queries with Architecture 2.0.
+
+## Real-Case Practice
+
+### Stable ingestion of 15 billion logs per day
+
+In the user's case, their business churns out 15 billion logs every day.
Ingesting such data volume quickly and stably is a real problem. With Apache
Doris, the recommended way is to use the Flink-Doris-Connector. It is developed
by the Apache Doris community for large-scale data writing. The component
requires simple configuration. It implements Stream Load and can reach a
writing speed of 200,000~300,000 logs per second, without interrupting the data
analytic workloads.
+
+A lesson learned is that when using Flink for high-frequency writing, you need
to find the right parameter configuration for your case to avoid data version
accumulation. In this case, the user made the following optimizations:
+
+- **Flink Checkpoint**: They increase the checkpoint interval from 15s to 60s
to reduce writing frequency and the number of transactions processed by Doris
per unit of time. This can relieve data writing pressure and avoid generating
too many data versions.
- **Data Pre-Aggregation**: For data of the same ID that comes from various
tables, Flink will pre-aggregate it based on the primary key ID and create a
flat table, in order to avoid excessive resource consumption caused by
multi-source data writing.
+- **Doris Compaction**: The trick here includes finding the right Doris
backend (BE) parameters to allocate the right amount of CPU resources for data
compaction, setting the appropriate number of data partitions, buckets, and
replicas (too many data tablets bring huge overheads), and dialing up
`max_tablet_version_num` to avoid version accumulation.
+
+These measures together ensure daily ingestion stability. The user has
witnessed stable performance and low compaction score in Doris backend. In
addition, the combination of data pre-processing in Flink and the [Unique Key
model](https://doris.apache.org/docs/dev/data-table/data-model#unique-model) in
Doris can ensure quicker data updates.
+
+### Storage strategies to reduce costs by 50%
+
+The size and generation rate of logs also impose pressure on storage. Among
the immense log data, only a part of it is of high informational value, so
storage should be differentiated. The user has three storage strategies to
reduce costs.
+
+- **ZSTD (ZStandard) compression algorithm**: For tables larger than 1TB,
specify the compression method as "ZSTD" upon table creation; this realizes a
compression ratio of 10:1 (see the sketch below).
+- **Tiered storage of hot and cold data**: This is supported by the [new
feature](https://blog.devgenius.io/hot-cold-data-separation-what-why-and-how-5f7c73e7a3cf)
of Doris. The user sets a data "cooldown" period of 7 days. That means data
from the past 7 days (namely, hot data) will be stored in SSD. As time goes by,
hot data "cools down" (getting older than 7 days), it will be automatically
moved to HDD, which is less expensive. As data gets even "colder", it will be
moved to object st [...]
+- **Differentiated replica numbers for different data partitions**: The user
has partitioned their data by time range. The principle is to have more
replicas for newer data partitions and fewer for the older ones. In their case,
data from the past 3 months is frequently accessed, so they have 2 replicas for
this partition. Data that is 3~6 months old has two replicas, and data from 6
months ago has one single copy.
+
+With these three strategies, the user has reduced their storage costs by 50%.
+
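A rough sketch of the first and third levers above (ZSTD compression and
per-partition replica counts) with made-up table and partition names; the
hot/cold cooldown is configured separately through a storage policy:

```sql
CREATE TABLE security_log (
    log_time DATETIME,
    source   VARCHAR(64),
    message  STRING
) DUPLICATE KEY (log_time)
PARTITION BY RANGE (log_time) (
    PARTITION p202306 VALUES LESS THAN ("2023-07-01 00:00:00"),
    PARTITION p202307 VALUES LESS THAN ("2023-08-01 00:00:00")
)
DISTRIBUTED BY HASH(source) BUCKETS 32
PROPERTIES ("compression" = "zstd");

-- Dial an older, rarely queried partition down to a single replica.
ALTER TABLE security_log MODIFY PARTITION p202306 SET ("replication_num" = "1");
```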
+### Differentiated query strategies based on data size
+
+Some logs must be immediately traced and located, such as those of abnormal
events or failures. To ensure real-time response to these queries, the user has
different query strategies for different data sizes:
+
+- **Less than 100G**: The user utilizes the dynamic partitioning feature of
Doris. Small tables will be partitioned by date and large tables will be
partitioned by hour. This can avoid data skew. To further ensure balance of
data within a partition, they use the snowflake ID as the bucketing field. They
also set a starting offset of 20 days, which means data of the recent 20 days
will be kept. In this way, they find the balance point between data backlog and
analytic needs.
+- **100G~1T**: These tables have their materialized views, which are the
pre-computed result sets stored in Doris. Thus, queries on these tables will be
much faster and less resource-consuming. The DDL syntax of materialized views
in Doris is the same as that in PostgreSQL and Oracle.
+- **More than 100T**: These tables are put into the Aggregate Key model of
Apache Doris, which pre-aggregates the data. **In this way, we enable queries
of 2 billion log records to be done in 1~2s.**
+
+These strategies have shortened the response time of queries. For example, a
query of a specific data item used to take minutes, but now it can be finished
in milliseconds. In addition, for big tables that contain 10 billion data
records, queries on different dimensions can all be done in a few seconds.
+
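As a hedged illustration of the dynamic partitioning used for the sub-100G
tables (the table name is hypothetical; the values mirror the 20-day retention
mentioned above):

```sql
-- Assumes small_event_log is range-partitioned on a DATE/DATETIME column.
ALTER TABLE small_event_log SET (
    "dynamic_partition.enable"    = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start"     = "-20",  -- keep the most recent 20 days
    "dynamic_partition.end"       = "3",    -- pre-create partitions 3 days ahead
    "dynamic_partition.prefix"    = "p",
    "dynamic_partition.buckets"   = "16"
);
```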
+## Ongoing Plans
+
+The user is now testing the newly added [inverted index](https://doris.apache.org/docs/dev/data-table/index/inverted-index?_highlight=inverted) in Apache Doris. It is designed to speed up full-text searches on strings as well as equivalence and range queries on numerics and datetime. They have also provided valuable feedback about the auto-bucketing logic in Doris: Currently, Doris decides the number of buckets for a partition based on the data size of the previous partition. T [...]
+
+
+
+
+
diff --git a/blog/Tianyancha.md
b/blog/Replacing-Apache-Hive-Elasticsearch-and-PostgreSQL-with-Apache-Doris.md
similarity index 90%
rename from blog/Tianyancha.md
rename to
blog/Replacing-Apache-Hive-Elasticsearch-and-PostgreSQL-with-Apache-Doris.md
index e2c1678725b..2d07e649cfa 100644
--- a/blog/Tianyancha.md
+++
b/blog/Replacing-Apache-Hive-Elasticsearch-and-PostgreSQL-with-Apache-Doris.md
@@ -37,15 +37,15 @@ Our old data warehouse consisted of the most popular
components of the time, inc
As you can imagine, a long and complicated data pipeline is high-maintenance and detrimental to development efficiency. Moreover, it is not capable of ad-hoc queries. So as an upgrade to our data warehouse, we replaced most of these components with [Apache Doris](https://github.com/apache/doris), a unified analytic database.
-
+
-
+
## Data Flow
This is a lateral view of our data warehouse, from which you can see how the
data flows.
-
+
For starters, binlogs from MySQL will be ingested into Kafka via Canal, while
user activity logs will be transferred to Kafka via Apache Flume. In Kafka,
data will be cleaned and organized into flat tables, which will be later turned
into aggregated tables. Then, data will be passed from Kafka to Apache Doris,
which serves as the storage and computing engine.
@@ -59,7 +59,7 @@ This is how Apache Doris replaces the roles of Hive,
Elasticsearch, and PostgreS
**After**: Since Apache Doris has all the itemized data, whenever it is faced
with a new request, it can simply pull the metadata and configure the query
conditions. Then it is ready for ad-hoc queries. In short, it only requires
low-code configuration to respond to new requests.
-
+
## User Segmentation
@@ -71,7 +71,7 @@ Tables in Elasticsearch and PostgreSQL were unreusable,
making this architecture
In this Doris-centered user segmentation process, we don't have to pre-define
new tags. Instead, tags can be auto-generated based on the task conditions. The
processing pipeline has the flexibility that can make our user-group-based A/B
testing easier. Also, as both the itemized data and user group packets are in
Apache Doris, we don't have to attend to the read and write complexity between
multiple components.
-
+
## Trick to Speed up User Segmentation by 70%
@@ -79,9 +79,9 @@ Due to risk aversion reasons, random generation of `user_id`
is the choice for m
To solve that, we created consecutive and dense mappings for these user IDs.
**In this way, we decreased our user segmentation latency by 70%.**
-
+
-
+
### Example
@@ -89,13 +89,13 @@ To solve that, we created consecutive and dense mappings
for these user IDs. **I
We adopt the Unique model for user ID mapping tables, where the user ID is the
unique key. The mapped consecutive IDs usually start from 1 and are strictly
increasing.
-
+
**Step 2: Create a user group table:**
We adopt the Aggregate model for user group tables, where user tags serve as
the aggregation keys.
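
A rough, simplified sketch of the two tables (the names are illustrative and this is not the blog's original snippet; storing the user set per tag as a bitmap is an assumption here):

```sql
-- Step 1 (sketch): Unique Key model mapping random user IDs to consecutive ones
CREATE TABLE IF NOT EXISTS user_id_mapping (
    tyc_user_id BIGINT NOT NULL,   -- original, non-consecutive ID
    mapped_id   BIGINT NOT NULL    -- consecutive ID, starting from 1
)
UNIQUE KEY(tyc_user_id)
DISTRIBUTED BY HASH(tyc_user_id) BUCKETS 16;

-- Step 2 (sketch): Aggregate Key model with user tags as the aggregation keys
CREATE TABLE IF NOT EXISTS user_group (
    tag_id   INT NOT NULL,
    user_set BITMAP BITMAP_UNION NOT NULL
)
AGGREGATE KEY(tag_id)
DISTRIBUTED BY HASH(tag_id) BUCKETS 16;
```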
-
+
Suppose that we need to pick out the users whose IDs are between 0 and 2000000.
@@ -104,13 +104,13 @@ The following snippets use non-consecutive
(`tyc_user_id`) and consecutive (`tyc
- Non-Consecutive User IDs: **1843ms**
- Consecutive User IDs: **543ms**
-
+
## Conclusion
We have 2 clusters in Apache Doris accommodating tens of TBs of data, with
almost a billion new rows flowing in every day. We used to witness a steep
decline in data ingestion speed as data volume expanded. But after upgrading
our data warehouse with Apache Doris, we increased our data writing efficiency
by 75%. Also, in user segmentation with a result set of less than 5 million, it
is able to respond within milliseconds. Most importantly, our data warehouse
has been simpler and friendli [...]
-
+
Lastly, I would like to share with you something that interested us most when
we first talked to the [Apache Doris community](https://t.co/KcxAtAJZjZ):
@@ -118,3 +118,5 @@ Lastly, I would like to share with you something that
interested us most when we
- It is well-integrated with the data ecosystem and can smoothly interface
with most data sources and data formats.
- It allows us to implement elastic scaling of clusters using the command line
interface.
- It outperforms ClickHouse in **join queries**.
+
+Find Apache Doris developers on
[Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-1t3wfymur-0soNPATWQ~gbU8xutFOLog)
diff --git a/blog/Memory_Management.md b/blog/Say-Goodbye-to-OOM-Crashes.md
similarity index 97%
rename from blog/Memory_Management.md
rename to blog/Say-Goodbye-to-OOM-Crashes.md
index b658ea278b6..d7ec381b453 100644
--- a/blog/Memory_Management.md
+++ b/blog/Say-Goodbye-to-OOM-Crashes.md
@@ -31,7 +31,7 @@ under the License.
What guarantees system stability in large data query tasks? It is an effective
memory allocation and monitoring mechanism. It is how you speed up computation,
avoid memory hotspots, promptly respond to insufficient memory, and minimize
OOM errors.
-
+
From a database user's perspective, how do they suffer from bad memory
management? This is a list of things that used to bother our users:
@@ -47,13 +47,13 @@ Luckily, those dark days are behind us, because we have
improved our memory mana
In Apache Doris, we have a one-and-only interface for memory allocation:
**Allocator**. It will make adjustments as it sees appropriate to keep memory
usage efficient and under control. Also, MemTrackers are in place to track the
allocated or released memory size, and three different data structures are
responsible for large memory allocation in operator execution (we will get to
them immediately).
-
+
### Data Structures in Memory
As different queries have different memory hotspot patterns in execution,
Apache Doris provides three different in-memory data structures: **Arena**,
**HashTable**, and **PODArray**. They are all under the reign of the Allocator.
-
+
**1. Arena**
@@ -99,7 +99,7 @@ Memory reuse is executed in data scanning, too. Before the
scanning starts, a nu
The MemTracker system before Apache Doris 1.2.0 was in a hierarchical tree structure, consisting of process_mem_tracker, query_pool_mem_tracker, query_mem_tracker, instance_mem_tracker, ExecNode_mem_tracker and so on. MemTrackers of two neighbouring layers are of parent-child relationship. Hence, any calculation mistakes in a child MemTracker would accumulate all the way up and undermine the credibility of the overall statistics.
-
+
In Apache Doris 1.2.0 and newer, we made the structure of MemTrackers much simpler. MemTrackers are only divided into two types based on their roles: **MemTracker Limiter** and the others. MemTracker Limiter, monitoring memory usage, is unique in every query/ingestion/compaction task and global object; while the other MemTrackers trace the memory hotspots in query execution, such as HashTables in Join/Aggregation/Sort/Window functions and intermediate data in serialization, to give a pi [...]
@@ -107,7 +107,7 @@ The parent-child relationship between MemTracker Limiter
and other MemTrackers i
MemTrackers (including MemTracker Limiter and the others) are put into a group
of Maps. They allow users to print overall MemTracker type snapshot,
Query/Load/Compaction task snapshot, and find out the Query/Load with the most
memory usage or the most memory overusage.
-
+
### How MemTracker Works
@@ -122,21 +122,21 @@ Now let me explain with a simplified query execution
process.
- When the scanning is done, all MemTrackers in the Scanner Thread TLS Stack
will be removed. When the ScanNode scheduling is done, the ScanNode MemTracker
will be removed from the fragment execution thread. Then, similarly, when an
aggregation node is scheduled, an **AggregationNode MemTracker** will be added
to the fragment execution thread TLS Stack, and get removed after the
scheduling is done.
- If the query is completed, the Query MemTracker will be removed from the
fragment execution thread TLS Stack. At this point, this stack should be empty.
Then, from the QueryProfile, you can view the peak memory usage during the
whole query execution as well as each phase (scanning, aggregation, etc.).
-
+
### How to Use MemTracker
The Doris backend Web page demonstrates real-time memory usage, which is
divided into types: Query/Load/Compaction/Global. Current memory consumption
and peak consumption are shown.
-
+
The Global types include MemTrackers of Cache and TabletMeta.
-
+
From the Query types, you can see the current memory consumption and peak
consumption of the current query and the operators it involves (you can tell
how they are related from the labels). For memory statistics of historical
queries, you can check the Doris FE audit logs or BE INFO logs.
-
+
## Memory Limit
@@ -152,7 +152,7 @@ While in Apache Doris 2.0, we have realized exception
safety for queries. That m
On a regular basis, Doris backend retrieves the physical memory of processes
and the currently available memory size from the system. Meanwhile, it collects
MemTracker snapshots of all Query/Load/Compaction tasks. If a backend process
exceeds its memory limit or there is insufficient memory, Doris will free up
some memory space by clearing Cache and cancelling a number of queries or data
ingestion tasks. These will be executed by an individual GC thread regularly.
-
+
If the process memory consumed is over the SoftMemLimit (81% of total system
memory, by default), or the available system memory drops below the Warning
Water Mark (less than 3.2GB), **Minor GC** will be triggered. At this moment,
query execution will be paused at the memory allocation step, the cached data
in data ingestion tasks will be force flushed, and part of the Data Page Cache
and the outdated Segment Cache will be released. If the newly released memory
does not cover 10% of the [...]
diff --git a/blog/HYXJ.md
b/blog/Step-by-step-Guide-to-Building-a-High-Performing-Risk-Data-Mart.md
similarity index 96%
rename from blog/HYXJ.md
rename to
blog/Step-by-step-Guide-to-Building-a-High-Performing-Risk-Data-Mart.md
index 97aad2303ca..8aa15b79981 100644
--- a/blog/HYXJ.md
+++ b/blog/Step-by-step-Guide-to-Building-a-High-Performing-Risk-Data-Mart.md
@@ -39,7 +39,7 @@ I will walk you through how the risk data mart works
following the data flow:
So these are the three data sources of our risk data mart.
-
+
This whole architecture is built with CDH 6.0. The workflows in it can be
divided into real-time data streaming and offline risk analysis.
@@ -48,7 +48,7 @@ This whole architecture is built with CDH 6.0. The workflows
in it can be divide
To give a brief overview, these are the components that support the four
features of our data processing platform:
-
+
As you see, Apache Hive is central to this architecture. But in practice, it
takes minutes for Apache Hive to execute analysis, so our next step is to
increase query speed.
@@ -72,7 +72,7 @@ In addition, since our risk control analysts and modeling
engineers are using Hi
We wanted a unified gateway to manage our heterogeneous data sources. That's why we introduced Apache Doris.
-
+
But doesn't that make things even more complicated? Actually, no.
@@ -80,7 +80,7 @@ We can connect various data sources to Apache Doris and
simply conduct queries o
We create Elasticsearch Catalog and Hive Catalog in Apache Doris. These
catalogs map to the external data in Elasticsearch and Hive, so we can conduct
federated queries across these data sources using Apache Doris as a unified
gateway. Also, we use the
[Spark-Doris-Connector](https://github.com/apache/doris-spark-connector) to
allow data communication between Spark and Doris. So basically, we replace
Apache Hive with Apache Doris as the central hub of our data architecture.
-
+
How does that affect our data processing efficiency?
@@ -101,7 +101,7 @@ Backup cluster: 4 frontends + 4 backends, m5d.16xlarge
This is the monitoring board:
-
+
As is shown, the queries are fast. We expected that it would take at least 10
nodes but in real cases, we mainly conduct queries via Catalogs, so we can
handle this with a relatively small cluster size. The compatibility is good,
too. It doesn't rock the rest of our existing system.
@@ -109,7 +109,7 @@ As is shown, the queries are fast. We expected that it
would take at least 10 no
To accelerate the regular data ingestion from Hive to Apache Doris 1.2.2, we
have a solution that goes as follows:
-
+
**Main components:**
@@ -197,7 +197,7 @@ group by product_no;
**After**: For data synchronization, we call Spark on YARN using SeaTunnel. It can be finished within 11 minutes (100 million records per minute), and the ingested data only takes up **378G** of storage space.
-
+
## Summary
diff --git a/blog/Tencent Music.md
b/blog/Tencent-Data-Engineers-Why-We-Went-from-ClickHouse-to-Apache-Doris.md
similarity index 93%
rename from blog/Tencent Music.md
rename to
blog/Tencent-Data-Engineers-Why-We-Went-from-ClickHouse-to-Apache-Doris.md
index bf649e35a17..8f5aae89c27 100644
--- a/blog/Tencent Music.md
+++ b/blog/Tencent-Data-Engineers-Why-We-Went-from-ClickHouse-to-Apache-Doris.md
@@ -1,6 +1,6 @@
---
{
- 'title': 'Tencent Data Engineer: Why We Go from ClickHouse to Apache
Doris?',
+ 'title': 'Tencent Data Engineer: Why We Went from ClickHouse to Apache Doris?',
'summary': "Evolution of the data processing architecture of Tencent Music
Entertainment towards better performance and simpler maintenance.",
'date': '2023-03-07',
'author': 'Jun Zhang & Kai Dai',
@@ -27,9 +27,9 @@ specific language governing permissions and limitations
under the License.
-->
-
+
-This article is co-written by me and my colleague Kai Dai. We are both data
platform engineers at Tencent Music (NYSE: TME), a music streaming service
provider with a whopping 800 million monthly active users. To drop the number
here is not to brag but to give a hint of the sea of data that my poor
coworkers and I have to deal with everyday.
+This article is co-written by me and my colleague Kai Dai. We are both data platform engineers at [Tencent Music](https://www.tencentmusic.com/en-us/) (NYSE: TME), a music streaming service provider with a whopping 800 million monthly active users. To drop the number here is not to brag but to give a hint of the sea of data that my poor coworkers and I have to deal with every day.
# What We Use ClickHouse For?
@@ -37,7 +37,7 @@ The music library of Tencent Music contains data of all forms
and types: recorde
Specifically, we do all-round analysis of the songs, lyrics, melodies, albums,
and artists, turn all this information into data assets, and pass them to our
internal data users for inventory counting, user profiling, metrics analysis,
and group targeting.
-
+
We stored and processed most of our data in Tencent Data Warehouse (TDW), an
offline data platform where we put the data into various tag and metric systems
and then created flat tables centering each object (songs, artists, etc.).
@@ -47,7 +47,7 @@ After that, our data analysts used the data under the tags
and metrics they need
The data processing pipeline looked like this:
-
+
# The Problems with ClickHouse
@@ -71,7 +71,7 @@ Statistically speaking, these features have cut our storage
cost by 42% and deve
During our usage of Doris, we have received lots of support from the open
source Apache Doris community and timely help from the SelectDB team, which is
now running a commercial version of Apache Doris.
-
+
# Further Improvement to Serve Our Needs
@@ -81,7 +81,7 @@ Speaking of the datasets, on the bright side, our data
analysts are given the li
Our solution is to introduce a semantic layer in our data processing pipeline. The semantic layer is where all the technical terms are translated into more comprehensible concepts for our internal data users. In other words, we are turning the tags and metrics into first-class citizens for data definition and management.
-
+
**Why would this help?**
@@ -95,7 +95,7 @@ Explicitly defining the tags and metrics at the semantic
layer was not enough. I
For this sake, we made the semantic layer the heart of our data management
system:
-
+
**How does it work?**
@@ -111,7 +111,7 @@ As you can see, Apache Doris has played a pivotal role in
our solution. Optimizi
## What We Want?
-
+
Currently, we have 800+ tags and 1300+ metrics derived from the 80+ source
tables in TDW.
@@ -126,7 +126,7 @@ When importing data from TDW to Doris, we hope to achieve:
1. **Generate Flat Tables in Flink Instead of TDW**
-
+
Generating flat tables in TDW has a few downsides:
@@ -143,7 +143,7 @@ On the contrary, generating flat tables in Doris is much
easier and less expensi
As is shown below, Flink has aggregated the five rows of data with “ID”=1 into one row in Doris, reducing the data writing pressure on Doris.
-
+
This can largely reduce storage costs since TDW no longer has to maintain two copies of data and Kafka only needs to store the new data pending ingestion. What's more, we can add whatever ETL logic we want into Flink and reuse lots of development logic for offline and real-time data ingestion.
@@ -184,7 +184,7 @@ max_cumulative_compaction_num_singleton_deltas
- Optimization of the BE commit logic: conduct regular caching of BE lists,
commit them to the BE nodes batch by batch, and use finer load balancing
granularity.
-
+
**4. Use Doris-on-ES in Queries**
@@ -214,7 +214,7 @@ I. When Doris BE pulls data from Elasticsearch (1024 lines
at a time by default)
II. After the data pulling, Doris BE needs to conduct Join operations with
local metric tables via SHUFFLE/BROADCAST, which can cost a lot.
-
+
Thus, we make the following optimizations:
@@ -224,7 +224,7 @@ Thus, we make the following optimizations:
- Use ES to compress the queried data; turn multiple data fetch into one and
reduce network I/O overhead.
- Make sure that Doris BE only pulls the data of buckets related to the local
metric tables and conducts local Join operations directly to avoid data
shuffling between Doris BEs.
-
+
As a result, we reduce the query response time for large group targeting from
60 seconds to a surprising 3.7 seconds.
@@ -257,3 +257,5 @@ http://doris.apache.org
**Apache Doris Github**:
https://github.com/apache/doris
+
+Find Apache Doris developers on
[Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-1t3wfymur-0soNPATWQ~gbU8xutFOLog)
diff --git a/blog/Tencent-LLM.md b/blog/Tencent-LLM.md
index 9e7c0fba8e6..3775c5a82d0 100644
--- a/blog/Tencent-LLM.md
+++ b/blog/Tencent-LLM.md
@@ -36,7 +36,7 @@ We have adopted Large Language Models (LLM) to empower our
Doris-based OLAP serv
Our incentive was to save our internal staff from the steep learning curve of SQL writing. Thus, we used LLM as an intermediary. It transforms natural language questions into SQL statements and sends the SQLs to the OLAP engine for execution.
-
+
Like every AI-related experience, we came across some friction:
@@ -53,7 +53,7 @@ For problem No.1, we introduce a semantic layer between the
LLM and the OLAP eng
Besides that, the semantic layer can optimize the computation logic. When
analysts input a question that involves a complicated query, let's say, a
multi-table join, the semantic layer can split that into multiple single-table
queries to reduce semantic distortion.
-
+
### 2. LLM parsing rules
@@ -61,7 +61,7 @@ To increase cost-effectiveness in using LLM, we evaluate the
computation complex
For example, when an analyst inputs "tell me the earnings of the major musical
platforms", the LLM identifies that this question only entails several metrics
or dimensions, so it will not further parse it but send it straight for SQL
generation and execution. This can largely shorten query response time and
reduce API expenses.
-
+
### 3. Schema Mapper and external knowledge base
@@ -69,7 +69,7 @@ To empower the LLM with niche knowledge, we added a Schema
Mapper upstream from
We are constantly testing and optimizing the Schema Mapper. We categorize and
rate content in the external knowledge base, and do various levels of mapping
(full-text mapping and fuzzy mapping) to enable better semantic parsing.
-
+
### 4. Plugins
@@ -78,7 +78,7 @@ We used plugins to connect the LLM to more fields of
information, and we have di
- **Embedding local files**: This is especially useful when we need to "teach"
the LLM the latest regulatory policies, which are often text files. Firstly,
the system vectorizes the local text file, executes semantic searches to find
matching or similar terms in the local file, extracts the relevant contents and
puts them into the LLM parsing window to generate output.
- **Third-party plugins**: The marketplace is full of third-party plugins that
are designed for all kinds of sectors. With them, the LLM is able to deal with
wide-ranging topics. Each plugin has its own prompts and calling function. Once
the input question hits a prompt, the relevant plugin will be called.
-
+
After we are done with the above four optimizations, the SuperSonic framework comes into being.
@@ -86,7 +86,7 @@ After we are done with above four optimizations, the
SuperSonic framework comes
Now let me walk you through this
[framework](https://github.com/tencentmusic/supersonic):
-
+
- An analyst inputs a question.
- The Schema Mapper maps the question to an external knowledge base.
@@ -97,7 +97,7 @@ Now let me walk you through this
[framework](https://github.com/tencentmusic/sup
**Example**
-
+
To answer whether a certain song can be performed on variety shows, the system queries the OLAP data warehouse for details about the song, and presents it with results from the Commercial Use Query third-party plugin.
@@ -107,7 +107,7 @@ As for the OLAP part of this framework, after several
rounds of architectural ev
Raw data is sorted into tags and metrics, which are custom-defined by the
analysts. The tags and metrics are under unified management in order to avoid
inconsistent definitions. Then, they are combined into various tagsets and
metricsets for various queries.
-
+
We have drawn two main takeaways for you from our architectural optimization
experience.
@@ -146,4 +146,4 @@ When the number of aggregation tasks or data volume becomes
overwhelming for Fli
## What's Next
-With an aim to reduce costs and increase service availability, we plan to test
the newly released Storage-Compute Separation and Cross-Cluster Replication of
Doris, and we embrace any ideas and inputs about the SuperSonic framework and
the Apache Doris project.
\ No newline at end of file
+With an aim to reduce costs and increase service availability, we plan to test
the newly released Storage-Compute Separation and Cross-Cluster Replication of
Doris, and we embrace any ideas and inputs about the SuperSonic framework and
the Apache Doris project.
diff --git a/blog/HCDS.md
b/blog/Tiered-Storage-for-Hot-and-Cold-Data-What-Why-and-How.md
similarity index 90%
rename from blog/HCDS.md
rename to blog/Tiered-Storage-for-Hot-and-Cold-Data-What-Why-and-How.md
index 5975a505fac..984110f0d45 100644
--- a/blog/HCDS.md
+++ b/blog/Tiered-Storage-for-Hot-and-Cold-Data-What-Why-and-How.md
@@ -1,6 +1,6 @@
---
{
- 'title': 'Hot-Cold Data Separation: What, Why, and How?',
+ 'title': 'Tiered Storage for Hot and Cold Data: What, Why, and How?',
'summary': "Hot data is the frequently accessed data, while cold data is
the one you seldom visit but still need. Separating them is for higher
efficiency in computation and storage.",
'date': '2023-06-23',
'author': 'Apache Doris',
@@ -28,7 +28,7 @@ specific language governing permissions and limitations
under the License.
-->
-Apparently hot-cold data separation is hot now. But first of all:
+Apparently tiered storage is hot now. But first of all:
## What is Hot/Cold Data?
@@ -38,21 +38,21 @@ For example, orders of the past six months are "hot" and
logs from years ago are
## Why Separate Hot and Cold Data?
-Hot-Cold Data Separation is an idea often seen in real life: You put your
favorite book on your bedside table, your Christmas ornament in the attic, and
your childhood art project in the garage or a cheap self-storage space on the
other side of town. The purpose is a tidy and efficient life.
+Tiered storage is an idea often seen in real life: You put your favorite book
on your bedside table, your Christmas ornament in the attic, and your childhood
art project in the garage or a cheap self-storage space on the other side of
town. The purpose is a tidy and efficient life.
Similarly, companies separate hot and cold data for more efficient computation
and more cost-effective storage, because storage that allows quick read/write
is always expensive, like SSD. On the other hand, HDD is cheaper but slower. So
it is more sensible to put hot data on SSD and cold data on HDD. If you are
looking for an even lower-cost option, you can go for object storage.
-In data analytics, hot-cold data separation is implemented by a tiered storage
mechanism in the database. For example, Apache Doris supports three-tiered
storage: SSD, HDD, and object storage. For newly ingested data, after a
specified cooldown period, it will turn from hot data into cold data and be
moved to object storage. In addition, object storage only preserves one copy of
data, which further cuts down storage costs and the relevant
computation/network overheads.
+In data analytics, tiered storage is implemented through the database's storage mechanism. For example, Apache Doris supports three-tiered storage: SSD, HDD, and object storage. For newly ingested data, after a specified cooldown period, it will turn from hot data into cold data and be moved to object storage. In addition, object storage only preserves one copy of data, which further cuts down storage costs and the relevant computation/network overheads.
-
+
-How much can you save by hot-cold data separation? Here is some math.
+How much can you save with tiered storage? Here is some math.
In public cloud services, cloud disks generally cost 5~10 times as much as
object storage. If 80% of your data asset is cold data and you put it in object
storage instead of cloud disks, you can expect a cost reduction of around 70%.
-Let the percentage of cold data be "rate", the price of object storage be
"OS", and the price of cloud disk be "CloudDisk", this is how much you can save
by hot-cold data separation instead of putting all your data on cloud disks:
+Let the percentage of cold data be "rate", the price of object storage be "OS", and the price of cloud disk be "CloudDisk". This is how much you can save with tiered storage instead of putting all your data on cloud disks:
-
+
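+
+Spelled out from those definitions, the savings ratio works out to roughly rate x (1 - OS / CloudDisk). With 80% cold data and object storage priced at 1/5~1/10 of cloud disks, that is 0.8 x (0.8~0.9), i.e. about 64%~72%, in line with the 70% estimate above.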
Now let's put real-world numbers in this formula:
@@ -62,9 +62,9 @@ AWS pricing, US East (Ohio):
- **Throughput Optimized HDD (st 1)**: 102 USD per TB per month
- **General Purpose SSD (gp2)**: 158 USD per TB per month
-
+
-## How Is Hot-Cold Separation Implemented?
+## How Is Tiered Storage Implemented?
Till now, hot-cold separation sounds nice, but the biggest concern is: how can
we implement it without compromising query performance? This can be broken down
to three questions:
@@ -78,9 +78,9 @@ In what follows, I will show you how Apache Doris addresses
them one by one.
Accessing cold data from object storage will indeed be slow. One solution is
to cache cold data in local disks for use in queries. In Apache Doris 2.0, when
a query requests cold data, only the first-time access will entail a full
network I/O operation from object storage. Subsequent queries will be able to
read data directly from local cache.
-The granularity of caching matters, too. A coarse granularity might lead to a
waste of cache space, but a fine granularity could be the reason for low I/O
efficiency. Apache Doris bases its caching on data blocks. It downloads cold
data blocks from object storage onto local Block Cache. This is the
"pre-heating" process. With cold data fully pre-heated, queries on tables with
hot-cold data separation will be basically as fast as those on tablets without.
We drew this conclusion from test [...]
+The granularity of caching matters, too. A coarse granularity might lead to a
waste of cache space, but a fine granularity could be the reason for low I/O
efficiency. Apache Doris bases its caching on data blocks. It downloads cold
data blocks from object storage onto local Block Cache. This is the
"pre-heating" process. With cold data fully pre-heated, queries on tables with
tiered storage will be basically as fast as those on tablets without. We drew
this conclusion from test results o [...]
-
+
- ***Test Data****: SSB SF100 dataset*
- ***Configuration****: 3 × 16C 64G, a cluster of 1 frontend and 3 backends*
@@ -93,7 +93,7 @@ In object storage, only one copy of cold data is preserved.
Within Apache Doris,
Implementation-wise, the Doris frontend picks a local replica as the Leader.
Updates to the Leader will be synchronized to all other local replicas via a
regular report mechanism. Also, as the Leader uploads data to object storage,
the relevant metadata will be updated to other local replicas, too.
-
+
### Reduced I/O and CPU Overhead
@@ -103,7 +103,7 @@ A thread in Doris backend will regularly pick N tablets
from the cold data and s
## Tutorial
-Separating hot and cold data in storage is a huge cost saver and there have
been ways to ensure the same fast query performance. Executing hot-cold data
separation is a simple 6-step process, so you can find out how it works
yourself:
+Tiered storage of hot and cold data is a huge cost saver, and there are ways to keep query performance just as fast. Setting it up in Apache Doris is a simple 6-step process, so you can find out how it works yourself:
diff --git a/blog/Compaction.md
b/blog/Understanding-Data-Compaction-in-3-Minutes.md
similarity index 97%
rename from blog/Compaction.md
rename to blog/Understanding-Data-Compaction-in-3-Minutes.md
index ac6055970b6..0bfaccedc30 100644
--- a/blog/Compaction.md
+++ b/blog/Understanding-Data-Compaction-in-3-Minutes.md
@@ -36,7 +36,7 @@ In particular, the data (which is the inflowing cargo in this
metaphor) comes in
- If an item needs to be discarded or replaced, since no line-jump is allowed
on the conveyor belt (append-only), you can only put a "note" (together with
the substitution item) at the end of the queue on the belt to remind the
"storekeepers", who will later perform replacing or discarding for you.
- If needed, the "storekeepers" are even kind enough to pre-process the cargo
for you (pre-aggregating data to reduce computation burden during data
reading).
-
+
As helpful as the "storekeepers" are, they can be troublemakers at times —
that's why "team management" matters. For the compaction mechanism to work
efficiently, you need wise planning and scheduling, or else you might need to
deal with high memory and CPU usage, if not OOM in the backend or write error.
@@ -66,7 +66,7 @@ The combination of these three strategies is an example of
cost-effective planni
As columnar storage is the future for analytic databases, the execution of
compaction should adapt to that. We call it vertical compaction. I illustrate
this mechanism with the figure below:
-
+
Hope all these tiny blocks and numbers don't make you dizzy. Actually,
vertical compaction can be broken down into four simple steps:
@@ -85,7 +85,7 @@ Segment compaction is the way to avoid that. It allows you to
compact data at th
This is a flow chart that explains how segment compaction works:
-
+
Segment compaction will be triggered once the number of newly generated files
exceeds a certain limit (let's say, 10). It is executed asynchronously by a
specialized merging thread. Every 10 files will be merged into one, and the
original 10 files will be deleted. Segment compaction does not prolong the data
ingestion process by much, but it can largely accelerate data queries.
@@ -95,7 +95,7 @@ Time series data analysis is an increasingly common analytic
scenario.
Time series data is "born orderly". It is already arranged chronologically, it
is written at a regular pace, and every batch of it is of similar size. It is
like the least-worried-about child in the family. Correspondingly, we have a
tailored compaction method for it: ordered data compaction.
-
+
Ordered data compaction is even simpler:
@@ -127,4 +127,4 @@ Every data engineer has somehow been harassed by
complicated parameters and conf
## Conclusion
-This is how we keep our "storekeepers" working efficiently and
cost-effectively. If you wonder how these strategies and optimization work in
real practice, we tested Apache Doris with ClickBench. It reaches a
**compaction speed of 300,000 row/s**; in high-concurrency scenarios, it
maintains **a stable compaction score of around 50**. Also, we are planning to
implement auto-tuning and increase observability for the compaction mechanism.
If you are interested in the [Apache Doris](https:// [...]
\ No newline at end of file
+This is how we keep our "storekeepers" working efficiently and cost-effectively. If you wonder how these strategies and optimizations work in real practice, we tested Apache Doris with ClickBench. It reaches a **compaction speed of 300,000 rows/s**; in high-concurrency scenarios, it maintains **a stable compaction score of around 50**. Also, we are planning to implement auto-tuning and increase observability for the compaction mechanism. If you are interested in the [Apache Doris](https:// [...]
diff --git a/blog/scenario.md b/blog/Use-Apache-Doris-with-AI-chatbots.md
similarity index 99%
rename from blog/scenario.md
rename to blog/Use-Apache-Doris-with-AI-chatbots.md
index 0f537f84bff..af53293b174 100644
--- a/blog/scenario.md
+++ b/blog/Use-Apache-Doris-with-AI-chatbots.md
@@ -1,6 +1,6 @@
---
{
- 'title': 'How Does Apache Doris Help AISPEACH Build a Datawherehouse in AI
Chatbots Scenario',
+ 'title': 'How Does Apache Doris Help AISPEECH Build a Data Warehouse in AI
Chatbots Scenario',
'summary': "Guide: In 2019, AISPEACH built a real-time and offline
datawarehouse based on Apache Doris. Reling on its flexible query model,
extremely low maintenance costs, high development efficiency, and excellent
query performance, Apache Doris has been used in many business scenarios such
as real-time business operations, AI chatbots analysis. It meets various data
analysis needs such as device portrait/user label, real-time operation, data
dashboard, self-service BI and financia [...]
'date': '2022-11-24',
'author': 'Zhao Wei',
@@ -27,7 +27,7 @@ specific language governing permissions and limitations
under the License.
-->
-# How Does Apache Doris Help AISPEACH Build a Datawherehouse in AI Chatbots
Scenario
+# How Does Apache Doris Help AISPEECH Build a Data Warehouse in AI Chatbots Scenario

diff --git a/blog/Douyu.md
b/blog/Zipping-up-the-Lambda-Architecture-for-40%-Faster-Performance.md
similarity index 94%
rename from blog/Douyu.md
rename to blog/Zipping-up-the-Lambda-Architecture-for-40%-Faster-Performance.md
index 52ca06552f4..d38a2f7902b 100644
--- a/blog/Douyu.md
+++ b/blog/Zipping-up-the-Lambda-Architecture-for-40%-Faster-Performance.md
@@ -32,7 +32,7 @@ Author: Tongyang Han, Senior Data Engineer at Douyu
The Lambda architecture has been common practice in big data processing. The
concept is to separate stream (real time data) and batch (offline data)
processing, and that's exactly what we did. These two types of data of ours
were processed in two isolated tubes before they were pooled together and ready
for searches and queries.
-
+
Then we ran into a few problems:
@@ -51,7 +51,7 @@ I am going to elaborate on how this is done using our data
tagging process as an
Previously, our offline tags were produced by the data warehouse, put into a
flat table, and then written in **HBase**, while real-time tags were produced
by **Flink**, and put into **HBase** directly. Then **Spark** would work as the
computing engine.
-
+
The problem with this stemmed from the low computation efficiency of **Flink**
and **Spark**.
@@ -60,25 +60,25 @@ The problem with this stemmed from the low computation
efficiency of **Flink** a
As a solution, we replaced **HBase** and **Spark** with **Apache Doris**, a
real-time analytic database, and moved part of the computational logic of the
foregoing wide-time-range real-time tags from **Flink** to **Apache Doris**.
-
+
Instead of putting our flat tables in HBase, we place them in Apache Doris.
These tables are divided into partitions based on time sensitivity. Offline
tags will be updated daily while real-time tags will be updated in real time.
We organize these tables in the Aggregate Model of Apache Doris, which allows
partial update of data.
Instead of using Spark for queries, we parse the query rules into SQL for
execution in Apache Doris. For pattern matching, we use Redis to cache the hot
data from Apache Doris, so the system can respond to such queries much faster.
-
+
## **Computational Pipeline of Wide-Time-Range Real-Time Tags**
In some cases, the computation of wide-time-range real-time tags entails the
aggregation of historical (offline) data with real-time data. The following
figure shows our old computational pipeline for these tags.
-
+
As you can see, it required multiple tasks to finish computing one real-time
tag. Also, in complicated aggregations that involve a collection of aggregation
operations, any improper resource allocation could lead to back pressure or
waste of resources. This adds to the difficulty of task scheduling. The
maintenance and stability guarantee of such a long pipeline could be an issue,
too.
To improve on that, we decided to move such aggregation workload to Apache
Doris.
-
+
We have around 400 million customer tags in our system, and each customer is
attached with over 300 tags. We divide customers into more than 10,000 groups,
and we have to update 5000 of them on a daily basis. The above improvement has
sped up the computation of our wide-time-range real-time queries by **40%**.
diff --git a/blog/release-note-2.0.0.md b/blog/release-note-2.0.0.md
index 93a60c8ee92..60fed524577 100644
--- a/blog/release-note-2.0.0.md
+++ b/blog/release-note-2.0.0.md
@@ -47,7 +47,7 @@ This new version highlights:
In SSB-Flat and TPC-H benchmarking, Apache Doris 2.0.0 delivered **over 10 times faster query performance** compared to an early version of Apache Doris.
-
+
This is realized by the introduction of a smarter query optimizer, inverted
index, a parallel execution model, and a series of new functionalities to
support high-concurrency point queries.
@@ -57,7 +57,7 @@ The brand new query optimizer, Nereids, has a richer
statistical base and adopts
TPC-H tests showed that Nereids, with no human intervention, outperformed the
old query optimizer by a wide margin. Over 100 users have tried Apache Doris
2.0.0 in their production environment and the vast majority of them reported
huge speedups in query execution.
-
+
**Doc**: https://doris.apache.org/docs/dev/query-acceleration/nereids/
@@ -65,11 +65,11 @@ Nereids is enabled by default in Apache Doris 2.0.0: `SET
enable_nereids_planner
### Inverted Index
-In Apache Doris 2.0.0, we introduced inverted index to better support fuzzy
keyword search, equivalence queries, and range queries.
+In Apache Doris 2.0.0, we introduced [inverted
index](https://doris.apache.org/docs/dev/data-table/index/inverted-index?_highlight=inverted)
to better support fuzzy keyword search, equivalence queries, and range queries.
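+
+As a quick illustration (the table and column names are made up, and index options may vary by version), an inverted index can be declared in the table DDL and then used through MATCH queries:
+
+```sql
+-- Illustrative: inverted index on a text column with an English parser
+CREATE TABLE IF NOT EXISTS user_event (
+    ts      DATETIME NOT NULL,
+    user_id BIGINT   NOT NULL,
+    msg     STRING,
+    INDEX idx_msg (msg) USING INVERTED PROPERTIES ("parser" = "english")
+)
+DUPLICATE KEY(ts, user_id)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10;
+
+-- Fuzzy keyword search served by the inverted index
+SELECT count(*) FROM user_event WHERE msg MATCH_ANY 'timeout error';
+```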
A smartphone manufacturer tested Apache Doris 2.0.0 in their user behavior
analysis scenarios. With inverted index enabled, v2.0.0 was able to finish the
queries within milliseconds and maintain stable performance as the query
concurrency level went up. In this case, it is 5 to 90 times faster than its
old version.
-
+
### 20 times higher concurrency capability
@@ -79,7 +79,7 @@ For a column-oriented DBMS like Apache Doris, the I/O usage
of point queries wil
After these optimizations, Apache Doris 2.0 reached a concurrency level of
**30,000 QPS per node** on YCSB on a 16 Core 64G cloud server with 4×1T hard
drives, representing an improvement of **20 times** compared to its older
version. This makes Apache Doris a good alternative to HBase in
high-concurrency scenarios, so that users don't need to endure extra
maintenance costs and redundant storage brought by complicated tech stacks.
-Read more: https://doris.apache.org/blog/High_concurrency
+Read more:
https://doris.apache.org/blog/How-We-Increased-Database-Query-Concurrency-by-20-Times
### A self-adaptive parallel execution model
@@ -101,19 +101,19 @@ Apache Doris has been pushing its boundaries. Starting as
an OLAP engine for rep
Apache Doris 2.0.0 provides native support for semi-structured data. In
addition to JSON and Array, it now supports a complex data type: Map. Based on
Light Schema Change, it also supports Schema Evolution, which means you can
adjust the schema as your business changes. You can add or delete fields and
indexes, and change the data types for fields. As we introduced inverted index
and a high-performance text analysis algorithm into it, it can execute
full-text search and dimensional analy [...]
-
+
### Enhanced data lakehousing capabilities
In Apache Doris 1.2, we introduced Multi-Catalog to allow for auto-mapping and auto-synchronization of data from heterogeneous sources. In version 2.0.0, we extended the list of supported data sources and optimized Doris based on users' needs in production environments.
-
+
Apache Doris 2.0.0 supports dozens of data sources including Hive, Hudi,
Iceberg, Paimon, MaxCompute, Elasticsearch, Trino, ClickHouse, and almost all
open lakehouse formats. It also supports snapshot queries on Hudi Copy-on-Write
tables and read optimized queries on Hudi Merge-on-Read tables. It allows for
authorization of Hive Catalog using Apache Ranger, so users can reuse their
existing privilege control system. Besides, it supports extensible
authorization plug-ins to enable user-de [...]
TPC-H benchmark tests showed that Apache Doris 2.0.0 is 3~5 times faster than
Presto/Trino in queries on Hive tables. This is realized by all-around
optimizations (in small file reading, flat table reading, local file cache,
ORC/Parquet file reading, Compute Nodes, and information collection of external
tables) finished in this development cycle and the distributed execution
framework, vectorized execution engine, and query optimizer of Apache Doris.
-
+
All this gives Apache Doris 2.0.0 an edge in data lakehousing scenarios. With
Doris, you can do incremental or overall synchronization of multiple upstream
data sources in one place, and expect much higher data query performance than
other query engines. The processed data can be written back to the sources or
provided for downstream systems. In this way, you can make Apache Doris your
unified data analytic gateway.
@@ -144,23 +144,23 @@ As part of our continuing effort to strengthen the
real-time analytic capability
The sources of system instability often include small file merging, write amplification, and the consequential disk I/O and CPU overheads. Hence, we introduced Vertical Compaction and Segment Compaction in version 2.0.0 to eliminate OOM errors in compaction and avoid the generation of too many segment files during data writing. After these improvements, Apache Doris can write data 50% faster while **using only 10% of the memory that it previously used**.
-Read more: https://doris.apache.org/blog/Compaction
+Read more:
https://doris.apache.org/blog/Understanding-Data-Compaction-in-3-Minutes/
### Auto-synchronization of table schema
The latest Flink-Doris-Connector allows users to synchronize an entire
database (such as MySQL and Oracle) to Apache Doris by one simple step.
According to our test results, one single synchronization task can support the
real-time concurrent writing of thousands of tables. Users no longer need to go
through a complicated synchronization procedure because Apache Doris has
automated the process. Changes in the upstream data schema will be
automatically captured and dynamically updated to [...]
-Read more: https://doris.apache.org/blog/FDC
+Read more:
https://doris.apache.org/blog/Auto-Synchronization-of-an-Entire-MySQL-Database-for-Data-Analysis
## A New Multi-Tenant Resource Isolation Solution
The purpose of multi-tenant resource isolation is to avoid resource preemption
in the case of heavy loads. For that sake, older versions of Apache Doris
adopted a hard isolation plan featured by Resource Group: Backend nodes of the
same Doris cluster would be tagged, and those of the same tag formed a Resource
Group. As data was ingested into the database, different data replicas would be
written into different Resource Groups, which will be responsible for different
workloads. For examp [...]
-
+
This is an effective solution, but in practice it happens that some Resource Groups are heavily occupied while others sit idle. We wanted a more flexible way to reduce the vacancy rate of resources. Thus, in 2.0.0, we introduce the Workload Group resource soft limit.
-
+
The idea is to divide workloads into groups to allow for flexible management
of CPU and memory resources. Apache Doris associates a query with a Workload
Group, and limits the percentage of CPU and memory that a single query can use
on a backend node. The memory soft limit can be configured and enabled by the
user.
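
As a sketch of what that could look like (the group name and values are illustrative, and the exact properties and the way a query gets associated with a group should be checked against the 2.0 documentation):

```sql
-- Illustrative Workload Group with a CPU share and a memory soft limit
CREATE WORKLOAD GROUP g_reporting
PROPERTIES (
    "cpu_share"                = "10",
    "memory_limit"             = "30%",
    "enable_memory_overcommit" = "true"
);

-- Assumed session-level binding; association can also be configured per user
SET workload_group = 'g_reporting';
```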
@@ -190,7 +190,7 @@ Apache Doris 2.0 provides two solutions to address the
needs of the first two ty
1. **Compute nodes**. We introduced stateless compute nodes in version 2.0.
Unlike the mix nodes, the compute nodes do not save any data and are not
involved in workload balancing of data tablets during cluster scaling. Thus,
they are able to quickly join the cluster and share the computing pressure
during peak times. In addition, in data lakehouse analysis, these nodes will be
the first ones to execute queries on remote storage (HDFS/S3) so there will be
no resource competition between [...]
1. Doc: https://doris.apache.org/docs/dev/advanced/compute_node/
2. **Hot-cold data separation**. Hot/cold data refers to data that is
frequently/seldom accessed, respectively. Generally, it makes more sense to
store cold data in low-cost storage. Older versions of Apache Doris support
lifecycle management of table partitions: As hot data cooled down, it would be
moved from SSD to HDD. However, data was stored with multiple replicas on HDD,
which was still a waste. Now, in Apache Doris 2.0, cold data can be stored in
object storage, which is even chea [...]
- 1. Read more: https://doris.apache.org/blog/HCDS/
+ 1. Read more: https://doris.apache.org/blog/Tiered-Storage-for-Hot-and-Cold-Data-What-Why-and-How
For neater separation of computation and storage, the VeloDB team is going to contribute the Cloud Compute-Storage-Separation solution to the Apache Doris project. Its performance and stability have stood the test of hundreds of companies in their production environments. The merging of code will be finished by October this year, and all Apache Doris users will be able to get an early taste of it in September.
diff --git a/static/images/Unicom-1.png b/static/images/Unicom-1.png
new file mode 100644
index 00000000000..2773ab0a1e8
Binary files /dev/null and b/static/images/Unicom-1.png differ
diff --git a/static/images/Unicom-2.png b/static/images/Unicom-2.png
new file mode 100644
index 00000000000..99b21ee4ccb
Binary files /dev/null and b/static/images/Unicom-2.png differ
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]