This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new a9220c2056b [blog](update) Update MiniMax blog (#1067)
a9220c2056b is described below
commit a9220c2056babd48360d5e8d76d94f2ee0ddd408
Author: KassieZ <[email protected]>
AuthorDate: Fri Aug 30 14:09:33 2024 +0800
[blog](update) Update MiniMax blog (#1067)
# Versions
- [ ] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0
- [x] blog
# Languages
- [ ] Chinese
- [x] English
---
...d-built-a-pb-scale-logging-system-with-doris.md | 145 +++++++++++++++++++++
blog/apache-doris-vs-rockset.md | 2 -
blog/auto-partition-in-apache-doris.md | 2 +-
blog/migrate-lakehouse-from-bigquery-to-doris.md | 2 +-
blog/release-note-3.0.1.md | 2 +-
.../gettingStarted/demo-block/latest.tsx | 10 +-
gettingStarted/demo-block/latest.tsx | 7 +-
src/components/recent-blogs/recent-blogs.data.ts | 9 +-
src/constant/newsletter.data.ts | 14 +-
.../images/apache-doris-based-logging-system.png | Bin 0 -> 126955 bytes
.../images/minimax-migrated-from-loki-to-doris.png | Bin 0 -> 704790 bytes
.../the-old-grafana-Loki-based-logging-system.png | Bin 0 -> 116103 bytes
static/images/why-Apache-Doris.png | Bin 0 -> 270135 bytes
13 files changed, 166 insertions(+), 27 deletions(-)
diff --git
a/blog/ai-unicorn-minimax-from-loki-and-built-a-pb-scale-logging-system-with-doris.md
b/blog/ai-unicorn-minimax-from-loki-and-built-a-pb-scale-logging-system-with-doris.md
new file mode 100644
index 00000000000..0848e2f8aca
--- /dev/null
+++
b/blog/ai-unicorn-minimax-from-loki-and-built-a-pb-scale-logging-system-with-doris.md
@@ -0,0 +1,145 @@
+---
+{
+ 'title': 'How AI unicorn MiniMax migrated from Loki and built a PB-scale
logging system with Apache Doris',
+ 'summary': "Serving a PB-scale data size with over 99.9% availability,
Apache Doris is the vital signs monitor of MiniMax, a generative AI startup
backed by Alibaba.",
+ 'description': "Serving a PB-scale data size with over 99.9% availability,
Apache Doris is the vital signs monitor of MiniMax, a generative AI startup
backed by Alibaba.",
+ 'date': '2024-08-29',
+ 'author': 'Apache Doris',
+ 'tags': ['Best Practice'],
+ 'picked': "true",
+ 'order': "1",
+ "image": '/images/minimax-migrated-from-loki-to-doris.png'
+}
+
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+MiniMax, [a generative AI startup backed by Alibaba, Tencent, miHoYo,
etc.](https://fortune.com/asia/2024/03/05/alibaba-leads-financing-round-chinese-ai-startup-minimax/),
had been investing most of its efforts in MoE (Mixture of Experts) before the
approach became an industry consensus. In April 2024, MiniMax launched its first
commercially deployed MoE-based LLM, **MiniMax-abab 6.5**, which contains over
a trillion parameters and delivers performance comparable to GPT-4, Claude-3,
and Gemini-1.5.
+
+As its LLMs grow more complex and are called more frequently, model training
and inference generate an exploding volume of logs. These logs are the basis
for performance monitoring, optimization, and troubleshooting. MiniMax's
existing Grafana Loki-based logging system suffered from performance and
stability issues, so the team planned an upgrade. After evaluating common
industry solutions, they chose Apache Doris.
+
+**Now, all of MiniMax's business lines have been integrated with the Apache
Doris-based logging system, which serves petabytes of data with over 99.9%
availability. Queries over 100 million log entries return within seconds.**
+
+## The old Grafana Loki-based logging system
+
+Loki is an open-source log aggregation system developed by Grafana Labs, with
a design inspired by Prometheus. It does not index the full log content;
instead, it builds indexes only on log labels and metadata.
+
+The major components of a Loki-based system typically include:
+
+- **Loki**: the main server responsible for log storage and querying.
+
+- **Promtail**: the agent layer for collecting logs and sending them to Loki.
+
+- **Grafana**: for user interface visualization.
+
+To deploy Grafana Loki, each cluster needs its own complete set of log
collectors and Loki log storage/query services.
+
+Loki uses an Index + Chunk design for log storage. During ingestion, log
streams are distributed across Ingesters based on a hash of their labels, and
the Ingesters write the log data to object storage. During querying, the
Querier uses the Index to fetch the relevant Chunks from object storage and
then matches the logs against the query.
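+
+To make the hash-based distribution concrete, here is a minimal TypeScript sketch that maps a label set to one of N Ingesters. It is a simplification under stated assumptions, not Loki's code. Loki itself places streams on a consistent hash ring with replication, while this sketch uses plain FNV-1a hashing modulo the Ingester count. The names `streamKey`, `fnv1a`, and `pickIngester` are hypothetical.
+
```typescript
// Simplified illustration of label-hash stream placement (hypothetical helpers).
// Loki uses a consistent hash ring with replication; this uses modulo hashing.
type Labels = Record<string, string>;

// Serialize labels in sorted key order so {a, b} and {b, a} name the same stream.
function streamKey(labels: Labels): string {
  return Object.keys(labels)
    .sort()
    .map((k) => `${k}="${labels[k]}"`)
    .join(",");
}

// 32-bit FNV-1a over UTF-16 code units (sufficient for ASCII label sets).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // reinterpret as unsigned 32-bit
}

// Choose which Ingester receives the stream identified by these labels.
function pickIngester(labels: Labels, ingesterCount: number): number {
  if (!Number.isInteger(ingesterCount) || ingesterCount <= 0) {
    throw new Error("ingesterCount must be a positive integer");
  }
  return fnv1a(streamKey(labels)) % ingesterCount;
}

const ingesterA = pickIngester({ app: "trainer", cluster: "c1" }, 4);
const ingesterB = pickIngester({ cluster: "c1", app: "trainer" }, 4);
console.log(ingesterA === ingesterB);
```
+
+The labels are sorted before hashing, so the same stream always lands on the same Ingester regardless of label order. The sketch also shows the limitation: only labels influence placement and indexing, so a search on the log text itself cannot be narrowed beyond the label set.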
+
+
+
+Although Grafana Loki is positioned as a lightweight, horizontally scalable,
and highly available log management system, it still faces some challenges in
practical business use:
+
+- **Excessive query resource consumption**: Loki does not index log content; it
only performs coarse filtering at label granularity. Searches on the log text
therefore fall back to full-text regular expression matching over the entire
log data set, which can cause spikes in CPU, memory, and network bandwidth
consumption. As the volume of data being queried and the queries per second
(QPS) inc [...]
+
+- **Complex architecture**: In addition to the modules shown in the above
diagram, Loki also includes components like the Index Gateway, Memcache, and
Compactor. The large number of architectural components makes the system
challenging to operate and manage, and complex to configure.
+
+- **High maintenance cost and difficulty**: MiniMax has a large number of
deployed clusters, which differ in their systems, resources, storage, and
network environments. Having to deploy an independent Loki stack in each
cluster adds to the maintenance burden.
+
+## Why Apache Doris
+
+AI is one of the most data-intensive industries: its use cases are
characterized by long processing pipelines, abundant contextual data, and large
per-request data volumes. As a result, MiniMax generates far more logs than
non-AI software products with a comparable user base. This gigantic log volume
requires the logging system to be:
+
+- **High-performance**: They need the system to return query results on 100
million log entries within seconds.
+
+- **Flexible**: The system should support log alerting and log metric queries,
such as generating statistical trend lines for key terms.
+
+- **Low-cost**: The petabyte-scale raw log data continues to grow, so it's a
make-or-break factor to keep the storage and computational costs within
reasonable bounds.
+
+After an evaluation of mature logging system architectures in the industry,
MiniMax identified the following key components typically found in leading log
management solutions:
+
+- **Collection agent**: collecting logs from service standard outputs and
pushing the data into a central message queue.
+
+- **Message queue**: decoupling upstream and downstream components, absorbing
spikes, and ensuring system stability even when downstream components are
unavailable.
+
+- **Storage and query middleware**: storing and querying the log data. In a
logging system, this middleware should be capable of inverted indexing to
support efficient log searches.
+
+Based on this reference architecture, MiniMax decided to use iLogtail as the
collection agent, Apache Kafka as the message queue, and **[Apache
Doris](https://doris.apache.org) as the storage and query middleware**. The
middleware decision was made after comparing Apache Doris and Elasticsearch.
+
+
+
+Apache Doris proved competitive in both cost and performance. It stands out
particularly in storage efficiency, write throughput, and aggregation
performance, and its compatibility with MySQL syntax makes it more
user-friendly.
+
+## Apache Doris-based logging system
+
+
+
+The new logging system of MiniMax, called Mlogs, is more streamlined, with a
single architecture serving all clusters. The upper layer is the control plane
of the logging system, consisting of the encapsulated log query interfaces and
a module that automatically generates and distributes configurations. The lower
layer is the data plane, containing the log collection agent, message queue,
log writer, and the **Apache Doris** database.
+
+Logs generated by the cluster services are collected by iLogtail and pushed to
Kafka. Some of these logs are pulled from Kafka by the Mlogs Ingester and
written to the Doris cluster via the Stream Load method of Apache Doris. The
rest are subscribed to by Doris in real time via Routine Load, which pulls the
message stream directly from Kafka. **Ultimately, Apache Doris handles the
storage and querying of all log data, eliminating the need for separate
deployments for each cluster.**
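+
+The Routine Load path can be pictured as a single SQL job that Doris runs continuously against a Kafka topic. The TypeScript sketch below only renders such a statement as a string. The database, job, table, broker, and topic names are hypothetical. A production job would likely set more properties (column mappings, batch intervals, offsets) than the minimal `format`, `kafka_broker_list`, and `kafka_topic` shown here.
+
```typescript
// Hypothetical helper that renders a Doris CREATE ROUTINE LOAD statement for a
// Kafka topic of JSON logs. All names are placeholders; properties are minimal.
interface RoutineLoadSpec {
  db: string;
  job: string;
  table: string;
  brokers: string[]; // "host:port" entries
  topic: string;
}

// Backtick-quote an identifier (MySQL-style), doubling embedded backticks.
function quoteIdent(name: string): string {
  return "`" + name.replace(/`/g, "``") + "`";
}

// Double-quote a property key or value, escaping backslashes and quotes.
function quoteValue(value: string): string {
  return '"' + value.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// Render key/value pairs as an indented, comma-separated property list.
function renderProps(pairs: Array<[string, string]>): string {
  return pairs.map(([k, v]) => `    ${quoteValue(k)} = ${quoteValue(v)}`).join(",\n");
}

function buildRoutineLoad(spec: RoutineLoadSpec): string {
  if (spec.brokers.length === 0) {
    throw new Error("at least one Kafka broker is required");
  }
  return [
    `CREATE ROUTINE LOAD ${quoteIdent(spec.db)}.${quoteIdent(spec.job)} ON ${quoteIdent(spec.table)}`,
    "PROPERTIES (",
    renderProps([["format", "json"]]), // each Kafka message is one JSON log record
    ")",
    "FROM KAFKA (",
    renderProps([
      ["kafka_broker_list", spec.brokers.join(",")],
      ["kafka_topic", spec.topic],
    ]),
    ");",
  ].join("\n");
}

console.log(
  buildRoutineLoad({
    db: "logs",
    job: "inference_logs_job",
    table: "inference_logs",
    brokers: ["kafka-1:9092", "kafka-2:9092"],
    topic: "inference-logs",
  }),
);
```
+
+Once such a job is created, Doris keeps consuming the topic on its own. That is why this path needs no extra service as long as the messages are already well-formed JSON.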
+
+## Hands-on experience from MiniMax
+
+**Log ingestion**
+
+The new architecture utilizes both the Routine Load and Stream Load methods of
Apache Doris. Routine Load is ready to use out of the box and can directly
handle JSON logs without the need for additional parsing. For more complex logs
that require filtering and processing, MiniMax has introduced a log writer
called Mlogs Ingester between Kafka and Doris. The Mlogs Ingester parses and
processes the logs before writing them to Doris via Stream Load.
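+
+The Ingester path can be sketched as two steps: parse and filter raw lines, then hand the surviving rows to Stream Load, which accepts an HTTP PUT to `/api/{db}/{table}/_stream_load` on the FE. The TypeScript below is a minimal sketch under stated assumptions. The `LogRecord` fields, the filtering rule, and the helper names are hypothetical, since MiniMax's actual Mlogs Ingester logic is not described. The sketch also only builds the request without sending it, so following the FE's redirect to a BE and checking the JSON response are left out.
+
```typescript
// Hypothetical Ingester-style step: filter and normalize raw log lines, then
// prepare (but not send) a Doris Stream Load request for the surviving rows.
interface LogRecord {
  ts: string; // ISO-8601 timestamp
  service: string;
  level: string;
  msg: string;
}

// Keep only lines that parse as JSON objects with the fields we need. Anything
// else is dropped here; a real ingester might route it to a dead-letter queue.
function parseLogs(rawLines: string[]): LogRecord[] {
  const out: LogRecord[] = [];
  for (const line of rawLines) {
    try {
      const o = JSON.parse(line);
      if (o && typeof o.ts === "string" && typeof o.service === "string" && typeof o.msg === "string") {
        out.push({ ts: o.ts, service: o.service, level: String(o.level ?? "INFO").toUpperCase(), msg: o.msg });
      }
    } catch {
      // not JSON: skip the line
    }
  }
  return out;
}

interface StreamLoadRequest {
  url: string;
  method: "PUT";
  headers: Record<string, string>;
  body: string;
}

function buildStreamLoad(
  feAddress: string, // hypothetical "host:http_port" of a Doris FE
  db: string,
  table: string,
  user: string,
  password: string,
  label: string,
  rows: LogRecord[],
): StreamLoadRequest {
  return {
    url: `http://${feAddress}/api/${encodeURIComponent(db)}/${encodeURIComponent(table)}/_stream_load`,
    method: "PUT",
    headers: {
      Authorization: "Basic " + btoa(`${user}:${password}`), // btoa handles ASCII credentials only
      Expect: "100-continue",
      format: "json",
      read_json_by_line: "true", // body is newline-delimited JSON
      label, // unique per batch so a retried, already-loaded batch is rejected
    },
    body: rows.map((r) => JSON.stringify(r)).join("\n"),
  };
}

const parsedRows = parseLogs([
  '{"ts":"2024-08-29T10:00:00Z","service":"inference","level":"warn","msg":"slow batch"}',
  "plain text line that is not JSON",
]);
const loadRequest = buildStreamLoad("fe-host:8030", "logs", "app_logs", "loader", "secret", "ingester-batch-0001", parsedRows);
console.log(loadRequest.url, loadRequest.body);
```
+
+Sending newline-delimited JSON with `read_json_by_line` keeps each batch simple to produce. Giving each batch a unique `label` lets Doris reject a retried batch that was already loaded successfully.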
+
+**Log search**
+
+For log searches, MiniMax utilizes the inverted indexes and full-text regular
expression query capabilities of Apache Doris.
+
+- The inverted index of Apache Doris fits into a wide range of use cases and
delivers high query performance. It's mainly used in `MATCH` and `MATCH_PHRASE`
queries.
+
+- Full-text regular expression query (`REGEXP`) provides higher precision but
lower performance than token-based queries. It is suitable for smaller-scale
queries where precision is critical.
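+
+The difference between the two search styles shows up directly in the SQL. The TypeScript helper below is a hypothetical sketch that renders a query with `MATCH_ANY`, `MATCH_PHRASE`, or `REGEXP`. The table name, the `msg` column, and the `ts` ordering column are placeholders. The `MATCH_*` operators assume the searched column has an inverted index.
+
```typescript
// Hypothetical helper that renders Doris log-search SQL for three search styles.
// Table, column, and the `ts` ordering column are illustrative placeholders.
type SearchMode = "match_any" | "match_phrase" | "regexp";

// Backtick-quote an identifier, doubling any embedded backticks.
function sqlIdent(name: string): string {
  return "`" + name.replace(/`/g, "``") + "`";
}

// Single-quote a string literal, escaping backslashes first, then quotes.
function sqlString(value: string): string {
  return "'" + value.replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

const OPERATORS: Record<SearchMode, string> = {
  match_any: "MATCH_ANY", // token-based: rows containing any of the terms
  match_phrase: "MATCH_PHRASE", // token-based: terms adjacent and in order
  regexp: "REGEXP", // precise, but evaluates the pattern against each candidate row
};

function buildLogSearch(table: string, column: string, mode: SearchMode, term: string, limit = 100): string {
  if (!Number.isInteger(limit) || limit <= 0) throw new Error("limit must be a positive integer");
  return (
    `SELECT * FROM ${sqlIdent(table)} WHERE ${sqlIdent(column)} ${OPERATORS[mode]} ${sqlString(term)} ` +
    `ORDER BY ${sqlIdent("ts")} DESC LIMIT ${limit}`
  );
}

console.log(buildLogSearch("app_logs", "msg", "match_phrase", "connection reset"));
console.log(buildLogSearch("app_logs", "msg", "regexp", "timeout after \\d+ms"));
```
+
+A practical pattern consistent with the guidance above is to narrow the candidate rows with a `MATCH_*` predicate first and reserve `REGEXP` for the small remaining set where exact matching matters.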
+
+**Performance improvement**
+
+MiniMax implements **query truncation** to further accelerate queries. Log
data is arranged linearly in chronological order. If a query requests a large
time range, it can consume excessive computation, storage, and network
resources and potentially lead to query timeouts or even system unavailability.
So they cap and truncate the time range of queries to prevent overly broad
queries, and pre-calculate the data volume of all tables every 15 minutes to
dynamically estimate the max [...]
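+
+The exact estimation formula is not spelled out above, so the following TypeScript sketch only illustrates the general idea of volume-aware truncation under three assumptions: a `rowsPerMinute` figure derived from the periodic (every-15-minute) volume statistics, a per-query row budget `maxRows`, and a hard window cap `maxWindowMs`. These inputs, and the choice to keep the most recent part of the range, are hypothetical rather than MiniMax's documented rules.
+
```typescript
// Hypothetical sketch of volume-aware query truncation. Assumed inputs:
// rowsPerMinute from periodic per-table volume statistics, maxRows as a
// per-query row budget, maxWindowMs as an absolute window cap.
interface TimeRange {
  startMs: number; // inclusive, epoch milliseconds
  endMs: number; // exclusive, epoch milliseconds
}

function truncateRange(
  requested: TimeRange,
  rowsPerMinute: number,
  maxRows: number,
  maxWindowMs: number,
): TimeRange {
  if (requested.endMs <= requested.startMs) {
    throw new Error("time range must be non-empty");
  }
  // Start from the hard cap, then tighten it based on the estimated volume.
  let allowedMs = maxWindowMs;
  if (rowsPerMinute > 0) {
    allowedMs = Math.min(allowedMs, Math.floor((maxRows / rowsPerMinute) * 60_000));
  }
  // Keep the most recent slice of the requested range and drop the older part.
  const startMs = Math.max(requested.startMs, requested.endMs - allowedMs);
  return { startMs, endMs: requested.endMs };
}

// Example: a 24-hour request on a table ingesting ~1M rows/minute with a
// 60M-row budget is cut down to its most recent hour.
const truncEnd = Date.UTC(2024, 7, 29, 12);
const truncDay = 24 * 60 * 60 * 1000;
const truncated = truncateRange({ startMs: truncEnd - truncDay, endMs: truncEnd }, 1_000_000, 60_000_000, 7 * truncDay);
console.log((truncated.endMs - truncated.startMs) / 60_000, "minutes");
```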
+
+**Cost control**
+
+To cut down storage costs, MiniMax utilizes the **[tiered
storage](https://doris.apache.org/docs/table-design/cold-hot-separation/)**
capabilities of Apache Doris. They define data within the last 7 days as hot
data and data older than 7 days as cold data. Data will be moved to object
storage as soon as it turns cold. Furthermore, they archive object storage data
that is over 30 days old and only restore the archived data when necessary.
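+
+In Doris, the hot-to-cold move is typically configured declaratively through a storage policy (for example with a `cooldown_ttl`) attached to a table or partition. The 30-day archiving described above happens on the object-storage side, and the post does not specify how. The TypeScript sketch below just encodes the 7-day and 30-day thresholds as a classification function. `tierFor` and its defaults are hypothetical and do not correspond to any Doris API.
+
```typescript
// Hypothetical classifier for the tiering rules described above:
// younger than 7 days -> hot, 7 to 30 days -> cold (object storage),
// 30 days or older -> archived (restored only when needed).
type Tier = "hot" | "cold" | "archived";

const DAY_MS = 24 * 60 * 60 * 1000;

function tierFor(dataTimeMs: number, nowMs: number, hotDays = 7, archiveDays = 30): Tier {
  const ageMs = nowMs - dataTimeMs; // future timestamps get a negative age and count as hot
  if (ageMs < hotDays * DAY_MS) return "hot";
  if (ageMs < archiveDays * DAY_MS) return "cold";
  return "archived";
}

// Example: classify data that is 1, 10, and 45 days old.
const tierCheckNow = Date.UTC(2024, 7, 30);
for (const days of [1, 10, 45]) {
  console.log(days, tierFor(tierCheckNow - days * DAY_MS, tierCheckNow));
}
```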
+
+## Value to MiniMax
+
+The Apache Doris-based logging system now supports the log data of all
business lines within MiniMax, serving **PB-scale data** with over **99.9%
availability**. It has also brought MiniMax the following benefits:
+
+- **Simplified architecture**: The new system is easier to deploy and allows a
single framework to serve all clusters. This reduces maintenance and management
complexity, thus saving operational manpower and costs.
+
+- **Fast query response**: The new system can respond to keyword searches and
aggregation queries over 1 billion log records within 2 seconds. Most log
queries return results within seconds, too.
+
+- **High write performance**: With the current hardware setups, the system can
deliver a log write throughput of 10 GB/s, while maintaining data latency
within seconds.
+
+- **Low storage costs**: The data compression ratio reaches 5:1, and tiered
storage further reduces storage costs by 70%.
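+
+The post does not state the baseline of the 70% figure. If the two savings compound, the combined effect relative to storing raw logs entirely on hot storage can be estimated as follows, with $C$ denoting storage cost:
+
+$$
+\frac{C_{\text{tiered, compressed}}}{C_{\text{hot, raw}}} \approx \underbrace{\tfrac{1}{5}}_{\text{5:1 compression}} \times \underbrace{(1 - 0.70)}_{\text{tiered storage}} = 0.2 \times 0.3 = 0.06
+$$
+
+Under that reading, storage costs would be roughly 6% of the raw, all-hot baseline, about a 94% reduction. If the 70% is measured against a different baseline, the figure changes accordingly.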
+
+## What's next
+
+After a successful initial experience with Apache Doris, MiniMax is proceeding
with the next phase of its upgrade plan, which includes the following efforts:
+
+- **Log pre-processing**: introduce log sampling and structuring to improve
data usability and storage efficiency.
+
+- **Tracing**: integrate the logging system with other observability systems
(monitoring, alerting, tracing, etc.) to provide comprehensive operational
insights.
+
+- **Lakehousing**: expand the use of Apache Doris to include big data
processing and analysis within MiniMax, laying the foundation for a data
lakehouse.
+
+If you have any questions or require assistance regarding Apache Doris, join
the
[community](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2gmq5o30h-455W226d79zP3L96ZhXIoQ).
\ No newline at end of file
diff --git a/blog/apache-doris-vs-rockset.md b/blog/apache-doris-vs-rockset.md
index a7a443be944..a176b14afe8 100644
--- a/blog/apache-doris-vs-rockset.md
+++ b/blog/apache-doris-vs-rockset.md
@@ -6,8 +6,6 @@
'date': '2024-06-24',
'author': 'Zaki Lu',
'tags': ['Top News'],
- 'picked': "true",
- 'order': "4",
"image": '/images/doris-vs-rockset.jpeg'
}
diff --git a/blog/auto-partition-in-apache-doris.md
b/blog/auto-partition-in-apache-doris.md
index a3c74479dca..8ebd505ec35 100644
--- a/blog/auto-partition-in-apache-doris.md
+++ b/blog/auto-partition-in-apache-doris.md
@@ -7,7 +7,7 @@
'author': 'Apache Doris',
'tags': ['Tech Sharing'],
'picked': "true",
- 'order': "2",
+ 'order': "3",
"image": '/images/auto-partition-in-apache-doris.jpg'
}
diff --git a/blog/migrate-lakehouse-from-bigquery-to-doris.md
b/blog/migrate-lakehouse-from-bigquery-to-doris.md
index 536ae4727a6..ee9abcd216b 100644
--- a/blog/migrate-lakehouse-from-bigquery-to-doris.md
+++ b/blog/migrate-lakehouse-from-bigquery-to-doris.md
@@ -7,7 +7,7 @@
'author': 'Dien, Tran Thanh',
'tags': ['Best Practice'],
'picked': "true",
- 'order': "3",
+ 'order': "4",
"image": '/images/migrate-lakehouse-from-bigquery-to-apache-doris.jpg'
}
diff --git a/blog/release-note-3.0.1.md b/blog/release-note-3.0.1.md
index 669c14c6d29..05aec267149 100644
--- a/blog/release-note-3.0.1.md
+++ b/blog/release-note-3.0.1.md
@@ -7,7 +7,7 @@
'author': 'Apache Doris',
'tags': ['Release Notes'],
'picked': "true",
- 'order': "1",
+ 'order': "2",
"image": '/images/3.0.1.jpg'
}
---
diff --git a/common_docs_zh/gettingStarted/demo-block/latest.tsx
b/common_docs_zh/gettingStarted/demo-block/latest.tsx
index 7ed92ea380b..c2109942320 100644
--- a/common_docs_zh/gettingStarted/demo-block/latest.tsx
+++ b/common_docs_zh/gettingStarted/demo-block/latest.tsx
@@ -24,8 +24,8 @@ export default function Latest() {
</div>
</div> */}
<div className="home-page-hero-right">
- <a className="latest-button"
href="https://ask.selectdb.com/">
- <div
className="home-page-hero-button-label"><div>近期事件</div></div>
+ <a className="latest-button" href="https://hdxu.cn/AfjED">
+ <div
className="home-page-hero-button-label"><div>近期活动</div></div>
<div className="latest-button-title">
{/* <div className="home-page-hero-button-icon">
<svg width="24px" viewBox="0 0 24 24"
xmlns="http://www.w3.org/2000/svg">
@@ -33,10 +33,10 @@ export default function Latest() {
<path fill="none" d="M0 0h24v24H0Z"></path>
</svg>
</div> */}
- <div style={{ marginBottom: 10 }}>技术论坛全面升级上线!Ask
and Discover</div>
+ <div style={{ marginBottom: 10 }}>飞轮科技 x 字节跳动开源
Meetup@北京站</div>
</div>
- <div style={{ fontSize: 12, marginBottom: 20 }}>联合众多
Doris
生态中的开发者、用户以及合作伙伴,共同发起和创建的问答社区。在这里,你可以自由的提出和讨论技术问题、分享和收获技术经验、与社区的小伙伴进行互动和交流。</div>
- <div style={{ fontSize: 14, marginBottom: 10
}}>进入论坛</div>
+ <div style={{ fontSize: 12, marginBottom: 20
}}>来自抖音集团、飞轮科技、爱玛科技、中国电信、天翼云等多位行业技术专家,将为参会者带来多行业、跨领域的技术分享及落地实践。</div>
+ <div style={{ fontSize: 14, marginBottom: 10
}}>立即报名</div>
</a>
<a className="latest-button"
href={`/zh-CN/docs${currentVersion === '' ? '' :
`/${currentVersion}`}/releasenotes/v3.0/release-3.0.1`}>
<div
className="home-page-hero-button-label"><div>版本发布</div></div>
diff --git a/gettingStarted/demo-block/latest.tsx
b/gettingStarted/demo-block/latest.tsx
index 57acf7b8ea4..bcb88fc41d1 100644
--- a/gettingStarted/demo-block/latest.tsx
+++ b/gettingStarted/demo-block/latest.tsx
@@ -48,14 +48,9 @@ export default function Latest() {
</div> */}
<div style={{ marginBottom: 10 }}>Apache Doris
3.0.1 just released</div>
</div>
- <div style={{ fontSize: 12, marginBottom: 20 }}>In
this version, Apache Doris has improvements in compute-storage decoupling,
lakehouse, semi-structured data analysis and more.</div>
+ <div style={{ fontSize: 12, marginBottom: 20 }}>Apache
Doris has improvements in compute-storage decoupling, lakehouse,
semi-structured data analysis and more.</div>
<div style={{ fontSize: 14, marginBottom: 10 }}>Learn
more</div>
</a>
-
-
-
-
-
</div>
{/* <div style={{ fontSize: '1rem', fontWeight: 500, width:
600, marginTop: '1rem', color: '#1d1d1d' }}>学习路径</div> */}
diff --git a/src/components/recent-blogs/recent-blogs.data.ts
b/src/components/recent-blogs/recent-blogs.data.ts
index 8d617b45591..29578311b49 100644
--- a/src/components/recent-blogs/recent-blogs.data.ts
+++ b/src/components/recent-blogs/recent-blogs.data.ts
@@ -1,4 +1,8 @@
export const RECENT_BLOGS_POSTS = [
+ {
+ label: `Apache Doris 3.0.1 just released`,
+ link: 'https://doris.apache.org/blog/release-note-3.0.1',
+ },
{
label: 'Automatic and flexible data sharding: Auto Partition in Apache
Doris',
link: 'https://doris.apache.org/blog/auto-partition-in-apache-doris',
@@ -11,8 +15,5 @@ export const RECENT_BLOGS_POSTS = [
label: 'Why Apache Doris is the Best Open Source Alternative to
Rockset',
link: 'https://doris.apache.org/blog/apache-doris-vs-rockset',
},
- {
- label: `Steps to industry-leading query speed: evolution of the Apache
Doris execution engine`,
- link:
'https://doris.apache.org/blog/evolution-of-the-apache-doris-execution-engine',
- }
+
];
diff --git a/src/constant/newsletter.data.ts b/src/constant/newsletter.data.ts
index 0eae1e7e59d..e0721c4c81b 100644
--- a/src/constant/newsletter.data.ts
+++ b/src/constant/newsletter.data.ts
@@ -1,4 +1,11 @@
export const NEWSLETTER_DATA = [
+ {
+ tags: ['Top News'],
+ title: "How AI unicorn MiniMax migrated from Loki and built a PB-scale
logging system with Apache Doris",
+ content: `Serving a PB-scale data size with over 99.9% availability,
Apache Doris is the vital signs monitor of MiniMax, a generative AI startup
backed by Alibaba.`,
+ to:
'/blog/ai-unicorn-minimax-from-loki-and-built-a-pb-scale-logging-system-with-doris',
+ image: 'minimax-migrated-from-loki-to-doris.png',
+ },
{
tags: ['Release Note'],
title: "Apache Doris version 3.0.1 just released",
@@ -21,13 +28,6 @@ export const NEWSLETTER_DATA = [
to: '/blog/migrate-lakehouse-from-bigquery-to-doris',
image: 'migrate-lakehouse-from-bigquery-to-apache-doris.jpg',
},
- {
- tags: ['Top News'],
- title: "Why Apache Doris is the Best Open Source Alternative to
Rockset",
- content: `Among of all the claim-to-be alternatives to Rockset, Apache
Doris is one of the few that cover all the key features of Rockset.`,
- to: '/blog/apache-doris-vs-rockset',
- image: 'doris-vs-rockset.jpeg',
- },
];
diff --git a/static/images/apache-doris-based-logging-system.png
b/static/images/apache-doris-based-logging-system.png
new file mode 100644
index 00000000000..fdfae053c94
Binary files /dev/null and
b/static/images/apache-doris-based-logging-system.png differ
diff --git a/static/images/minimax-migrated-from-loki-to-doris.png
b/static/images/minimax-migrated-from-loki-to-doris.png
new file mode 100644
index 00000000000..9d8a4b33ace
Binary files /dev/null and
b/static/images/minimax-migrated-from-loki-to-doris.png differ
diff --git a/static/images/the-old-grafana-Loki-based-logging-system.png
b/static/images/the-old-grafana-Loki-based-logging-system.png
new file mode 100644
index 00000000000..eaa13b8b13b
Binary files /dev/null and
b/static/images/the-old-grafana-Loki-based-logging-system.png differ
diff --git a/static/images/why-Apache-Doris.png
b/static/images/why-Apache-Doris.png
new file mode 100644
index 00000000000..05cc69d78fb
Binary files /dev/null and b/static/images/why-Apache-Doris.png differ
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]