This is an automated email from the ASF dual-hosted git repository.
luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 62d380c4b9f Add Changan Auto blog (#345)
62d380c4b9f is described below
commit 62d380c4b9f246978fe714d290d2ec9e725f7816
Author: Hu Yanjun <[email protected]>
AuthorDate: Wed Nov 29 11:41:52 2023 +0800
Add Changan Auto blog (#345)
---
...e-iov-data-analytics-helps-prevent-accidents.md | 112 +++++++++++++++++++++
static/images/IoV-CAN-data.jpeg | Bin 0 -> 70575 bytes
static/images/IoV-DTC-data-query.png | Bin 0 -> 97232 bytes
static/images/IoV-Doris-based-data-warehouse.png | Bin 0 -> 408544 bytes
static/images/IoV-Hive-based-data-warehouse.png | Bin 0 -> 284481 bytes
5 files changed, 112 insertions(+)
diff --git
a/blog/how-big-data-is-saving-lives-in-real-time-iov-data-analytics-helps-prevent-accidents.md
b/blog/how-big-data-is-saving-lives-in-real-time-iov-data-analytics-helps-prevent-accidents.md
new file mode 100644
index 00000000000..0f5b6692c2b
--- /dev/null
+++
b/blog/how-big-data-is-saving-lives-in-real-time-iov-data-analytics-helps-prevent-accidents.md
@@ -0,0 +1,112 @@
+---
+{
+ 'title': 'How Big Data Is Saving Lives in Real Time: IoV Data Analytics
Helps Prevent Accidents',
+ 'summary': "What needs to be taken care of in IoV data analysis? What's
the different between a near real-time analytic data platform and an actual
real-time analytic data platform?",
+ 'date': '2023-11-29',
+ 'author': 'Apache Doris',
+ 'tags': ['Best Practice'],
+}
+
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Internet of Vehicles, or IoV, is the product of the marriage between the
automotive industry and IoT. IoV data is expected to get larger and larger,
especially with electric vehicles being the new growth engine of the auto
market. The question is: Is your data platform ready for that? This post shows
you what an OLAP solution for IoV looks like.
+
+## What is special about IoV data?
+
+The idea of IoV is intuitive: to create a network so vehicles can share
information with each other or with urban infrastructure. What‘s often
under-explained is the network within each vehicle itself. On each car, there
is something called Controller Area Network (CAN) that works as the
communication center for the electronic control systems. For a car traveling on
the road, the CAN is the guarantee of its safety and functionality, because it
is responsible for:
+
+- **Vehicle system monitoring**: The CAN is the pulse of the vehicle system.
For example, sensors send the temperature, pressure, or position they detect to
the CAN; controllers issue commands (like adjusting the valve or the drive
motor) to the executor via the CAN.
+- **Real-time feedback**: Via the CAN, sensors send the speed, steering angle,
and brake status to the controllers, which make timely adjustments to the car
to ensure safety.
+- **Data sharing and coordination**: The CAN allows for data exchange (such as
status and commands) between various devices, so the whole system can be more
performant and efficient.
+- **Network management and troubleshooting**: The CAN keeps an eye on devices
and components in the system. It recognizes, configures, and monitors the
devices for maintenance and troubleshooting.
+
+With the CAN being that busy, you can imagine the data size that is traveling
through the CAN every day. In the case of this post, we are talking about a car
manufacturer who connects 4 million cars together and has to process 100
billion pieces of CAN data every day.
+
+## IoV data processing
+
+To turn this huge data size into valuable information that guides product
development, production, and sales is the juicy part. Like most data analytic
workloads, this comes down to data writing and computation, which are also
where challenges exist:
+
+- **Data writing at scale**: Sensors are everywhere in a car: doors, seats,
brake lights... Plus, many sensors collect more than one signal. The 4 million
cars add up to a data throughput of millions of TPS, which means dozens of
terabytes every day. With increasing car sales, that number is still growing.
+- **Real-time analysis**: This is perhaps the best manifestation of "time is
life". Car manufacturers collect the real-time data from their vehicles to
identify potential malfunctions, and fix them before any damage happens.
+- **Low-cost computation and storage**: It's hard to talk about huge data size
without mentioning its costs. Low cost makes big data processing sustainable.
+
+## From Apache Hive to Apache Doris: a transition to real-time analysis
+
+Like Rome, a real-time data processing platform is not built in a day. The car
manufacturer used to rely on the combination of a batch analytic engine (Apache
Hive) and some streaming frameworks and engines (Apache Flink, Apache Kafka) to
gain near real-time data analysis performance. They didn't realize they needed
real-time that bad until real-time was a problem.
+
+**Near Real-Time Data Analysis Platform**
+
+This is what used to work for them:
+
+
+
+Data from the CAN and vehicle sensors are uploaded via 4G network to the cloud
gateway, which writes the data into Kafka. Then, Flink processes this data and
forwards it to Hive. Going through several data warehousing layers in Hive, the
aggregated data is exported to MySQL. At the end, Hive and MySQL provide data
to the application layer for data analysis, dashboarding, etc.
+
+Since Hive is primarily designed for batch processing rather than real-time
analytics, you can tell the mismatch of it in this use case.
+
+- **Data writing**: With such a huge data size, the data ingestion time from
Flink into Hive was noticeably long. In addition, Hive only supports data
updating at the granularity of partitions, which is not enough for some cases.
+- **Data analysis**: The Hive-based analytic solution delivers high query
latency, which is a multi-factor issue. Firstly, Hive was slower than expected
when handling large tables with 1 billion rows. Secondly, within Hive, data is
extracted from one layer to another by the execution of Spark SQL, which could
take a while. Thirdly, as Hive needs to work with MySQL to serve all needs from
the application side, data transfer between Hive and MySQL also adds to the
query latency.
+
+**Real-Time Data Analysis Platform**
+
+This is what happens when they add a real-time analytic engine to the picture:
+
+
+
+Compared to the old Hive-based platform, this new one is more efficient in
three ways:
+
+- **Data writing**: Data ingestion into [Apache
Doris](https://doris.apache.org/) is quick and easy, without complicated
configurations and the introduction of extra components. It supports a variety
of data ingestion methods. For example, in this case, data is written from
Kafka into Doris via [Stream
Load](https://doris.apache.org/docs/data-operate/import/import-way/stream-load-manual),
and from Hive into Doris via [Broker
Load](https://doris.apache.org/docs/data-operate/import/import- [...]
+- **Data analysis**: To showcase the query speed of Apache Doris by example,
it can return a 10-million-row result set within seconds in a cross-table join
query. Also, it can work as a [unified query
gateway](https://doris.apache.org/docs/lakehouse/multi-catalog/) with its quick
access to external data (Hive, MySQL, Iceberg, etc.), so analysts don't have to
juggle between multiple components.
+- **Computation and storage costs**: Apache Doris provides the Z-Standard
algorithm that can bring a 3~5 times higher data compression ratio. That's how
it helps reduce costs in data computation and storage. Moreover, the
compression can be done solely in Doris so it won't consume resources from
Flink.
+
+A good real-time analytic solution not only stresses data processing speed, it
also considers all the way along your data pipeline and smoothens every step of
it. Here are two examples:
+
+### 1. The arrangement of CAN data
+
+In Kafka, CAN data was arranged by the dimension of CAN ID. However, for the
sake of data analysis, analysts had to compare signals from various vehicles,
which meant to concatenate data of different CAN ID into a flat table and align
it by timestamp. From that flat table, they could derive different tables for
different analytic purposes. Such transformation was implemented using Spark
SQL, which was time-consuming in the old Hive-based architecture, and the SQL
statements are high-main [...]
+
+In Apache Doris, all they need is to build the tables with the [Aggregate Key
model](https://doris.apache.org/docs/data-table/data-model#aggregate-model),
specify VIN (Vehicle Identification Number) and timestamp as the Aggregate Key,
and define other data fields by `REPLACE_IF_NOT_NULL`. With Doris, they don't
have to take care of the SQL statements or the flat table, but are able to
extract real-time insights from real-time data.
+
+
+
+### 2. DTC data query
+
+Of all CAN data, DTC (Diagnostic Trouble Code) deserves high attention and
separate storage, because it tells you what's wrong with a car. Each day, the
manufacturer receives around 1 billion pieces of DTC. To capture life-saving
information from the DTC, data engineers need to relate the DTC data to a DTC
configuration table in MySQL.
+
+What they used to do was to write the DTC data into Kafka every day, process
it in Flink, and store the results in Hive. In this way, the DTC data and the
DTC configuration table were stored in two different components. That caused a
dilemma: a 1-billion-row DTC table was hard to write into MySQL, while querying
from Hive was slow. As the DTC configuration table was also constantly updated,
engineers could only import a version of it into Hive on a regular basis. That
meant they didn't a [...]
+
+As is mentioned, Apache Doris can work as a unified query gateway. This is
supported by its
[Multi-Catalog](https://doris.apache.org/docs/lakehouse/multi-catalog/)
feature. They import their DTC data from Hive into Doris, and then they create
a MySQL Catalog in Doris to map to the DTC configuration table in MySQL. When
all this is done, they can simply join the two tables within Doris and get
real-time query response.
+
+
+
+## Conclusion
+
+This is an actual real-time analytic solution for IoV. It is designed for data
at really large scale, and it is now supporting a car manufacturer who receives
10 billion rows of new data every day in improving driving safety and
experience.
+
+Building a data platform to suit your use case is not easy, I hope this post
helps you in building your own analytic solution.
+
+
+
+Apache Doris [GitHub repo](https://github.com/apache/doris)
+
+Find Apache Doris makers on
[Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-1t3wfymur-0soNPATWQ~gbU8xutFOLog)
+
diff --git a/static/images/IoV-CAN-data.jpeg b/static/images/IoV-CAN-data.jpeg
new file mode 100644
index 00000000000..412d8af58c3
Binary files /dev/null and b/static/images/IoV-CAN-data.jpeg differ
diff --git a/static/images/IoV-DTC-data-query.png
b/static/images/IoV-DTC-data-query.png
new file mode 100644
index 00000000000..b52ecffca60
Binary files /dev/null and b/static/images/IoV-DTC-data-query.png differ
diff --git a/static/images/IoV-Doris-based-data-warehouse.png
b/static/images/IoV-Doris-based-data-warehouse.png
new file mode 100644
index 00000000000..293975f6890
Binary files /dev/null and b/static/images/IoV-Doris-based-data-warehouse.png
differ
diff --git a/static/images/IoV-Hive-based-data-warehouse.png
b/static/images/IoV-Hive-based-data-warehouse.png
new file mode 100644
index 00000000000..bcf6b13c8c3
Binary files /dev/null and b/static/images/IoV-Hive-based-data-warehouse.png
differ
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]