This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 2315723 Automated deployment: 78d2919595258f61fa4908ce011ed0dc7360135d
2315723 is described below
commit 23157230f27ec3716d602549d6f5d2c3d250607d
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Wed Nov 24 06:44:11 2021 +0000
Automated deployment: 78d2919595258f61fa4908ce011ed0dc7360135d
---
build/blog.8b1c191.js | 1 +
build/blog.d3cb641.js | 1 -
en-us/blog/Lizhi-case-study(En).html | 136 +++++++++++++++++++++++++++++++++++
en-us/blog/Lizhi-case-study(En).json | 6 ++
en-us/blog/index.html | 4 +-
zh-cn/blog/Lizhi-case-study.html | 3 +-
zh-cn/blog/Lizhi-case-study.json | 2 +-
zh-cn/blog/index.html | 2 +-
8 files changed, 149 insertions(+), 6 deletions(-)
diff --git a/build/blog.8b1c191.js b/build/blog.8b1c191.js
new file mode 100644
index 0000000..20a2da9
--- /dev/null
+++ b/build/blog.8b1c191.js
@@ -0,0 +1 @@
+webpackJsonp([2],{1:function(e,t){e.exports=React},2:function(e,t){e.exports=ReactDOM},400:function(e,t,n){e.exports=n(401)},401:function(e,t,n){"use
strict";function r(e){return e&&e.__esModule?e:{default:e}}function
l(e,t){if(!(e instanceof t))throw new TypeError("Cannot call a class as a
function")}function a(e,t){if(!e)throw new ReferenceError("this hasn't been
initialised - super() hasn't been called");return!t||"object"!=typeof
t&&"function"!=typeof t?e:t}function i(e,t){if("functi [...]
\ No newline at end of file
diff --git a/build/blog.d3cb641.js b/build/blog.d3cb641.js
deleted file mode 100644
index 5936e72..0000000
--- a/build/blog.d3cb641.js
+++ /dev/null
@@ -1 +0,0 @@
-webpackJsonp([2],{1:function(e,t){e.exports=React},2:function(e,t){e.exports=ReactDOM},400:function(e,t,n){e.exports=n(401)},401:function(e,t,n){"use
strict";function r(e){return e&&e.__esModule?e:{default:e}}function
l(e,t){if(!(e instanceof t))throw new TypeError("Cannot call a class as a
function")}function a(e,t){if(!e)throw new ReferenceError("this hasn't been
initialised - super() hasn't been called");return!t||"object"!=typeof
t&&"function"!=typeof t?e:t}function o(e,t){if("functi [...]
\ No newline at end of file
diff --git a/en-us/blog/Lizhi-case-study(En).html
b/en-us/blog/Lizhi-case-study(En).html
new file mode 100644
index 0000000..d4df893
--- /dev/null
+++ b/en-us/blog/Lizhi-case-study(En).html
@@ -0,0 +1,136 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0,
maximum-scale=1.0, user-scalable=no">
+ <meta name="keywords" content="Lizhi-case-study(En)">
+ <meta name="description" content="Lizhi-case-study(En)">
+ <title>Lizhi-case-study(En)</title>
+ <link rel="shortcut icon" href="/img/favicon.ico">
+ <link rel="stylesheet" href="/build/vendor.e328afe.css">
+ <link rel="stylesheet" href="/build/blog.md.fd8b187.css">
+</head>
+<body>
+ <div id="root"><div class="blog-detail-page" data-reactroot=""><header
class="header-container header-container-dark"><div class="header-body"><a
href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div
class="search search-dark"><span class="icon-search"></span></div><span
class="language-switch language-switch-dark">中</span><div
class="header-menu"><img class="header-menu-toggle"
src="/img/system/menu_white.png"/><div><ul class="ant-menu whiteClass
ant-menu-lig [...]
+<div align=center>
+<img src="https://imgpp.com/images/2021/11/23/1637566412753.md.png"/>
+</div>
+<blockquote>
+<p>Editor's note: The online audio industry is a blue ocean market in China
today. According to CIC data, the market size of China’s online audio
industry grew from 1.6 billion yuan in 2016 to 13.1 billion yuan in 2020,
a compound annual growth rate of 69.4%. With the spread of the
Internet of Things, audio has expanded from mobile devices to
vehicles, smart hardware, home equipment, and other scenarios,
so as to maximize the accompanying advanta [...]
+</blockquote>
+<blockquote>
+<p>In recent years, several Chinese audio communities have gone public
one after another. Among them, Lizhi, which listed on NASDAQ in 2020 as the first
“online audio” stock, now has more than 200 million users. In the
information age, the audio industry also faces massive volumes of UGC
data generated on its platforms. On the one hand, the users and consumers of audio content
expect efficient information delivery; on the other hand, the
internet audio platforms hope th [...]
+</blockquote>
+<h2>Background</h2>
+<div align=center>
+<img src="https://imgpp.com/images/2021/11/23/radio-g360707f44_1920.md.jpg"/>
+</div>
+<p>Lizhi is a fast-growing UGC audio community company that attaches great
importance to AI and data analysis technology. AI helps each user find the
right voice among massive amounts of fragmented audio and builds this matching
into a sustainable, closed-loop ecosystem, while data analysis guides the
company’s fast-growing business. Both fields need to process massive
amounts of data and therefore require a big data scheduling system.</p>
+<p>Before early 2020, Lizhi used the Azkaban scheduling system.
Although scheduled SQL, shell, and Python scripts, together with other big
data-related modules, could complete the entire AI process, the approach was
neither easy to use nor reusable enough. The machine learning platform is a set of systems built
specifically for AI, which abstracts the AI development paradigm, i.e. data
acquisition, data preprocessing, model training, model prediction, model
evaluation, and model release into modules. Eac [...]
+<h2>Challenges During Machine Learning Platform Development</h2>
+<p>While developing the machine learning platform, Lizhi had some clear
requirements for the scheduling system:</p>
+<ol>
+<li>
+<p>It should be able to store and process massive data for tasks such as
screening samples, generating profiles, feature preprocessing, and distributed
model training;</p>
+</li>
+<li>
+<p>A DAG execution engine is required, so that stages such as data
acquisition -> data preprocessing -> model training -> model
prediction -> model evaluation -> model release can be chained and executed
in order as a DAG.</p>
+</li>
+</ol>
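+<p>As an illustration of the second requirement, the chain of stages can be
expressed as a small dependency graph and ordered topologically. This is a
minimal sketch in Python, not how Azkaban or DolphinScheduler is implemented;
the stage identifiers and the <code>run_pipeline</code> helper are assumptions
made for the example.</p>

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each stage maps to the set of stages it depends on.
# Stage names mirror the list above; the identifiers themselves are illustrative.
PIPELINE = {
    "data_acquisition": set(),
    "data_preprocessing": {"data_acquisition"},
    "model_training": {"data_preprocessing"},
    "model_prediction": {"model_training"},
    "model_evaluation": {"model_prediction"},
    "model_release": {"model_evaluation"},
}


def execution_order(dag):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def run_pipeline(dag, handlers):
    """Call each stage's handler in dependency order.

    An exception raised by a handler propagates, so no later stage runs.
    """
    completed = []
    for stage in execution_order(dag):
        handlers[stage]()
        completed.append(stage)
    return completed
```

+<p>Because the graph here is a single chain, the order is fully determined.
A graph with a cycle would make <code>TopologicalSorter</code> raise
<code>CycleError</code>, which is the property that lets a scheduler reject an
invalid workflow before running anything.</p>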
+<p>While developing on Azkaban, the team encountered several challenges:</p>
+<p>Challenge 1: The development model is cumbersome: users have
to package scripts and build DAG dependencies themselves, and there is no
drag-and-drop DAG editor;</p>
+<p>Challenge 2: The modules are not rich enough, and the scripted AI modules
are not general enough, so the team had to develop similar modules repeatedly,
which is unnecessary and error-prone;</p>
+<p>Challenge 3: With stand-alone deployment, the system is unstable and prone to
failures. In addition, a stuck task can easily cause downstream tasks to fail.</p>
+<h2>Turn to DolphinScheduler</h2>
+<p>After running into numerous pitfalls with the old system, the Lizhi machine
learning team, including recommendation system engineers Haibin Yu, Yanbin Lin,
Huanjie Xie, and Yifei Guo, decided to adopt DolphinScheduler.</p>
+<p>Currently, 1,600+ processes and 12,000+ tasks (IDC) are running smoothly on
DolphinScheduler every day.</p>
+<p>Haibin Yu said that the main users of the scheduling system are, in
decreasing order of importance, recommendation algorithm engineers, data
analysts, risk control algorithm engineers, and business developers.
Not all of them are experts in data management operations, and DolphinScheduler
meets their need for a simple, easy-to-use, drag-and-drop scheduling
system:</p>
+<ol>
+<li>
+<p>A distributed, decentralized architecture and a fault-tolerance mechanism
that ensure high availability of the system;</p>
+</li>
+<li>
+<p>A visual drag-and-drop DAG UI that is easy to use and quick to iterate on;</p>
+</li>
+<li>
+<p>It supports a variety of modules and makes it simple to develop and
integrate custom ones;</p>
+</li>
+<li>
+<p>The community is active, so ongoing project support is not a
concern;</p>
+</li>
+<li>
+<p>It is very close to the operating mode of the machine learning platform,
which also relies on DAG-based drag-and-drop UI programming.</p>
+</li>
+</ol>
+<h2>Use Case</h2>
+<p>After selecting DolphinScheduler, the Lizhi machine learning team
carried out secondary development on top of it and applied the results to actual
business scenarios, mainly recommendation and risk control.
Recommendation scenarios cover voice, anchor, live broadcast,
podcast, and friend recommendation, and risk control scenarios cover
payment, advertising, and comments.
+At the technical level of the platform, Lizhi has optimized the extended modules
for the five stages of the machine learning paradigm, i.e. obtaining training samples,
data preprocessing, model training, model evaluation, and model release.</p>
+<p>A simple xgboost case:</p>
+<div align=center>
+<img src="https://imgpp.com/images/2021/11/23/1.md.png"/>
+</div>
+<h3>1. Obtaining training samples</h3>
+<p>At present, Lizhi does not select data directly from Hive and then join,
union, and split the samples; instead, it processes the samples directly with
shell nodes.</p>
+<div align=center>
+<img src="https://imgpp.com/images/2021/11/23/2.md.png"/>
+</div>
+<h3>2. Data preprocessing</h3>
+<p>Preprocessing is driven by transformers and a custom preprocessing configuration file. The same
configuration is used for online training, and feature preprocessing is performed after
the features are obtained. The configuration contains the itemType to be
predicted and its feature set, the user’s userType and its feature set, as well as the associated
and crossed itemType and its feature set. A transformer function is defined for
each feature's preprocessing, with support for custom transformers, hot updates,
and xgboost and tf model feature prep [...]
+<div align=center>
+<img src="https://imgpp.com/images/2021/11/23/2.md.png"/>
+</div>
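+<p>The idea of a per-feature transformer selected by a configuration file can
be sketched as a small registry. This is only an illustration under stated
assumptions: the feature names (<code>play_count</code>, <code>age</code>), the
transformer names, and the configuration layout are hypothetical and are not
taken from Lizhi's platform.</p>

```python
import math

# Registry of transformer functions, keyed by the name used in the configuration.
TRANSFORMERS = {}


def transformer(name):
    """Decorator that registers a preprocessing function under `name`."""
    def register(fn):
        TRANSFORMERS[name] = fn
        return fn
    return register


@transformer("log1p")
def log1p(value):
    # Compress long-tailed counts.
    return math.log1p(float(value))


@transformer("bucketize")
def bucketize(value, boundaries):
    # Number of boundaries at or below the value, used as the bucket index.
    return sum(1 for b in boundaries if value >= b)


# Hypothetical configuration: feature name -> transformer name and arguments.
CONFIG = {
    "play_count": {"transformer": "log1p"},
    "age": {"transformer": "bucketize", "args": {"boundaries": [18, 30, 45]}},
}


def preprocess(raw, config):
    """Apply the configured transformer to each configured feature."""
    out = {}
    for feature, spec in config.items():
        fn = TRANSFORMERS[spec["transformer"]]
        out[feature] = fn(raw[feature], **spec.get("args", {}))
    return out
```

+<p>Since the functions are looked up by name at call time, registering a new
function under an existing name replaces the old one without touching the
configuration, which is one simple way to picture a hot update. Reusing the
same configuration for training and serving keeps the two code paths from
drifting apart.</p>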
+<h3>3. Xgboost training</h3>
+<p>The platform supports w2v, xgboost, and tf model training modules. The training modules
are first wrapped with TensorFlow or PyTorch and then packaged as
DolphinScheduler modules.
+For example, in the xgboost training process, Python is used to wrap the
xgboost training script as the xgboost training node of DolphinScheduler, and
the parameters required for training are exposed on the interface. The file exported
by “training set data preprocessing” is passed to the training node through
HDFS.</p>
+<div align=center>
+<img src="https://imgpp.com/images/2021/11/23/3.md.png"/>
+</div>
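+<p>A training script wrapped this way might look roughly like the following.
This is a hedged sketch, not Lizhi's node: the flag names are invented for
illustration, and it assumes the preprocessed file has already been fetched
from HDFS to a path or URI that <code>xgboost.DMatrix</code> can load.</p>

```python
import argparse


def parse_args(argv=None):
    """Parameters a scheduler node could expose on its form (names are illustrative)."""
    parser = argparse.ArgumentParser(description="xgboost training node (sketch)")
    parser.add_argument("--train-path", required=True,
                        help="path or URI of the preprocessed training file")
    parser.add_argument("--model-path", required=True,
                        help="where to save the trained model")
    parser.add_argument("--max-depth", type=int, default=6)
    parser.add_argument("--eta", type=float, default=0.3)
    parser.add_argument("--num-round", type=int, default=100)
    parser.add_argument("--objective", default="binary:logistic")
    return parser.parse_args(argv)


def build_params(args):
    """Translate node parameters into the params dict passed to xgboost.train."""
    return {
        "max_depth": args.max_depth,
        "eta": args.eta,
        "objective": args.objective,
    }


def train(args):
    # Imported lazily so that argument handling can be checked without xgboost installed.
    import xgboost as xgb

    dtrain = xgb.DMatrix(args.train_path)
    booster = xgb.train(build_params(args), dtrain, num_boost_round=args.num_round)
    booster.save_model(args.model_path)
```

+<p>A node would then call <code>train(parse_args())</code> as its entry point.
Keeping every tunable value as a command-line flag is what allows a scheduler
UI to render the parameters as form fields and pass them through unchanged.</p>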
+<h3>4. Model release</h3>
+<p>The model release step sends the model and preprocessing configuration files
to HDFS and inserts a record into the model release table. The model service then
automatically detects the new model, updates itself, and provides online
prediction services to external callers.</p>
+<div align=center>
+<img src="https://imgpp.com/images/2021/11/23/4.md.png"/>
+</div>
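+<p>The release handshake described above (copy the artifacts, record the
release, let the service pick up the newest record) can be sketched as follows.
Everything here is an assumption for illustration: a local directory stands in
for HDFS, SQLite stands in for the release database, and the table and column
names are hypothetical.</p>

```python
import shutil
import sqlite3
import time
from pathlib import Path


def init_release_table(conn):
    """Create the (hypothetical) release table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_release ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "model_name TEXT, model_uri TEXT, config_uri TEXT, released_at REAL)"
    )


def release_model(conn, model_name, model_file, config_file, store_root):
    """Copy the model and its preprocessing config to the store, then record the release."""
    version_dir = Path(store_root) / model_name / str(int(time.time() * 1000))
    version_dir.mkdir(parents=True, exist_ok=True)
    model_uri = shutil.copy(model_file, version_dir)
    config_uri = shutil.copy(config_file, version_dir)
    conn.execute(
        "INSERT INTO model_release (model_name, model_uri, config_uri, released_at) "
        "VALUES (?, ?, ?, ?)",
        (model_name, str(model_uri), str(config_uri), time.time()),
    )
    conn.commit()


def latest_release(conn, model_name):
    """Return (id, model_uri, config_uri) of the newest release, or None."""
    return conn.execute(
        "SELECT id, model_uri, config_uri FROM model_release "
        "WHERE model_name = ? ORDER BY id DESC LIMIT 1",
        (model_name,),
    ).fetchone()
```

+<p>A model service could poll <code>latest_release</code> and reload when the
returned id changes. Writing the record only after both files are copied means
the service never sees a release whose artifacts are missing.</p>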
+<p>Haibin Yu said that due to historical and technical limitations, Lizhi has
not yet built a machine learning platform like Ali PAI, but practice has
shown that similar platform functions can be achieved on top of
DolphinScheduler.</p>
+<p>In addition, Lizhi has made a number of customizations to
DolphinScheduler to bring the scheduling system more in line with actual
business needs, such as:</p>
+<ol>
+<li>
+<p>Pop up a dialog asking whether to set a schedule when a workflow is defined</p>
+</li>
+<li>
+<p>Add display pages for all workflow definitions to facilitate searching</p>
+</li>
+</ol>
+<p>a) Add a workflow definition filter that jumps to the workflow instance
page, with a line chart showing how the running time changes
+b) Allow drilling down from a workflow instance to its task instances</p>
+<ol start="3">
+<li>Enter parameters at run time to configure which task nodes are disabled</li>
+</ol>
+<h2>Machine Learning Platform based on Scheduling System May Lead the Future
Trend</h2>
+<p>Deep learning is a leading trend for the future. Lizhi has developed new
modules for deep learning models, and the entire tf process has now been completed;
LR and GBDT model-related modules are also planned. The latter two
models are simpler, easier to get started with, and faster to
iterate on, and can be used in general recommendation scenarios. Once they are
implemented, the Lizhi machine learning platform will be more complete.
+Lizhi believes that if the scheduling system can be improved in terms of
kernel stability, drag-and-drop UI support, convenient module extension, task
plug-ins, and task parameter passing, building a machine learning platform
on top of a scheduling system may become common practice in the industry.</p>
+</section><footer class="footer-container"><div
class="footer-body"><div><h3>About us</h3><h4>Do you need feedback? Please
contact us through the following ways.</h4></div><div
class="contact-container"><ul><li><img class="img-base"
src="/img/emailgray.png"/><img class="img-change" src="/img/emailblue.png"/><a
href="/en-us/community/development/subscribe.html"><p>Email
List</p></a></li><li><img class="img-base" src="/img/twittergray.png"/><img
class="img-change" src="/img/twitterblue.png [...]
+ <script
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-with-addons.min.js"></script>
+ <script
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-dom.min.js"></script>
+ <script>window.rootPath = '';</script>
+ <script src="/build/vendor.acaf0ce.js"></script>
+ <script src="/build/blog.md.6f0aa52.js"></script>
+ <script>
+ var _hmt = _hmt || [];
+ (function() {
+ var hm = document.createElement("script");
+ hm.src = "https://hm.baidu.com/hm.js?4e7b4b400dd31fa015018a435c64d06f";
+ var s = document.getElementsByTagName("script")[0];
+ s.parentNode.insertBefore(hm, s);
+ })();
+ </script>
+ <!-- Global site tag (gtag.js) - Google Analytics -->
+ <script async
src="https://www.googletagmanager.com/gtag/js?id=G-899J8PYKJZ"></script>
+ <script>
+ window.dataLayer = window.dataLayer || [];
+ function gtag(){dataLayer.push(arguments);}
+ gtag('js', new Date());
+
+ gtag('config', 'G-899J8PYKJZ');
+ </script>
+</body>
+</html>
\ No newline at end of file
diff --git a/en-us/blog/Lizhi-case-study(En).json
b/en-us/blog/Lizhi-case-study(En).json
new file mode 100644
index 0000000..0100c51
--- /dev/null
+++ b/en-us/blog/Lizhi-case-study(En).json
@@ -0,0 +1,6 @@
+{
+ "filename": "Lizhi-case-study(En).md",
+ "__html": "<h1>A Formidable Combination of Lizhi Machine Learning
Platform& DolphinScheduler Creates New Paradigm for Data Process in the
Future!</h1>\n<div align=center>\n<img
src=\"https://imgpp.com/images/2021/11/23/1637566412753.md.png\"/>\n</div>\n<blockquote>\n<p>Editor's
word: The online audio industry is a blue ocean market in China nowadays.
According to CIC data, the market size of China’s online audio industry has
grown from 1.6 billion yuan in 2016 to 13.1 billion yuan [...]
+ "link": "/dist/en-us/blog/Lizhi-case-study(En).html",
+ "meta": {}
+}
\ No newline at end of file
diff --git a/en-us/blog/index.html b/en-us/blog/index.html
index 547eaf1..b53815a 100644
--- a/en-us/blog/index.html
+++ b/en-us/blog/index.html
@@ -11,12 +11,12 @@
<link rel="stylesheet" href="/build/blog.acc2955.css">
</head>
<body>
- <div id="root"><div class="blog-list-page" data-reactroot=""><header
class="header-container header-container-dark"><div class="header-body"><a
href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div
class="search search-dark"><span class="icon-search"></span></div><span
class="language-switch language-switch-dark">中</span><div
class="header-menu"><img class="header-menu-toggle"
src="/img/system/menu_white.png"/><div><ul class="ant-menu whiteClass
ant-menu-light [...]
+ <div id="root"><div class="blog-list-page" data-reactroot=""><header
class="header-container header-container-dark"><div class="header-body"><a
href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div
class="search search-dark"><span class="icon-search"></span></div><span
class="language-switch language-switch-dark">中</span><div
class="header-menu"><img class="header-menu-toggle"
src="/img/system/menu_white.png"/><div><ul class="ant-menu whiteClass
ant-menu-light [...]
<script
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-with-addons.min.js"></script>
<script
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-dom.min.js"></script>
<script>window.rootPath = '';</script>
<script src="/build/vendor.acaf0ce.js"></script>
- <script src="/build/blog.d3cb641.js"></script>
+ <script src="/build/blog.8b1c191.js"></script>
<script>
var _hmt = _hmt || [];
(function() {
diff --git a/zh-cn/blog/Lizhi-case-study.html b/zh-cn/blog/Lizhi-case-study.html
index 08914c6..5cb20cd 100644
--- a/zh-cn/blog/Lizhi-case-study.html
+++ b/zh-cn/blog/Lizhi-case-study.html
@@ -11,7 +11,8 @@
<link rel="stylesheet" href="/build/blog.md.fd8b187.css">
</head>
<body>
- <div id="root"><div class="blog-detail-page" data-reactroot=""><header
class="header-container header-container-dark"><div class="header-body"><a
href="/zh-cn/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div
class="search search-dark"><span class="icon-search"></span></div><span
class="language-switch language-switch-dark">En</span><div
class="header-menu"><img class="header-menu-toggle"
src="/img/system/menu_white.png"/><div><ul class="ant-menu whiteClass
ant-menu-li [...]
+ <div id="root"><div class="blog-detail-page" data-reactroot=""><header
class="header-container header-container-dark"><div class="header-body"><a
href="/zh-cn/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div
class="search search-dark"><span class="icon-search"></span></div><span
class="language-switch language-switch-dark">En</span><div
class="header-menu"><img class="header-menu-toggle"
src="/img/system/menu_white.png"/><div><ul class="ant-menu whiteClass
ant-menu-li [...]
+<div align=center>
<img src="https://imgpp.com/images/2021/11/23/1637566412753.md.png"/>
</div>
<blockquote>
diff --git a/zh-cn/blog/Lizhi-case-study.json b/zh-cn/blog/Lizhi-case-study.json
index 7da19c2..20edc7b 100644
--- a/zh-cn/blog/Lizhi-case-study.json
+++ b/zh-cn/blog/Lizhi-case-study.json
@@ -1,6 +1,6 @@
{
"filename": "Lizhi-case-study.md",
- "__html": "<div align=center>\n<img
src=\"https://imgpp.com/images/2021/11/23/1637566412753.md.png\"/>\n</div>\n<blockquote>\n<p>编
者 按:在线音频行业在中国仍是蓝海一片。根据 CIC 数据显示,中国在线音频行业市场规模由 2016 年的 16 亿元增长至 2020 年的 131
亿元,复合年增长率为
69.4%。随着物联网场景的普及,音频场景更是从移动端扩展至车载端、智能硬件端、家居端等各类场景,从而最大限度发挥音频载体的伴随性优势。</p>\n</blockquote>\n<blockquote>\n<p>近年来,国内音频社区成功上市的案例接踵而至。其中,于
2020 年在纳斯达克上市的“在线音频”第一股荔枝,用户数早已超过 2 亿。进入信息化时代,音频行业和所有行业一样,面对平台海量 UGC
数据的产生,音频用户和消费者对于信息传递的效率也有着高要求,互联网音频平台也希望信息能够精准、快速地推送给用户,
同时保证平台 UGC 内容的安 [...]
+ "__html": "<h1>荔枝机器学习平台与大数据调度系统“双剑合璧”,打造未来数据处理新模式!</h1>\n<div
align=center>\n<img
src=\"https://imgpp.com/images/2021/11/23/1637566412753.md.png\"/>\n</div>\n<blockquote>\n<p>编
者 按:在线音频行业在中国仍是蓝海一片。根据 CIC 数据显示,中国在线音频行业市场规模由 2016 年的 16 亿元增长至 2020 年的 131
亿元,复合年增长率为
69.4%。随着物联网场景的普及,音频场景更是从移动端扩展至车载端、智能硬件端、家居端等各类场景,从而最大限度发挥音频载体的伴随性优势。</p>\n</blockquote>\n<blockquote>\n<p>近年来,国内音频社区成功上市的案例接踵而至。其中,于
2020 年在纳斯达克上市的“在线音频”第一股荔枝,用户数早已超过 2 亿。进入信息化时代,音频行业和所有行业一样,面对平台海量 UGC
数据的产生,音频用户和消费者对于信息传递的效率也有 [...]
"link": "/dist/zh-cn/blog/Lizhi-case-study.html",
"meta": {}
}
\ No newline at end of file
diff --git a/zh-cn/blog/index.html b/zh-cn/blog/index.html
index 52953c4..faa8e41 100644
--- a/zh-cn/blog/index.html
+++ b/zh-cn/blog/index.html
@@ -16,7 +16,7 @@
<script
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-dom.min.js"></script>
<script>window.rootPath = '';</script>
<script src="/build/vendor.acaf0ce.js"></script>
- <script src="/build/blog.d3cb641.js"></script>
+ <script src="/build/blog.8b1c191.js"></script>
<script>
var _hmt = _hmt || [];
(function() {