This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new c274fb9 Automated deployment: 694d48a6425b8e33866d011a98fbb0d16a6dc898
c274fb9 is described below
commit c274fb9eb1e14458f8186fb7bec72820fd9b2e4f
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Thu Dec 30 08:02:24 2021 +0000
Automated deployment: 694d48a6425b8e33866d011a98fbb0d16a6dc898
---
en-us/blog/Eavy_Info.html | 2 +-
en-us/blog/Eavy_Info.json | 2 +-
zh-cn/blog/Eavy_Info.html | 8 ++++----
zh-cn/blog/Eavy_Info.json | 2 +-
4 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/en-us/blog/Eavy_Info.html b/en-us/blog/Eavy_Info.html
index 243c58b..b8dc7e9 100644
--- a/en-us/blog/Eavy_Info.html
+++ b/en-us/blog/Eavy_Info.html
@@ -47,7 +47,7 @@
<p>After weighing all factors, we dropped the previous DS UI front end and
reused the DS back-end interfaces for going online, starting and stopping,
deleting, and viewing logs.</p>
<p>The synchronization module as a whole is designed to reuse the diverse input
and output plugins of DataX, combined with DS's optimizations, to implement
offline synchronization tasks. The diagram below shows the components of our
current synchronization setup.</p>
<div align=center>
-<img
src="https://imgpp.com/images/2021/12/29/9febb92b69077778c765d115427e3a48.md.png"/>
+<img
src="https://imgpp.com/images/2021/12/30/ffd0c839647bcce4c208ee0cf5b7622b.md.png"/>
</div>
<h2>Self-Service Development Practices Based on DS</h2>
<p>Anyone familiar with DataX knows that it is essentially an ETL tool. Its
Transform capability shows in the transformer module, which supports Groovy
syntax, and in the utility classes used by the transformer, which can be further
enriched in the DataX source code with functions such as replacement, regex
matching, filtering, desensitization, and statistics. Since tasks in Apache
DolphinScheduler are implemented as DAGs, we wondered whether it would be
possible to abstr [...]
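The synchronization module described above reuses DataX's reader/writer plugins, with an optional transformer stage in between. As a minimal sketch, assuming DataX's standard job JSON layout (`job.setting.speed.channel` plus a `job.content` list of reader/writer pairs, each with an optional `transformer` list), the Python below assembles such a job config. The helper `build_datax_job` is hypothetical and not part of DataX or DolphinScheduler; `mysqlreader`/`mysqlwriter` are real DataX plugin names, but every connection value shown is a placeholder.

```python
import json
from typing import Any, Dict, List, Optional


def build_datax_job(
    reader_name: str,
    reader_params: Dict[str, Any],
    writer_name: str,
    writer_params: Dict[str, Any],
    channel: int = 1,
    transformers: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper (not a DataX API): build a one-reader, one-writer job dict."""
    content: Dict[str, Any] = {
        "reader": {"name": reader_name, "parameter": reader_params},
        "writer": {"name": writer_name, "parameter": writer_params},
    }
    # Transformers, if given, are passed through unchanged; DataX applies them
    # to each record between the reader and the writer.
    if transformers:
        content["transformer"] = transformers
    return {
        "job": {
            "setting": {"speed": {"channel": channel}},
            "content": [content],
        }
    }


if __name__ == "__main__":
    # Placeholder credentials and hosts; replace with real data sources.
    job = build_datax_job(
        reader_name="mysqlreader",
        reader_params={
            "username": "reader_user",
            "password": "***",
            "column": ["id", "name"],
            "connection": [
                {"table": ["src_table"], "jdbcUrl": ["jdbc:mysql://host-a:3306/db"]}
            ],
        },
        writer_name="mysqlwriter",
        writer_params={
            "username": "writer_user",
            "password": "***",
            "column": ["id", "name"],
            "connection": [
                {"table": ["dst_table"], "jdbcUrl": "jdbc:mysql://host-b:3306/db"}
            ],
        },
    )
    # The resulting JSON could be written to a file and passed to datax.py.
    print(json.dumps(job, indent=2))
```

Keeping the builder generic over plugin names means the same helper covers any reader/writer pair, which matches the idea of reusing DataX's plugin variety rather than writing per-source code.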
diff --git a/en-us/blog/Eavy_Info.json b/en-us/blog/Eavy_Info.json
index fc8ba70..80e7324 100644
--- a/en-us/blog/Eavy_Info.json
+++ b/en-us/blog/Eavy_Info.json
@@ -1,6 +1,6 @@
{
"filename": "Eavy_Info.md",
- "__html": "<h1>Eavy Info Builds Data Asset Management Platform Services
Based on Apache DolphinScheduler to Construct Government Information Ecology |
Use Case</h1>\n<div align=center>\n<img
src=\"https://imgpp.com/images/2021/12/29/1640759432737.md.png\"/>\n</div>\n<p>Running
on Apache DolphinScheduler, the cloud computing and big data provider Eavy
Info has been supporting the company's business operations for more than a
year.</p>\n<p>Combining with the government affairs inform [...]
+ "__html": "<h1>Eavy Info Builds Data Asset Management Platform Services
Based on Apache DolphinScheduler to Construct Government Information Ecology |
Use Case</h1>\n<div align=center>\n<img
src=\"https://imgpp.com/images/2021/12/29/1640759432737.md.png\"/>\n</div>\n<p>Running
on Apache DolphinScheduler, the cloud computing and big data provider Eavy
Info has been supporting the company's business operations for more than a
year.</p>\n<p>Combining with the government affairs inform [...]
"link": "/dist/en-us/blog/Eavy_Info.html",
"meta": {
"title": "Eavy Info Builds Data Asset Management Platform Services Based
on Apache DolphinScheduler to Construct Government Information Ecology",
diff --git a/zh-cn/blog/Eavy_Info.html b/zh-cn/blog/Eavy_Info.html
index 16dee05..927c8a6 100644
--- a/zh-cn/blog/Eavy_Info.html
+++ b/zh-cn/blog/Eavy_Info.html
@@ -46,19 +46,19 @@ DolphinScheduler is a distributed, decentralized, easily extensible visual DAG sched
</div>
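After weighing all factors, we dropped the previous DS UI front end (the reason is explained in Part 2, in the self-service development module) and reused the DS back-end interfaces for going online, starting and stopping, deleting, and viewing logs.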
<div align=center>
-< img src="https://imgpp.com/images/2021/12/30/4.md.png"/>
+<img src="https://imgpp.com/images/2021/12/30/4.md.png"/>
</div>
<div align=center>
-< img src="https://imgpp.com/images/2021/12/30/5.md.png"/>
+<img src="https://imgpp.com/images/2021/12/30/5.md.png"/>
</div>
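<p>The idea behind the whole synchronization module is to reuse the diverse input and output plugins of DataX, combined with DS's
optimizations, to implement offline synchronization tasks. Below is a component diagram of our current synchronization; real-time synchronization is not covered here.</p>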
<div align=center>
-< img src="https://imgpp.com/images/2021/12/30/9.md.png"/>
+<img src="https://imgpp.com/images/2021/12/30/9.md.png"/>
</div>
<h2>03 Self-Service Development Practice Based on DS</h2>
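<p>Anyone familiar with DataX knows that it is essentially an ETL tool. Its Transform capability shows in the transformer module it provides, which supports Groovy
syntax, and the utility classes used in the transformer can be further enriched in the DataX source code with functions such as replacement, regex matching, filtering, desensitization, and statistics. Since
DolphinScheduler tasks can be implemented as DAGs, we wondered whether it would be possible, for one table or a few tables, to abstract each DataX job or SQL statement
into a small data-governance module, design the modules as a DAG, pass data between upstream and downstream modules, and ideally support drag-and-drop just like DS. So, building on our earlier use of
DataX and DS, we implemented a self-service development module.</p>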
<div align=center>
-< img src="https://imgpp.com/images/2021/12/30/6.md.png"/>
+<img src="https://imgpp.com/images/2021/12/30/6.md.png"/>
</div>
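<p>Each component may be a module. We use DS's depend mechanism to handle the dependencies between module functions,
and the data passed from component to component is stored by the front end. That is, after introducing the
input component, we let the front end perform most of the inter-component passing and logic checks. Because each component can be viewed as a DataX
(input/output), the full set of final outputs is essentially determined once all parameters have been entered, which is also why we abandoned the DS UI front end. We then assemble this DAG into the
definition type used by DS and hand it over to the DS task center as before.</p>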
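<p>PS: Because our business scenarios may involve cross-database queries (combined queries across different MySQL instances), our SQL component uses Presto underneath to implement a unified
SQL layer, so that even data sources on instances with different IPs (which are related in business terms) can be queried together through Presto.</p>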
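The component-dependency scheme described above wires drag-and-drop modules into a DAG whose dependencies are then handled by DS, so the graph must be acyclic and have a valid run order before it is handed over. A minimal sketch, assuming each component simply lists its upstream components: Kahn's topological sort produces one valid execution order and rejects cycles or unknown dependencies. The function `topological_order` and the component names are hypothetical illustrations, not DolphinScheduler's API or Eavy Info's implementation.

```python
from collections import deque
from typing import Dict, List


def topological_order(upstreams: Dict[str, List[str]]) -> List[str]:
    """Hypothetical helper: order DAG components with Kahn's algorithm.

    `upstreams` maps each component to the components it depends on.
    Raises ValueError on an unknown dependency or a cycle.
    """
    indegree = {name: 0 for name in upstreams}
    downstreams: Dict[str, List[str]] = {name: [] for name in upstreams}
    for name, deps in upstreams.items():
        for dep in deps:
            if dep not in upstreams:
                raise ValueError(f"{name} depends on unknown component {dep}")
            indegree[name] += 1
            downstreams[dep].append(name)

    # Start from components with no upstream dependencies.
    ready = deque(name for name, deg in indegree.items() if deg == 0)
    order: List[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        # Releasing a component may make its downstream components runnable.
        for child in downstreams[current]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Any component never released is part of a cycle.
    if len(order) != len(upstreams):
        raise ValueError("component graph contains a cycle")
    return order


if __name__ == "__main__":
    # Hypothetical pipeline: one input, two processing steps, one output.
    pipeline = {
        "input": [],
        "sql_filter": ["input"],
        "desensitize": ["input"],
        "output": ["sql_filter", "desensitize"],
    }
    print(topological_order(pipeline))  # ['input', 'sql_filter', 'desensitize', 'output']
```

Validating the graph on the assembling side keeps malformed drag-and-drop designs from ever reaching the scheduler; the actual conversion into DS's definition type is not shown, since its format is not given here.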
diff --git a/zh-cn/blog/Eavy_Info.json b/zh-cn/blog/Eavy_Info.json
index 68c027e..dbc9b1a 100644
--- a/zh-cn/blog/Eavy_Info.json
+++ b/zh-cn/blog/Eavy_Info.json
@@ -1,6 +1,6 @@
{
"filename": "Eavy_Info.md",
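- "__html": "<h1>Eavy Info Builds Data Asset Management Platform Services Based on DolphinScheduler to Support the Government Informatization Ecosystem |
Best Practice</h1>\n<div align=center>\n<img
src=\"https://imgpp.com/images/2021/12/30/1639640547411.md.png\"/>\n</div>\nAuthor |
Sun Hao\n<p>✎ Editor's note: Based on the Apache DolphinScheduler
scheduling platform, the cloud computing and big data provider Eavy Info has been running stably for a year at the city-level sites of several of the company's project departments.\nCombining this with its government informatization ecosystem business, Eavy Info built
the data service module of its asset data management and control platform on DolphinScheduler. How did they explore and optimize it? Sun Hao, an R&amp;D engineer at Eavy Info,
shares their user practice in detail.</p>\n<h2>01 Background</h2>\n<p>Eavy Info's business is mainly ToG (to-government),
and the main upfront work lies in data collection and sharing. Traditional ETL tools such as Kettle still have a learning curve for front-line implementation staff; in addition, tools similar to
kettl [...]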
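+ "__html": "<h1>Eavy Info Builds Data Asset Management Platform Services Based on DolphinScheduler to Support the Government Informatization Ecosystem |
Best Practice</h1>\n<div align=center>\n<img
src=\"https://imgpp.com/images/2021/12/30/1639640547411.md.png\"/>\n</div>\nAuthor |
Sun Hao\n<p>✎ Editor's note: Based on the Apache DolphinScheduler
scheduling platform, the cloud computing and big data provider Eavy Info has been running stably for a year at the city-level sites of several of the company's project departments.\nCombining this with its government informatization ecosystem business, Eavy Info built
the data service module of its asset data management and control platform on DolphinScheduler. How did they explore and optimize it? Sun Hao, an R&amp;D engineer at Eavy Info,
shares their user practice in detail.</p>\n<h2>01 Background</h2>\n<p>Eavy Info's business is mainly ToG (to-government),
and the main upfront work lies in data collection and sharing. Traditional ETL tools such as Kettle still have a learning curve for front-line implementation staff; in addition, tools similar to
kettl [...]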
"link": "/dist/zh-cn/blog/Eavy_Info.html",
"meta": {}
}
\ No newline at end of file