chongchongzi opened a new issue #1306: Data quality inspection 
component(数据质量检测组件)
URL: https://github.com/apache/incubator-dolphinscheduler/issues/1306
 
 
   Requirements:
   
   Data is a cornerstone of business decision-making, and high-quality data 
is essential for making good decisions.
   
   At present, however, data problems are often discovered only after release 
to production, by operations and business staff, who then report them to the 
engineering team for troubleshooting. As a result, problems are found late, 
checking the data is time-consuming and laborious, and labor costs are high.
   
   There is therefore strong demand for a tool that monitors data problems 
and raises alerts.
   
   The data quality detection component is proposed to address the scenarios 
above: it detects data problems promptly and raises alerts, so that data 
monitoring is automated.
   
   Implementation plan:
   
   1. Workflow: upstream dependency component -> calculation task component -> 
data quality detection component -> calculation task component -> data 
quality detection component -> ...
   
   2. Function of the data quality detection component: run a different SQL 
query depending on the data source; decide whether to interrupt the workflow 
when the data is empty, when the data does not meet expectations, or when it 
exceeds the threshold for comparison with historical values; send an email 
alert; and store the result of every check in the database.
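
   The three checks in item 2 could be sketched roughly as follows. This is 
only an illustration under stated assumptions, not DolphinScheduler code: 
CheckConfig, CheckResult, evaluate and all field names are hypothetical, the 
metric is assumed to be a single number returned by the SQL query, and 
"expected" is assumed to mean a minimum value. Sending the email alert and 
storing the result are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckConfig:
    # Hypothetical settings; each check has its own "interrupt or not" switch.
    interrupt_on_empty: bool = True
    expected_min: Optional[float] = None       # assumed form of "expected" data
    interrupt_on_unexpected: bool = True
    max_change_ratio: Optional[float] = None   # allowed relative change vs. history
    interrupt_on_threshold: bool = False


@dataclass
class CheckResult:
    problems: List[str]   # names of the checks that failed
    interrupt: bool       # whether the workflow should stop


def evaluate(value: Optional[float], history: Optional[float],
             cfg: CheckConfig) -> CheckResult:
    """Apply the three checks to one metric value returned by a SQL query."""
    # Check 1: empty data. Nothing else can be checked without a value.
    if value is None:
        return CheckResult(["empty"], cfg.interrupt_on_empty)

    problems: List[str] = []
    interrupt = False

    # Check 2: data does not meet the expectation.
    if cfg.expected_min is not None and value < cfg.expected_min:
        problems.append("unexpected")
        interrupt = interrupt or cfg.interrupt_on_unexpected

    # Check 3: change relative to the historical value exceeds the threshold.
    # Skipped when there is no history or the historical value is zero.
    if cfg.max_change_ratio is not None and history:
        change = abs(value - history) / abs(history)
        if change > cfg.max_change_ratio:
            problems.append("threshold")
            interrupt = interrupt or cfg.interrupt_on_threshold

    return CheckResult(problems, interrupt)
```

   A caller would run the query for the configured data source, pass the 
value to evaluate(), send an alert if problems is non-empty, store the result, 
and stop the workflow only when interrupt is true.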
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
