Hi all, I have updated the architecture proposal in our wiki:
https://cwiki.apache.org/confluence/display/GRIFFIN/The+DQ+workflow+Architecture+Proposal
Please take a look; reviews are welcome.

Thanks,
William

On Wed, Feb 21, 2024 at 10:43 AM William Guo <gu...@apache.org> wrote:
> One risk is that Griffin 1.0.0 might not be compatible with previous
> versions, but we will try to keep the metrics module compatible.
>
> On Tue, Feb 20, 2024 at 9:20 PM Z Mr <zyhao_co...@outlook.com> wrote:
>
>> Excellent. I have been researching related topics recently, especially
>> data quality definitions and the selection of computing engines. If we
>> can implement the content mentioned above, it would be a significant
>> achievement.
>>
>> Additionally, a more flexible and straightforward installation and
>> deployment process is also very important for the widespread adoption
>> and use of Griffin.
>>
>> Thanks,
>> Zyhao
>> ________________________________
>> From: William Guo <gu...@apache.org>
>> Sent: Monday, February 19, 2024 16:24
>> To: dev@griffin.apache.org <dev@griffin.apache.org>
>> Subject: [Discuss] apache griffin curent issues
>>
>> Hi all,
>>
>> As we embark on refactoring Apache Griffin, I'd like to draw attention
>> to some key areas for improvement. These points serve as a foundation
>> for discussion within our development community:
>>
>> - Incomplete and Inflexible Data Quality Definition: The current
>> definition of data quality lacks completeness and flexibility. A
>> comprehensive data quality rule should cover recording metrics, anomaly
>> detection, and actionable steps.
>>
>> - Rigid Triggering Mechanism: The mechanism for triggering measures is
>> rigid. Integration with the schedulers used in enterprise production
>> environments needs to be seamless and deep.
>>
>> - Over-Reliance on Internal Data Comparison: The measure implementation
>> depends too heavily on its own data comparison methods and neglects the
>> optimization capabilities built into the engine. We need to leverage
>> the engine's optimization features more effectively, so that we can
>> focus on data quality benchmarks rather than on optimizing queries.
>>
>> - Configurability of Gateway: To improve flexibility, the gateway
>> between Apache Griffin and the engine should be configurable. This
>> would ensure compatibility with popular gateways such as Trino, Kyuubi,
>> etc.
>>
>> - Lack of Default Alert Channels: Currently, there are no default alert
>> channels. Providing defaults such as Slack, WeChat, etc., is essential
>> for communicating alerts in a timely way.
>>
>> - Absence of Anomaly Detection Module: There is no anomaly detection
>> module. Our thresholds are currently configured statically, which
>> points to a need for dynamic anomaly detection.
>>
>> I encourage everyone to share their thoughts and insights on these
>> points on our development list. Your contributions will be invaluable as
>> we work towards improving the functionality and usability of Apache
>> Griffin.
>>
>> Thanks,
>> William