Hi all,

I have drafted the DQ job layer model in the following wiki:
https://cwiki.apache.org/confluence/display/GRIFFIN/DQ+Job+models
Please review it.

Thanks,
William

On Sat, Feb 24, 2024 at 7:29 AM William Guo <gu...@apache.org> wrote:

> hi all,
>
> Please review our data quality models for DQJob
> https://cwiki.apache.org/confluence/display/GRIFFIN/models
>
> Thanks,
> William
>
> On Thu, Feb 22, 2024 at 11:10 AM William Guo <gu...@apache.org> wrote:
>
>> hi all,
>>
>> I have updated the architecture in our wiki.
>>
>> https://cwiki.apache.org/confluence/display/GRIFFIN/The+DQ+workflow+Architecture+Proposal
>>
>> Please take a look; reviews are welcome.
>>
>> Thanks,
>> William
>>
>> On Wed, Feb 21, 2024 at 10:43 AM William Guo <gu...@apache.org> wrote:
>>
>>> One risk is that our Griffin 1.0.0 might not be compatible with
>>> previous versions.
>>> But we will try to keep the metrics module compatible.
>>>
>>> On Tue, Feb 20, 2024 at 9:20 PM Z Mr <zyhao_co...@outlook.com> wrote:
>>>
>>>> Excellent, I have been researching related topics recently, especially
>>>> regarding data quality definitions and the selection of computing
>>>> engines. If we can implement the content mentioned above, it would be
>>>> a significant achievement.
>>>>
>>>> Additionally, a more flexible and straightforward installation and
>>>> deployment process is also very important for the widespread adoption
>>>> and use of Griffin.
>>>>
>>>> Thanks,
>>>> Zyhao
>>>> ________________________________
>>>> From: William Guo <gu...@apache.org>
>>>> Sent: Monday, February 19, 2024 16:24
>>>> To: dev@griffin.apache.org <dev@griffin.apache.org>
>>>> Subject: [Discuss] apache griffin curent issues
>>>>
>>>> hi all,
>>>>
>>>> As we embark on the journey of refactoring Apache Griffin, I'd like to
>>>> draw attention to some key areas for improvement. These points serve
>>>> as a foundation for discussion within our development community:
>>>>
>>>> - Incomplete and Inflexible Data Quality Definition: The current
>>>> definition of data quality lacks completeness and flexibility. A
>>>> comprehensive data quality rule should encompass recording metrics,
>>>> anomaly detection, and actionable steps.
>>>>
>>>> - Rigid Triggering Mechanism: The triggering mechanism for measures
>>>> is rigid. Integration with the scheduler in enterprise production
>>>> environments needs to be seamless and deep.
>>>>
>>>> - Over-Reliance on Internal Data Comparison: The measure
>>>> implementation depends too heavily on its own data comparison methods
>>>> and neglects the optimization capabilities inherent in the engine.
>>>> We need to leverage the engine's optimization features more
>>>> effectively, and focus on data quality benchmarks rather than on
>>>> optimizing queries.
>>>>
>>>> - Configurability of Gateway: To enhance flexibility, the gateway
>>>> between Apache Griffin and the engine should be configurable. This
>>>> ensures compatibility with popular gateways such as Trino, Kyuubi,
>>>> etc.
>>>>
>>>> - Lack of Default Alert Channels: Currently, there are no default
>>>> alert channels. Providing default channels such as Slack, WeChat,
>>>> etc., is essential to ensure timely communication of alerts.
>>>>
>>>> - Absence of Anomaly Detection Module: There is no anomaly detection
>>>> module. Presently, our thresholds are statically configured, which
>>>> indicates a need for dynamic anomaly detection capabilities.
>>>>
>>>> I encourage everyone to share their thoughts and insights on these
>>>> points within our development list. Your contributions will be
>>>> invaluable as we work towards enhancing the functionality and
>>>> usability of Apache Griffin.
>>>>
>>>> Thanks,
>>>> William
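
To make the first and last points of the list above more concrete, here is a minimal sketch of a rule that bundles the three parts named in the thread (recording a metric, detecting an anomaly, taking an action) and replaces a static threshold with one derived from the metric's own history. This is not Griffin code or a proposed API: the `DQRule` class, its fields, and the mean plus/minus `sigma` standard deviations check are all assumptions chosen for illustration, and the metric and alert callables stand in for an engine query and an alert channel.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev
from typing import Callable, List


@dataclass
class DQRule:
    """Hypothetical rule: metric recording + anomaly detection + action."""

    name: str
    compute_metric: Callable[[], float]  # stand-in for a query run on the engine
    on_anomaly: Callable[[str], None]    # stand-in for an alert channel
    sigma: float = 3.0                   # width of the dynamic band (assumed)
    min_history: int = 5                 # points required before judging
    history: List[float] = field(default_factory=list)

    def is_anomalous(self, value: float) -> bool:
        # Dynamic threshold: compare against mean +/- sigma * stdev of the
        # values recorded so far, instead of a statically configured bound.
        if len(self.history) < self.min_history:
            return False
        mu = mean(self.history)
        sd = stdev(self.history)
        return abs(value - mu) > self.sigma * sd

    def run(self) -> bool:
        value = self.compute_metric()
        anomalous = self.is_anomalous(value)  # judged against prior history only
        self.history.append(value)            # the metric is always recorded
        if anomalous:
            self.on_anomaly(f"{self.name}: value {value} is outside the expected range")
        return anomalous


if __name__ == "__main__":
    # Illustrative daily row counts; the last one is a deliberate outlier.
    counts = iter([100.0, 101.0, 99.0, 100.0, 102.0, 250.0])
    rule = DQRule("row_count", lambda: next(counts), print)
    for _ in range(6):
        rule.run()
```

Passing the action in as a callable keeps the rule independent of any particular channel, so a Slack or WeChat sender could be supplied without changing the rule itself. A real module would likely also need seasonality handling and persistence of the history, which this sketch leaves out.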