Hi all,

Please review our data quality models for DQJob:
https://cwiki.apache.org/confluence/display/GRIFFIN/models

Thanks,
William

On Thu, Feb 22, 2024 at 11:10 AM William Guo <gu...@apache.org> wrote:

> Hi all,
>
> I have updated the architecture in our wiki:
>
> https://cwiki.apache.org/confluence/display/GRIFFIN/The+DQ+workflow+Architecture+Proposal
>
> Please take a look; reviews are welcome.
>
> Thanks,
> William
>
> On Wed, Feb 21, 2024 at 10:43 AM William Guo <gu...@apache.org> wrote:
>
>> One risk is that Griffin 1.0.0 might not be compatible with previous
>> versions. But we will try to keep the metrics module compatible.
>>
>> On Tue, Feb 20, 2024 at 9:20 PM Z Mr <zyhao_co...@outlook.com> wrote:
>>
>>> Excellent. I have been researching related topics recently, especially
>>> data quality definitions and the selection of computing engines. If we
>>> can implement the content mentioned above, it would be a significant
>>> achievement.
>>>
>>> Additionally, a more flexible and straightforward installation and
>>> deployment process is also very important for the widespread adoption
>>> and use of Griffin.
>>>
>>> Thanks,
>>> Zyhao
>>> ________________________________
>>> From: William Guo <gu...@apache.org>
>>> Sent: Monday, February 19, 2024 16:24
>>> To: dev@griffin.apache.org <dev@griffin.apache.org>
>>> Subject: [Discuss] apache griffin curent issues
>>>
>>> Hi all,
>>>
>>> As we embark on the journey of refactoring Apache Griffin, I'd like to
>>> draw attention to some key areas for improvement. These points serve as
>>> a foundation for discussion within our development community:
>>>
>>> - Incomplete and Inflexible Data Quality Definition: The current
>>>   definition of data quality lacks completeness and flexibility. A
>>>   comprehensive data quality rule should encompass recording metrics,
>>>   anomaly detection, and actionable steps.
>>>
>>> - Rigid Triggering Mechanism: The triggering mechanism for measures is
>>>   rigid. Integration with the scheduler in enterprise production
>>>   environments needs to be seamless.
>>>
>>> - Over-Reliance on Internal Data Comparison: The measure implementation
>>>   depends too heavily on its own data comparison methods and neglects
>>>   the optimization capabilities inherent in the engine. We need to
>>>   leverage the engine's optimization features more effectively and
>>>   focus on data quality benchmarks rather than on query optimization.
>>>
>>> - Configurability of Gateway: To enhance flexibility, the gateway
>>>   between Apache Griffin and the engine should be configurable. This
>>>   ensures compatibility with popular gateways such as Trino, Kyuubi,
>>>   etc.
>>>
>>> - Lack of Default Alert Channels: Currently, there are no default alert
>>>   channels. Providing default channels such as Slack, WeChat, etc., is
>>>   essential to ensure timely communication of alerts.
>>>
>>> - Absence of Anomaly Detection Module: There is no anomaly detection
>>>   module. Presently, our thresholds are statically configured, which
>>>   indicates a need for dynamic anomaly detection capabilities.
>>>
>>> I encourage everyone to share their thoughts and insights on these
>>> points within our development list. Your contributions will be
>>> invaluable as we work towards enhancing the functionality and usability
>>> of Apache Griffin.
>>>
>>> Thanks,
>>> William
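
To make two of the points in the list above more concrete (a rule that combines metric recording, anomaly detection, and an action, and a dynamic threshold as an alternative to a static one), here is a minimal Python sketch. It is not Griffin's API or the proposed design: `DataQualityRule`, `static_threshold`, `dynamic_threshold`, and the mean plus or minus k standard deviations heuristic are all hypothetical names and choices made only for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev
from typing import Callable, List

# Hypothetical detector signature: (new value, past values) -> is it anomalous?
Detector = Callable[[float, List[float]], bool]


def static_threshold(limit: float) -> Detector:
    """Statically configured check: anomalous when the value exceeds a fixed limit."""
    return lambda value, history: value > limit


def dynamic_threshold(k: float = 3.0, min_points: int = 5) -> Detector:
    """Dynamic check: anomalous when the value is more than k sample standard
    deviations away from the mean of the recorded history (an assumed heuristic)."""
    def detect(value: float, history: List[float]) -> bool:
        if len(history) < min_points:
            return False  # too little history to judge
        mu, sigma = mean(history), stdev(history)
        return abs(value - mu) > k * sigma
    return detect


@dataclass
class DataQualityRule:
    """Hypothetical rule: record the metric, detect anomalies, trigger an action."""
    name: str
    detector: Detector
    action: Callable[[str, float], None]
    history: List[float] = field(default_factory=list)

    def evaluate(self, value: float) -> bool:
        anomalous = self.detector(value, self.history)
        if anomalous:
            self.action(self.name, value)  # e.g. send an alert to a channel
        self.history.append(value)  # every value is recorded, anomalous or not
        return anomalous


# Usage: the action here just collects alerts in a list.
alerts = []
rule = DataQualityRule("null_ratio", dynamic_threshold(k=3.0),
                       lambda name, value: alerts.append((name, value)))
for observed in [0.010, 0.012, 0.011, 0.009, 0.010, 0.011, 0.25]:
    rule.evaluate(observed)
```

Swapping `dynamic_threshold(...)` for `static_threshold(0.05)` gives the statically configured behavior described in the last bullet, which is the main point of keeping the detector pluggable in this sketch.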