Excellent. I have been researching related topics recently, especially data quality definitions and the selection of computing engines. If we can implement the points you raised, it would be a significant achievement.
Additionally, a more flexible and straightforward installation and deployment process is very important for the widespread adoption and use of Griffin.

Thanks,
Zyhao

________________________________
From: William Guo <gu...@apache.org>
Sent: Monday, February 19, 2024 16:24
To: dev@griffin.apache.org <dev@griffin.apache.org>
Subject: [Discuss] apache griffin curent issues

hi all,

As we embark on the journey of refactoring Apache Griffin, I'd like to draw attention to some key areas for improvement. These points serve as a foundation for discussion within our development community:

- Incomplete and Inflexible Data Quality Definition: The current definition of data quality lacks completeness and flexibility. A comprehensive data quality rule should encompass recording metrics, anomaly detection, and actionable steps.

- Rigid Triggering Mechanism: The triggering mechanism for measures is rigid. Integration with the scheduler in enterprise production environments needs to be seamless and deep.

- Over-Reliance on Internal Data Comparison: The measure implementation depends too heavily on its own data comparison methods and neglects the optimization capabilities inherent in the engine. We need to leverage the engine's optimization features more effectively and focus on data quality benchmarks rather than on query optimization.

- Configurability of Gateway: To enhance flexibility, the gateway between Apache Griffin and the engine should be configurable. This would ensure compatibility with popular gateways such as Trino, Kyuubi, etc.

- Lack of Default Alert Channels: Griffin currently lacks default alert channels. Providing default channels such as Slack, WeChat, etc., is essential to ensure timely communication of alerts.

- Absence of Anomaly Detection Module: An anomaly detection module is conspicuously absent. Presently, our thresholds are statically configured, indicating a need for dynamic anomaly detection capabilities.

I encourage everyone to share their thoughts and insights on these points within our development list. Your contributions will be invaluable as we work towards enhancing the functionality and usability of Apache Griffin.

Thanks,
William