[Discuss] Apache Griffin current issues

William Guo Mon, 19 Feb 2024 08:24:26 -0800

Hi all,

As we embark on the journey of refactoring Apache Griffin, I'd like to draw
attention to some key areas for improvement. These points serve as a
foundation for discussion within our development community:


 - Incomplete and Inflexible Data Quality Definition: The current
definition of data quality is neither complete nor flexible. A
comprehensive data quality rule should cover metric recording, anomaly
detection, and follow-up actions.

 - Rigid Triggering Mechanism: The mechanism for triggering measures is
rigid. Griffin needs to integrate seamlessly and deeply with the schedulers
used in enterprise production environments.

 - Over-Reliance on Internal Data Comparison: The measure implementation
depends too heavily on its own data comparison methods and neglects the
optimization capabilities of the underlying engine. We should leverage the
engine's optimization features more effectively, so that we can focus on
data quality benchmarks rather than on optimizing queries.

 - Configurability of Gateway: To enhance flexibility, the gateway between
Apache Griffin and the engine should be configurable. This ensures
compatibility with popular gateways such as Trino, Kyuubi, etc.

 - Lack of Default Alert Channels: Default alert channels are currently
lacking. Providing default channels such as Slack, WeChat, etc., is
essential to ensure timely communication of alerts.

 - Absence of Anomaly Detection Module: There is no anomaly detection
module. Our thresholds are currently configured statically, which points to
a need for dynamic anomaly detection capabilities.
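
To make the first and last points a bit more concrete, here is a minimal
sketch in Python of what a rule combining a metric, an anomaly check, and
an action might look like. This is purely illustrative and is not Griffin's
API: QualityRule, null_ratio, and zscore_check are hypothetical names, and
the z-score test is just one simple way to replace a static threshold with
one derived from the metric's own history.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable, Sequence, Tuple

Row = dict  # hypothetical row representation: column name -> value


def null_ratio(column: str) -> Callable[[Sequence[Row]], float]:
    """Metric: fraction of rows whose `column` is missing or None."""
    def metric(rows: Sequence[Row]) -> float:
        if not rows:
            return 0.0
        return sum(1 for r in rows if r.get(column) is None) / len(rows)
    return metric


def zscore_check(k: float = 3.0) -> Callable[[float, Sequence[float]], bool]:
    """Anomaly check: flag values more than k standard deviations from
    the mean of past metric values, instead of using a fixed threshold."""
    def is_anomaly(value: float, history: Sequence[float]) -> bool:
        if len(history) < 2:
            return False  # not enough history to estimate spread
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            return value != mu
        return abs(value - mu) > k * sigma
    return is_anomaly


@dataclass
class QualityRule:
    """Hypothetical rule shape: record a metric, detect anomalies, act."""
    name: str
    metric: Callable[[Sequence[Row]], float]
    is_anomaly: Callable[[float, Sequence[float]], bool]
    action: Callable[[str, float], None]  # e.g. post to an alert channel

    def evaluate(self, rows: Sequence[Row],
                 history: Sequence[float]) -> Tuple[float, bool]:
        value = self.metric(rows)
        anomalous = self.is_anomaly(value, history)
        if anomalous:
            self.action(self.name, value)
        return value, anomalous


# Usage: the action here only collects alerts in a list; a real channel
# (Slack, WeChat, ...) would be plugged in at the same place.
alerts = []
rule = QualityRule(
    name="email_null_ratio",
    metric=null_ratio("email"),
    is_anomaly=zscore_check(k=3.0),
    action=lambda name, value: alerts.append((name, value)),
)
past_values = [0.01, 0.02, 0.015, 0.012, 0.018]
batch = [{"email": None}] * 50 + [{"email": "user@example.org"}] * 50
value, anomalous = rule.evaluate(batch, past_values)
```

The point of the sketch is only that the three parts are separate,
pluggable pieces, so a different metric, detector, or alert channel can be
swapped in without touching the others.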

I encourage everyone to share their thoughts and insights on these points
within our development list. Your contributions will be invaluable as we
work towards enhancing the functionality and usability of Apache Griffin.


Thanks,
William
