Hi all,

I have updated the architecture proposal on our wiki:
https://cwiki.apache.org/confluence/display/GRIFFIN/The+DQ+workflow+Architecture+Proposal

Please take a look; reviews are welcome.



Thanks,
William


On Wed, Feb 21, 2024 at 10:43 AM William Guo <gu...@apache.org> wrote:

> One risk is that Griffin 1.0.0 might not be compatible with previous
> versions, but we will try to keep the metrics module compatible.
>
>
>
> On Tue, Feb 20, 2024 at 9:20 PM Z Mr <zyhao_co...@outlook.com> wrote:
>
>> Excellent, I have been researching related topics recently, especially
>> regarding data quality definitions and the selection of computing engines.
>> If we can implement the content mentioned above, it would be a significant
>> achievement.
>>
>> Additionally, a more flexible and straightforward installation and
>> deployment process is also very important for the widespread adoption and
>> use of Griffin.
>>
>> Thanks,
>> Zyhao
>> ________________________________
>> From: William Guo <gu...@apache.org>
>> Sent: Monday, February 19, 2024 16:24
>> To: dev@griffin.apache.org <dev@griffin.apache.org>
>> Subject: [Discuss] apache griffin current issues
>>
>> hi all,
>>
>> As we embark on the journey of refactoring Apache Griffin, I'd like to
>> draw attention to some key areas for improvement. These points serve as
>> a foundation for discussion within our development community:
>>
>>  - Incomplete and Inflexible Data Quality Definition: The current
>> definition of data quality lacks completeness and flexibility. A
>> comprehensive data quality rule should encompass recording metrics,
>> anomaly detection, and actionable steps.
>>
>>  - Rigid Triggering Mechanism: The triggering mechanism for measures is
>> rigid. Griffin needs to integrate seamlessly and deeply with the
>> schedulers used in enterprise production environments.
>>
>>  - Over-Reliance on Internal Data Comparison: The measure implementation
>> depends too heavily on its own data comparison methods and neglects the
>> optimization capabilities inherent in the engine. We need to leverage
>> the engine's optimization features more effectively, so that Griffin
>> can focus on data quality benchmarks rather than on query optimization.
>>
>>  - Configurability of Gateway: To enhance flexibility, the gateway between
>> Apache Griffin and the engine should be configurable. This ensures
>> compatibility with popular gateways such as Trino, Kyuubi, etc.
>>
>>  - Lack of Default Alert Channels: Currently, Griffin lacks default
>> alert channels. Providing default channels such as Slack, WeChat, etc.,
>> is essential to ensure timely communication of alerts.
>>
>>  - Absence of Anomaly Detection Module: An anomaly detection module is
>> conspicuously absent. Presently, our thresholds are statically configured,
>> indicating a need for dynamic anomaly detection capabilities.
>>
>> I encourage everyone to share their thoughts and insights on these points
>> within our development list. Your contributions will be invaluable as we
>> work towards enhancing the functionality and usability of Apache Griffin.
>>
>>
>> Thanks,
>> William
>>
>
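
To make the rule-definition and anomaly-detection points in the thread above concrete, here is a minimal Python sketch of a rule that records a metric, checks it against a threshold derived from its own history instead of a statically configured one, and triggers an action. It is only an illustration under stated assumptions: `Rule`, `null_ratio`, `on_anomaly`, and the k-sigma check are hypothetical names and choices, not Griffin's API and not the design in the wiki proposal.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev
from typing import Callable, List, Optional


def null_ratio(rows: List[dict], column: str) -> float:
    """Hypothetical metric: fraction of rows whose `column` is None."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(column) is None) / len(rows)


@dataclass
class Rule:
    """Hypothetical rule: record a metric, detect anomalies, take an action."""
    name: str
    metric: Callable[[List[dict]], float]     # computes the metric value
    on_anomaly: Callable[[str, float], None]  # action, e.g. send an alert
    history: List[float] = field(default_factory=list)
    k: float = 3.0                            # width of the band in sigmas

    def is_anomalous(self, value: float) -> bool:
        # Dynamic threshold: compare against mean +/- k * stdev of past
        # values. With fewer than two past values there is no baseline.
        if len(self.history) < 2:
            return False
        mu = mean(self.history)
        sigma = stdev(self.history)
        return abs(value - mu) > self.k * sigma

    def evaluate(self, rows: List[dict]) -> float:
        value = self.metric(rows)
        if self.is_anomalous(value):
            self.on_anomaly(self.name, value)
        self.history.append(value)            # record the metric
        return value
```

In this sketch the action is just a callback, so a Slack or WeChat notifier could be plugged in without changing the rule itself. The k-sigma band is one simple stand-in for a dynamic threshold; a real anomaly detection module would likely need something more robust.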
