Re: [Discuss] Apache Griffin current issues

William Guo Fri, 23 Feb 2024 15:30:19 -0800

Hi all,

Please review our data quality models for DQJob:
https://cwiki.apache.org/confluence/display/GRIFFIN/models


Thanks,
William

On Thu, Feb 22, 2024 at 11:10 AM William Guo <[email protected]> wrote:

> Hi all,
>
> I have updated the architecture proposal in our wiki:
>
> https://cwiki.apache.org/confluence/display/GRIFFIN/The+DQ+workflow+Architecture+Proposal
>
> Please take a look; reviews are welcome.
>
>
>
> Thanks,
> William
>
>
> On Wed, Feb 21, 2024 at 10:43 AM William Guo <[email protected]> wrote:
>
>> One risk is that Griffin 1.0.0 might not be compatible with previous
>> versions.
>> But we will try to keep the metrics module compatible.
>>
>>
>>
>> On Tue, Feb 20, 2024 at 9:20 PM Z Mr <[email protected]> wrote:
>>
>>> Excellent, I have been researching related topics recently, especially
>>> regarding data quality definitions and the selection of computing engines.
>>> If we can implement the content mentioned above, it would be a significant
>>> achievement.
>>>
>>> Additionally, a more flexible and straightforward installation and
>>> deployment process is also very important for the widespread adoption and
>>> use of Griffin.
>>>
>>> Thanks,
>>> Zyhao
>>> ________________________________
>>> From: William Guo <[email protected]>
>>> Sent: Monday, February 19, 2024 16:24
>>> To: [email protected] <[email protected]>
>>> Subject: [Discuss] Apache Griffin current issues
>>>
>>> Hi all,
>>>
>>> As we embark on the journey of refactoring Apache Griffin, I'd like to
>>> draw attention to some key areas for improvement. These points serve as
>>> a foundation for discussion within our development community:
>>>
>>>  - Incomplete and Inflexible Data Quality Definition: The current
>>> definition of data quality lacks completeness and flexibility. A
>>> comprehensive data quality rule should encompass recording metrics,
>>> anomaly detection, and actionable steps.
>>>
>>>  - Rigid Triggering Mechanism: The triggering mechanism for measures
>>> is rigid. Measures need to integrate seamlessly with the schedulers
>>> used in enterprise production environments.
>>>
>>>  - Over-Reliance on Internal Data Comparison: The measure implementation
>>> depends too heavily on its own data comparison methods and neglects the
>>> optimization capabilities inherent in the engine. We should leverage the
>>> engine's optimization features more effectively and focus on data
>>> quality benchmarks rather than on optimizing queries ourselves.
>>>
>>>  - Configurability of Gateway: To enhance flexibility, the gateway
>>> between Apache Griffin and the engine should be configurable. This
>>> ensures compatibility with popular gateways such as Trino, Kyuubi, etc.
>>>
>>>  - Lack of Default Alert Channels: Griffin currently lacks default
>>> alert channels. Providing default channels such as Slack, WeChat, etc.,
>>> is essential to ensure timely communication of alerts.
>>>
>>>  - Absence of Anomaly Detection Module: There is no anomaly detection
>>> module. Presently, our thresholds are statically configured, so dynamic
>>> anomaly detection capabilities are needed.
>>>
>>> I encourage everyone to share their thoughts and insights on these points
>>> within our development list. Your contributions will be invaluable as we
>>> work towards enhancing the functionality and usability of Apache Griffin.
>>>
>>>
>>> Thanks,
>>> William
>>>
>>
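
The first bullet in the original message says a comprehensive data quality
rule should cover recording metrics, anomaly detection, and actionable steps.
A minimal sketch of that shape follows. It is only an illustration: DQRule,
its fields, and the example rule are hypothetical names and are not taken
from Apache Griffin or from the wiki pages linked above.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch, not Apache Griffin code.


@dataclass
class DQRule:
    name: str
    # Computes the metric, e.g. by submitting a query to the engine.
    compute_metric: Callable[[], float]
    # Decides whether a metric value is anomalous (static or dynamic check).
    is_anomalous: Callable[[float], bool]
    # Actions to run when the value is anomalous, e.g. sending an alert.
    actions: List[Callable[[str, float], None]] = field(default_factory=list)
    # Recorded metric values, oldest first.
    history: List[float] = field(default_factory=list)

    def run(self) -> bool:
        value = self.compute_metric()
        self.history.append(value)          # 1. record the metric
        anomalous = self.is_anomalous(value)  # 2. detect anomalies
        if anomalous:
            for action in self.actions:     # 3. take action
                action(self.name, value)
        return anomalous


# Example: a null-ratio rule with a static threshold and an in-memory "alert".
alerts: List[str] = []
rule = DQRule(
    name="orders_null_ratio",
    compute_metric=lambda: 0.12,          # stand-in for a real engine query
    is_anomalous=lambda v: v > 0.05,      # static threshold, assumed value
    actions=[lambda name, v: alerts.append(f"{name}: {v}")],
)
triggered = rule.run()
```

Keeping the three parts as separate callables means the detector or the
action list could be swapped without touching how the metric is computed.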

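The last bullet in the original message contrasts statically configured
thresholds with dynamic anomaly detection. The sketch below shows one simple
way to read that difference, assuming a mean and standard deviation rule over
recent metric values. The function names, the factor k, and the sample row
counts are all assumptions made for illustration; the thread does not say
which detection method Griffin would use.

```python
from statistics import mean, stdev
from typing import Sequence

# Hypothetical sketch, not Apache Griffin code.


def static_check(value: float, lower: float) -> bool:
    """Flag the value if it falls below a fixed, hand-configured bound."""
    return value < lower


def dynamic_check(value: float, history: Sequence[float], k: float = 3.0) -> bool:
    """Flag the value if it is more than k sample standard deviations
    away from the mean of the recorded history."""
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma


# Daily row counts for a table (made-up numbers).
row_counts = [1000, 1020, 990, 1010, 1005]

# A drop to 700 rows passes a static lower bound of 500,
# but stands out against the recent history.
static_flag = static_check(700, lower=500)
dynamic_flag = dynamic_check(700, row_counts)
```

The dynamic check adapts as the history grows, while the static bound has to
be revisited by hand whenever normal volumes change.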