Re: [Discuss] Apache Griffin current issues

William Guo Tue, 27 Feb 2024 01:24:42 -0800

Hi all,

I have drafted the DQ job layer model on the following wiki page:
https://cwiki.apache.org/confluence/display/GRIFFIN/DQ+Job+models

Please review it.

Thanks,
William


On Sat, Feb 24, 2024 at 7:29 AM William Guo <[email protected]> wrote:

> hi all,
>
> Please review our data quality models for DQJob
> https://cwiki.apache.org/confluence/display/GRIFFIN/models
>
> Thanks,
> William
>
> On Thu, Feb 22, 2024 at 11:10 AM William Guo <[email protected]> wrote:
>
>> hi all,
>>
>> I have updated the architecture in our wiki.
>>
>> https://cwiki.apache.org/confluence/display/GRIFFIN/The+DQ+workflow+Architecture+Proposal
>>
>> Please take a look; reviews are welcome.
>>
>>
>>
>> Thanks,
>> William
>>
>>
>> On Wed, Feb 21, 2024 at 10:43 AM William Guo <[email protected]> wrote:
>>
>>> One risk is that Griffin 1.0.0 might not be compatible with previous
>>> versions. But we will try to keep the metrics module compatible.
>>>
>>>
>>>
>>> On Tue, Feb 20, 2024 at 9:20 PM Z Mr <[email protected]> wrote:
>>>
>>>> Excellent, I have been researching related topics recently, especially
>>>> regarding data quality definitions and the selection of computing engines.
>>>> If we can implement the content mentioned above, it would be a significant
>>>> achievement.
>>>>
>>>> Additionally, a more flexible and straightforward installation and
>>>> deployment process is also very important for the widespread adoption and
>>>> use of Griffin.
>>>>
>>>> Thanks,
>>>> Zyhao
>>>> ________________________________
>>>> From: William Guo <[email protected]>
>>>> Sent: Monday, February 19, 2024 16:24
>>>> To: [email protected] <[email protected]>
>>>> Subject: [Discuss] Apache Griffin current issues
>>>>
>>>> hi all,
>>>>
>>>> As we embark on the journey of refactoring Apache Griffin, I'd like to
>>>> draw attention to some key areas for improvement. These points serve as
>>>> a foundation for discussion within our development community:
>>>>
>>>>  - Incomplete and Inflexible Data Quality Definition: The current
>>>> definition of data quality lacks completeness and flexibility. A
>>>> comprehensive data quality rule should encompass recording metrics,
>>>> anomaly detection, and actionable steps.
>>>>
>>>>  - Rigid Triggering Mechanism: The triggering mechanism for measures is
>>>> rigid. It needs to integrate seamlessly and deeply with the scheduler
>>>> used in enterprise production environments.
>>>>
>>>>  - Over-Reliance on Internal Data Comparison: The measure implementation
>>>> depends too heavily on its own data comparison methods and neglects the
>>>> optimization capabilities inherent in the engine. We need to leverage
>>>> the engine's optimization features more effectively and focus on data
>>>> quality benchmarks rather than on query optimization.
>>>>
>>>>  - Configurability of Gateway: To enhance flexibility, the gateway
>>>> between Apache Griffin and the engine should be configurable. This
>>>> ensures compatibility with popular gateways such as Trino, Kyuubi, etc.
>>>>
>>>>  - Lack of Default Alert Channels: Currently, Griffin lacks default
>>>> alert channels. Providing default channels such as Slack, WeChat, etc.
>>>> is essential to ensure timely communication of alerts.
>>>>
>>>>  - Absence of Anomaly Detection Module: An anomaly detection module is
>>>> conspicuously absent. Presently, our thresholds are statically
>>>> configured, indicating a need for dynamic anomaly detection
>>>> capabilities.
>>>>
>>>> I encourage everyone to share their thoughts and insights on these
>>>> points within our development list. Your contributions will be
>>>> invaluable as we work towards enhancing the functionality and usability
>>>> of Apache Griffin.
>>>>
>>>>
>>>> Thanks,
>>>> William
>>>>
>>>
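
To make the first and last points of the original message a bit more
concrete (a rule that records a metric, detects anomalies, and triggers
actions, with a dynamic rather than static threshold), here is a minimal
sketch. It is not Griffin code: the DQRule class, its fields, the z-score
check, and the default limits are all assumptions chosen for illustration,
and the wiki pages linked above remain the reference for the actual model.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev
from typing import Callable, List


@dataclass
class DQRule:
    """Hypothetical rule: record a metric, detect anomalies, then act."""
    name: str
    history: List[float] = field(default_factory=list)
    z_limit: float = 3.0   # assumed sensitivity, not a Griffin setting
    min_history: int = 5   # do not judge until some history exists
    actions: List[Callable[[str, float], None]] = field(default_factory=list)

    def is_anomaly(self, value: float) -> bool:
        """Dynamic threshold: compare the value with the recorded history."""
        if len(self.history) < self.min_history:
            return False
        mu = mean(self.history)
        sigma = stdev(self.history)
        if sigma == 0:
            return value != mu
        return abs(value - mu) / sigma > self.z_limit

    def evaluate(self, value: float) -> bool:
        """Check the value, record it, and run actions if it is anomalous."""
        anomalous = self.is_anomaly(value)
        self.history.append(value)
        if anomalous:
            for act in self.actions:
                act(self.name, value)
        return anomalous


if __name__ == "__main__":
    # A print callback stands in for an alert channel such as Slack.
    rule = DQRule(name="row_count",
                  actions=[lambda n, v: print(f"ALERT {n}: {v}")])
    for count in [100, 102, 98, 101, 99, 500]:
        rule.evaluate(count)
```

The threshold here follows the metric's own history instead of a fixed
configured number, and the actions list is where default alert channels
could plug in.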
