Hi Lionel,

The JIRA ticket is https://issues.apache.org/jira/browse/GRIFFIN-289

We have built an internal version of this feature.

We have upgraded the Griffin measure code. There are three main changes:
1. DQ job configuration parameters.
In the DQ job configuration parameters, we added an "error.confs" key under
"rules". It looks like this:
[image: image.png]
[image: image.png]
"regex" selects regular-expression mode, and "enumeration" selects
enumeration mode.


2. DQConfig Scala file. Following the changes to the DQ job configuration
parameters, we updated DQConfig to deserialize the new parameters.

3. CompletenessExpr2DQSteps Scala file. We updated it to generate the SQL
that counts how many rows are incomplete.
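To make item 3 more concrete, here is a minimal Scala sketch of the kind of SQL such a step could generate. It is only an illustration, not the code in our patch: ErrorConf, EnumerationConf, RegexConf and CompletenessSqlSketch are hypothetical names, their fields do not necessarily match the "error.confs" layout in the screenshots, and the generated query assumes Spark SQL with its default string-literal escaping (backslash escapes) and the RLIKE operator. A row counts as incomplete if the column is null, as before, or if its value matches one of the configured invalid values.

```scala
// Hypothetical model of one "error.confs" entry; names are illustrative only.
sealed trait ErrorConf { def column: String }
// "enumeration" mode: the listed values are treated as invalid.
final case class EnumerationConf(column: String, values: Seq[String]) extends ErrorConf
// "regex" mode: values matching the pattern are treated as invalid.
final case class RegexConf(column: String, pattern: String) extends ErrorConf

object CompletenessSqlSketch {
  // Back-quote an identifier, doubling any embedded backtick (Spark SQL rule).
  def ident(name: String): String = "`" + name.replace("`", "``") + "`"

  // Build a Spark SQL string literal: escape backslashes first, then quotes.
  def literal(s: String): String =
    "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"

  // Predicate that is true when the column value counts as incomplete.
  def incompletePredicate(conf: ErrorConf): String = {
    val col = ident(conf.column)
    val isNull = s"$col IS NULL"
    conf match {
      // An empty enumeration would produce invalid "IN ()", so keep only the null check.
      case EnumerationConf(_, values) if values.isEmpty => isNull
      case EnumerationConf(_, values) =>
        s"$isNull OR $col IN (${values.map(literal).mkString(", ")})"
      case RegexConf(_, pattern) =>
        s"$isNull OR $col RLIKE ${literal(pattern)}"
    }
  }

  // One COUNT(*) query: a row is incomplete if any configured column is incomplete.
  def incompleteCountSql(table: String, confs: Seq[ErrorConf]): String = {
    require(confs.nonEmpty, "at least one error conf is needed")
    val condition = confs.map(c => s"(${incompletePredicate(c)})").mkString(" OR ")
    s"SELECT COUNT(*) AS `incomplete` FROM ${ident(table)} WHERE $condition"
  }
}
```

For the "company" example from the original request, incompleteCountSql("src", Seq(EnumerationConf("company", Seq("a", "b", "c")))) filters rows with (`company` IS NULL OR `company` IN ('a', 'b', 'c')), where "src" is a placeholder table name. Identifiers and literals are escaped because both the column names and the invalid values come from user configuration.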

Could you please take a look at our implementation?

Thanks

Zhao


On Wed, Sep 11, 2019 at 1:36 PM Lionel Liu <lionel...@apache.org> wrote:

> Hi Zhao,
>
> Your requirement makes sense; it would be a common use case for
> COMPLETENESS.
> You can submit a JIRA ticket with the description to the Griffin community
> at https://issues.apache.org/jira/browse/griffin, and then someone can pick
> up the ticket and implement it.
>
> Thanks,
> Lionel
>
> On Mon, Sep 9, 2019 at 6:56 PM 钊 <mrlzbe...@gmail.com> wrote:
>
> > Hello
> >
> > We currently use the Griffin measure module to check batch data quality.
> > For the COMPLETENESS DQ type, Griffin counts how many incomplete records
> > a table contains, and it only checks whether a column is null.
> >
> > However, checking only for null is not enough to decide whether a column
> > value is invalid. In our case, analysts may consider other values invalid
> > even though they are not null. For example, for a column named "company",
> > a record is invalid if company is in ("a", "b", "c").
> >
> > We need two ways for users to filter incomplete records. One is
> > "enumeration": users list all the values they consider invalid for a
> > column. The other is "regular expression": users write a regular
> > expression that matches the invalid values for a column.
> >
> > Could Griffin update the COMPLETENESS DQ type to support these
> > "enumeration" and "regular expression" ways of filtering incomplete
> > records?
> >
> > Regards
> >
> > Zhao
> >
>
