Hi Lionel,

What do you think of my implementation?

Thanks
Zhao

On Wed, Sep 11, 2019 at 8:18 PM 钊 <[email protected]> wrote:

> Hi Lionel,
>
> The JIRA ticket is https://issues.apache.org/jira/browse/GRIFFIN-289
>
> We have built an internal version.
>
> We have upgraded the Griffin measure code, with three main changes:
>
> 1. DQ job configuration parameters.
> We added an "error.confs" key under "rules"; it looks like this:
> [image: image.png]
> [image: image.png]
> "regex" means "regular expression" mode, and "enumeration" means
> "enumeration" mode.
>
> 2. DQConfig Scala file. Following the changes to the DQ job
> configuration parameters, DQConfig was updated to deserialize the
> new parameters.
>
> 3. CompletenessExpr2DQSteps Scala file. It now generates SQL to count
> the incomplete rows.
>
> Could you please consider our implementation?
>
> Thanks
>
> Zhao
>
>
> On Wed, Sep 11, 2019 at 1:36 PM Lionel Liu <[email protected]> wrote:
>
>> Hi Zhao,
>>
>> Your requirement makes sense; that would be a common use of
>> COMPLETENESS cases.
>> You can submit a JIRA ticket with the description to the Griffin
>> community: https://issues.apache.org/jira/browse/griffin
>> Then someone can pick up the ticket and do the implementation.
>>
>> Thanks,
>> Lionel
>>
>> On Mon, Sep 9, 2019 at 6:56 PM 钊 <[email protected]> wrote:
>>
>> > Hello
>> >
>> > We currently use the Griffin measure module to check batch data
>> > quality. For the COMPLETENESS dq type, Griffin counts the incomplete
>> > records in a table, but it only checks whether a column is 'null'
>> > or not.
>> >
>> > However, a "null" check alone is not enough to decide whether a
>> > column is invalid. In our case, analysts may consider other values
>> > invalid even though they are not "null". For example, for a column
>> > named "company", if company in ("a", "b", "c"), the record is
>> > invalid.
>> >
>> > We need two ways for users to filter incomplete records. One is
>> > "enumeration", where users list all the values they consider invalid
>> > for a column; the other is "regular expression", where users write a
>> > regular expression that matches the invalid values for a column.
>> >
>> > Could Griffin update the COMPLETENESS dq type to support our
>> > "enumeration" and "regular expression" ways of filtering incomplete
>> > records?
>> >
>> > Regards
>> >
>> > Zhao
>> >
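
For readers of this thread, the two modes discussed above can be illustrated with a minimal sketch of how a count query might be built. This is not the Griffin implementation (that lives in the Scala file CompletenessExpr2DQSteps); it is a Python illustration only. The config keys `type` and `values`, the function names, and the table name `source` are all hypothetical, since the config screenshots in the thread are not available. `RLIKE` is the Spark SQL regular-expression operator; table and column names are assumed to be trusted identifiers and are not escaped.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def incomplete_predicate(column: str, error_conf: dict) -> str:
    """Build a predicate that is true when `column` counts as incomplete.

    `error_conf` uses hypothetical keys: "type" is "enumeration" or "regex",
    and "values" lists the invalid values or patterns for the column.
    """
    mode = error_conf["type"]
    values = error_conf["values"]
    # A NULL column is always incomplete, as in the existing behaviour.
    checks = [f"{column} IS NULL"]
    if mode == "enumeration":
        # Enumeration mode: the column equals one of the listed values.
        checks.append(f"{column} IN ({', '.join(sql_literal(v) for v in values)})")
    elif mode == "regex":
        # Regex mode: the column matches any of the listed patterns.
        checks.extend(f"{column} RLIKE {sql_literal(p)}" for p in values)
    else:
        raise ValueError(f"unknown error conf type: {mode!r}")
    return "(" + " OR ".join(checks) + ")"


def count_incomplete_sql(table: str, column: str, error_conf: dict) -> str:
    """Build a query that counts the incomplete rows of `table`."""
    predicate = incomplete_predicate(column, error_conf)
    return f"SELECT COUNT(*) AS incomplete FROM {table} WHERE {predicate}"


if __name__ == "__main__":
    enum_conf = {"type": "enumeration", "values": ["a", "b", "c"]}
    regex_conf = {"type": "regex", "values": ["^unknown"]}
    print(count_incomplete_sql("source", "company", enum_conf))
    print(count_incomplete_sql("source", "company", regex_conf))
```

With the "company" example from the original request, the enumeration config yields a predicate of the form `company IS NULL OR company IN ('a', 'b', 'c')`, so rows with those values are counted as incomplete alongside null rows.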
