Zhao Li created GRIFFIN-289:
-------------------------------

             Summary: new feature for griffin COMPLETENESS dq type
                 Key: GRIFFIN-289
                 URL: https://issues.apache.org/jira/browse/GRIFFIN-289
             Project: Griffin
          Issue Type: New Feature
          Components: completeness-batch
    Affects Versions: 0.3.1-incubating
            Reporter: Zhao Li


Hello
 
We currently use the Griffin measure module to check batch data quality. For 
the COMPLETENESS dq type, Griffin counts the incomplete records in a table, but 
it only checks whether a column is null.
 
However, a null check alone is not enough to decide whether a column value is 
invalid. In our case, analysts may consider other values invalid even though 
they are not null. For example, for a column named "company", a record is 
invalid if company is in ("a", "b", "c").
 
We need two ways for users to filter incomplete records. One is "enumeration": 
users list all the values they consider invalid for a column. The other is 
"regular expression": users write a regular expression that matches the invalid 
values for a column.
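 
To illustrate the intended behaviour, here is a rough Python sketch of the two 
filters. It is not Griffin code and does not use any Griffin API or 
configuration syntax; the function names, the parameters "invalid_values" and 
"invalid_pattern", and the sample records are all made up for this example. It 
also assumes the regular expression must match the whole value.

```python
import re


def is_incomplete(value, invalid_values=(), invalid_pattern=None):
    """Hypothetical check: null, enumerated, or regex-matched values are incomplete."""
    # Existing behaviour: a null value is incomplete.
    if value is None:
        return True
    # "Enumeration": the value is in the user-supplied list of invalid values.
    if value in invalid_values:
        return True
    # "Regular expression": the whole value matches the user-supplied pattern.
    if invalid_pattern is not None and re.fullmatch(invalid_pattern, str(value)):
        return True
    return False


def count_incomplete(records, column, invalid_values=(), invalid_pattern=None):
    """Count the records whose value in `column` is considered incomplete."""
    return sum(
        1
        for record in records
        if is_incomplete(record.get(column), invalid_values, invalid_pattern)
    )


# Made-up sample data for the "company" column.
records = [
    {"company": "a"},
    {"company": "acme"},
    {"company": None},
    {"company": "N/A"},
    {"company": "b"},
]

by_enum = count_incomplete(records, "company", invalid_values={"a", "b", "c"})
by_regex = count_incomplete(records, "company", invalid_pattern=r"[Nn]/?[Aa]")
```

With this sample data, the enumeration filter flags "a", "b" and the null 
record, while the regular expression filter flags "N/A" and the null record.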
 
Could Griffin update the COMPLETENESS dq type to support these "enumeration" 
and "regular expression" ways of filtering incomplete records?
 
Regards
 
Zhao



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
