[ https://issues.apache.org/jira/browse/GRIFFIN-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565878#comment-16565878 ]
ASF GitHub Bot commented on GRIFFIN-164:
----------------------------------------
GitHub user spencer-hivert-ck opened a pull request:
https://github.com/apache/incubator-griffin/pull/381
GRIFFIN-164 GRIFFIN-186 GRIFFIN-187: Profiling Re-factor + Regex/Empty String Support
We've been working away on Griffin here at Credit Karma, and we'd love to
contribute back!
This PR tackles three separate tasks:
- [GRIFFIN-164](https://issues.apache.org/jira/browse/GRIFFIN-164): Regex Support
- [GRIFFIN-186](https://issues.apache.org/jira/browse/GRIFFIN-186): Create Profiling Measure Re-Factor
- [GRIFFIN-187](https://issues.apache.org/jira/browse/GRIFFIN-187): Empty String Support
The details for each of these tasks can be found in the JIRA tickets linked
above!
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/spencer-hivert-ck/incubator-griffin shivert/profiling-refactor-and-regex-support
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-griffin/pull/381.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #381
----
commit c4a344d7b771946ad41d9c89aa381b3925105464
Author: Spencer Hivert <spencer.hivert@...>
Date: 2018-08-01T19:32:18Z
GRIFFIN-164 GRIFFIN-186 GRIFFIN-187: Profiling Re-factor + Regex/Empty String Support
----
> Make 'Regular expression detection count' available in UI
> ---------------------------------------------------------
>
> Key: GRIFFIN-164
> URL: https://issues.apache.org/jira/browse/GRIFFIN-164
> Project: Griffin (Incubating)
> Issue Type: Improvement
> Affects Versions: 0.1.6-incubating
> Reporter: Enrico D'Urso
> Priority: Minor
> Fix For: 1.0.0-incubating
>
>
> Hi,
> I have been playing with Griffin for about a month now.
> In my experience, some companies (including the one I am working for as a
> consultant) prefer doing things through the UI.
> Personally, I would find the following feature very useful:
>
> * Regular expression detection count
> For example, I have a column that should contain only numbers, and I want to
> check whether my ETL process has wrongly populated my table with non-numeric
> values.
> I have been able to run such a job by writing the right config.json myself,
> using spark-sql as the dialect:
> {code:sql}
> SELECT count(*) FROM src WHERE account_id RLIKE '[^0-9]'
> {code}
> I saw that pr.component.ts contains a commented-out line of code:
> {code:javascript}
> // {"id":10,"itemName":"Regular Expression Detection Count","category":"Advanced Statistics"}
> {code}
> which I think is the feature I am describing.
> There is also:
> {code:javascript}
> // case 'Regular Expression Detection Count':
> //   return 'count(source.`'+col.name+'`) where source.`'+col.name+'` LIKE ';
> {code}
> which appears to be the griffin-dsl dialect, although the regex would
> presumably need to be appended right after LIKE.
> Then, once the above griffin-dsl statement is available in the backend, the
> ProfilingRulePlanTrans class should map it to a Spark SQL 'rlike' clause.
> I am not sure where (or whether) ProfilingRulePlanTrans needs to be modified,
> since preGroupbyClause should contain everything, but I do not know that code
> well enough to say.
>
> Please judge the priority of this feature yourselves; for someone who knows
> the code well, it should not be too hard to implement.
> Thanks,
>
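The reporter's spark-sql check counts rows whose `account_id` contains at least one non-digit character. Spark's `rlike` performs an unanchored search (a match anywhere in the value counts), and a `NULL` value makes the predicate `NULL`, which `WHERE` drops. The TypeScript sketch below reproduces that counting logic over an in-memory array. `countRegexMatches` is a hypothetical helper, not part of Griffin, and it assumes JavaScript's `RegExp.test` behaves like Java's regex search for a simple character class such as `[^0-9]`.

```typescript
// Hypothetical helper mirroring: SELECT count(*) FROM src WHERE col RLIKE pattern
// Assumption: RegExp.test matches Spark/Java rlike semantics for simple patterns.
function countRegexMatches(values: Array<string | null>, pattern: string): number {
  // No "g" flag, so test() keeps no lastIndex state between calls.
  const re = new RegExp(pattern);
  let count = 0;
  for (const v of values) {
    // SQL: NULL RLIKE pattern yields NULL, which WHERE treats as not matching.
    if (v === null) continue;
    // rlike is an unanchored search: any matching substring counts.
    if (re.test(v)) count++;
  }
  return count;
}

const accountIds: Array<string | null> = ["12345", "12a45", null, "", "4 5", "007"];
console.log(countRegexMatches(accountIds, "[^0-9]")); // 2 ("12a45" and "4 5")
```

An empty string contains no non-digit character, so this check does not count it. That is one reason a separate empty-string check, as in GRIFFIN-187, is useful alongside the regex count.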
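The commented-out `pr.component.ts` template leaves the pattern empty after `LIKE`. One alternative is to have the UI emit a spark-sql rule directly, like the reporter's hand-written config.json. The sketch below shows that approach. `buildRegexCountRule`, its parameters, and the default table name `src` are hypothetical and are not Griffin's actual UI API. The sketch backtick-quotes identifiers and doubles any embedded backtick. It also escapes backslashes and single quotes in the pattern, on the assumption that Spark's default string-literal parsing treats a backslash as an escape character.

```typescript
// Hypothetical UI helper: builds a spark-sql rule for
// "Regular Expression Detection Count". Not Griffin's actual API.
function quoteIdentifier(name: string): string {
  // Backtick-quote the identifier; an embedded backtick is escaped by doubling it.
  return "`" + name.replace(/`/g, "``") + "`";
}

function quoteStringLiteral(value: string): string {
  // Assumes Spark's default literal parsing, where backslash is the escape
  // character: escape backslashes first, then single quotes.
  return "'" + value.replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

function buildRegexCountRule(column: string, pattern: string, table = "src"): string {
  return `SELECT count(*) FROM ${quoteIdentifier(table)} ` +
    `WHERE ${quoteIdentifier(column)} RLIKE ${quoteStringLiteral(pattern)}`;
}

console.log(buildRegexCountRule("account_id", "[^0-9]"));
// SELECT count(*) FROM `src` WHERE `account_id` RLIKE '[^0-9]'
```

If the UI emits spark-sql directly, the backend does not need to translate a griffin-dsl `LIKE` into `rlike`. The cost is that the UI must generate dialect-specific SQL and handle escaping itself.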
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)