Hi all,
I would suggest simplifying Griffin-DSL.
Currently, Griffin supports three types of DSL: spark-sql, griffin-dsl and
df-ops. This proposal focuses only on the first two.
Griffin-DSL is a SQL-like language that supports many of the same clauses,
keywords, operators, etc. as Spark SQL. The class "GriffinDslParser" defines
how this SQL-like syntax is parsed. In fact, Griffin-DSL's SQL-like syntax
could be covered completely by Spark SQL. Spark 2.0 substantially improved
its SQL functionality with SQL:2003 support and can now run all 99 TPC-DS
queries.
So is it possible for Griffin-DSL to drop all of its SQL-like language
features? Any rule that can be expressed in SQL would be categorized as the
"spark-sql" DSL type instead of "griffin-dsl". That way, we could
simplify the implementation of Griffin-DSL.
In my understanding, Griffin-DSL should consist of high-level expressions,
each representing a specific set of semantics. Griffin-DSL would continue
to focus on expressions with richer semantics in the data exploration
and wrangling area, and leave all SQL-compatible expressions to Spark SQL.
Griffin-DSL would still be translated into Spark SQL at execution time.
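To make the translation step concrete, here is a rough sketch (in Python, for
brevity) of how a high-level accuracy rule might be expanded into plain Spark
SQL. The function name "accuracy_to_spark_sql", the "key" parameter and the
shape of the generated query are assumptions for illustration only; this is
not the SQL that Griffin actually generates.

```python
def accuracy_to_spark_sql(rule: str,
                          source: str = "source",
                          target: str = "target",
                          key: str = "user_id") -> str:
    """Expand an accuracy rule (a join condition) into a Spark SQL string.

    Hypothetical helper, not part of Griffin. It assumes `key` is a column
    that is never NULL in `target`, so a NULL after the LEFT JOIN means the
    source row found no matching target row.
    """
    return (
        f"SELECT {source}.* FROM {source} "
        f"LEFT JOIN {target} ON {rule} "
        f"WHERE {target}.{key} IS NULL"
    )


# Example: source records that have no matching target record
miss_sql = accuracy_to_spark_sql("source.user_id = target.user_id")
```

The point is only that the DSL layer carries the "accuracy" semantics, while
the comparison itself is ordinary SQL that Spark can already parse.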
Here is an example from the unit test "_accuracy-batch-griffindsl.json":
"evaluate.rule": {
  "rules": [
    {
      "dsl.type": "griffin-dsl",
      "dq.type": "accuracy",
      "out.dataframe.name": "accu",
      "rule": "source.user_id = target.user_id AND upper(source.first_name) = upper(target.first_name) AND source.last_name = target.last_name AND source.address = target.address AND source.email = target.email AND source.phone = target.phone AND source.post_code = target.post_code",
      "details": {
        "source": "source",
        "target": "target",
        "miss": "miss_count",
        "total": "total_count",
        "matched": "matched_count"
      },
      "out": [
        {
          "type": "record",
          "name": "missRecords"
        }
      ]
    }
  ]
}
If we move the SQL-like syntax out of Griffin-DSL, the preceding example would
take "dsl.type" as "spark-sql", and "rule" would probably be a list of
columns, or all columns by default.
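As a rough illustration of the "list of columns" idea, the sketch below (in
Python, for brevity) builds the equality condition from a column list and
falls back to all columns when none are given. The function name
"columns_to_condition" and its parameters are hypothetical, not an existing
Griffin API.

```python
from typing import Iterable, Optional


def columns_to_condition(columns: Optional[Iterable[str]],
                         all_columns: Iterable[str],
                         source: str = "source",
                         target: str = "target") -> str:
    """Build a SQL equality condition from a list of column names.

    Hypothetical helper: if `columns` is None or empty, every name in
    `all_columns` is compared, mirroring "all columns by default".
    """
    cols = list(columns) if columns else list(all_columns)
    return " AND ".join(f"{source}.{c} = {target}.{c}" for c in cols)


schema = ["user_id", "first_name", "last_name", "email"]

# Explicit column list
explicit = columns_to_condition(["user_id", "email"], schema)

# No list given: compare every column in the schema
default = columns_to_condition(None, schema)
```

Note that a plain column list cannot express the
upper(source.first_name) = upper(target.first_name) comparison from the
example above; a rule like that would still need to be written in Spark SQL.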
Discussion is welcome.
Grant