[jira] [Created] (SPARK-19842) Informational Referential Integrity Constraints Support in Spark

Ioana Delaney (JIRA) Mon, 06 Mar 2017 16:27:55 -0800

Ioana Delaney created SPARK-19842:
-------------------------------------

             Summary: Informational Referential Integrity Constraints Support in Spark
                 Key: SPARK-19842
                 URL: https://issues.apache.org/jira/browse/SPARK-19842
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Ioana Delaney



*Informational Referential Integrity Constraints Support in Spark*

This work proposes support for _informational primary key_ and _foreign key 
(referential integrity) constraints_ in Spark. The main purpose is to open up 
a class of query optimization techniques that rely on the semantics of 
referential integrity constraints. 

An _informational_ or _statistical constraint_ is a constraint, such as a 
_unique_, _primary key_, _foreign key_, or _check constraint_, that Spark can 
use to improve query performance. Informational constraints are not enforced 
by the Spark SQL engine; rather, they are used by Catalyst to optimize query 
processing. They provide semantic information that allows Catalyst to rewrite 
queries to eliminate joins, push down aggregates, remove unnecessary Distinct 
operations, and perform a number of other optimizations. Informational 
constraints are primarily aimed at applications that load and analyze data 
that originated from a data warehouse. For such applications, the conditions 
for a given constraint are already known to be true, so the constraint does 
not need to be enforced during data load operations. 
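
As a rough illustration of two of the rewrites mentioned above (join 
elimination and removal of an unnecessary Distinct), the sketch below compares 
each query with its rewritten form. It is not Spark code and does not use the 
proposed constraint syntax: SQLite, via Python's standard sqlite3 module, 
stands in for the SQL engine, and the customer and orders tables and their 
columns are hypothetical. No constraint is declared on the tables; the data is 
simply assumed to satisfy "customer.c_id is unique" and "every orders.o_cust 
is non-null and matches a customer.c_id", which mirrors the informational, 
non-enforced case.

```python
import sqlite3

# SQLite is only a stand-in engine; the tables and columns are hypothetical.
# No PRIMARY KEY or FOREIGN KEY is declared, so nothing is enforced here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (c_id INTEGER, c_name TEXT);
    CREATE TABLE orders (o_id INTEGER, o_cust INTEGER NOT NULL, o_amount INTEGER);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cy")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 50), (11, 1, 70), (12, 2, 20), (13, 3, 5)])


def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Join elimination: the query reads only orders columns, and each order
# matches exactly one customer, so the join neither adds nor drops rows.
JOIN_SQL = """SELECT o.o_id, o.o_amount
              FROM orders o JOIN customer c ON o.o_cust = c.c_id
              ORDER BY o.o_id"""
NO_JOIN_SQL = "SELECT o_id, o_amount FROM orders ORDER BY o_id"
with_join = rows(JOIN_SQL)
without_join = rows(NO_JOIN_SQL)

# Distinct removal: c_id is unique, so DISTINCT cannot remove any row.
distinct_ids = rows("SELECT DISTINCT c_id FROM customer ORDER BY c_id")
plain_ids = rows("SELECT c_id FROM customer ORDER BY c_id")

# If the data violates the assumed constraint, the rewrite is no longer valid:
# an order pointing at a missing customer is dropped by the join only.
conn.execute("INSERT INTO orders VALUES (14, 99, 8)")
with_join_orphan = rows(JOIN_SQL)
without_join_orphan = rows(NO_JOIN_SQL)

print(with_join == without_join)
print(distinct_ids == plain_ids)
print(len(with_join_orphan), len(without_join_orphan))
```

The last comparison shows why such constraints are only safe when the data is 
already known to satisfy them: once an orphan row exists, the original and 
rewritten queries return different results.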

The attached document covers constraint definition, metastore storage, 
constraint validation, and maintenance. It also gives many examples of query 
optimizations that rely on referential integrity constraints and could be 
implemented in Spark.






